Alex Solomon - Nagios PDF

59 Pages·2012·7.93 MB·English

Preview Alex Solomon - Nagios

MANAGING YOUR HEROES
The People Aspect of Monitoring (a.k.a. Dealing with Outages and Failures)
Alex Solomon

WHO AM I?
Alex Solomon
• Founder / CEO of PagerDuty
• Intersect Inc.
• Amazon.com

DEFINITIONS
• Service Level Agreement (SLA)
• Mean Time To Resolution (MTTR)
• Mean Time To Response
• Mean Time Between Failures (MTBF)

OUTAGES
Can we prevent them?

PREVENTING OUTAGES
• Single Points of Failure (SPOFs)
• Redundant systems
• Complex, monolithic systems
• Service-oriented architecture

Netflix distributed SOA system

PREVENTING OUTAGES
• Change (not much you can do about this one)

OUTAGES
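The definitions slide names the response metrics but gives no formulas. The following is a minimal Python sketch of how they are commonly computed from incident records. It assumes each incident carries hypothetical `started`, `acknowledged`, and `resolved` timestamps. MTBF is taken here as the uptime from one incident's resolution to the next incident's start, which is only one of several conventions in use. The availability ratio MTBF / (MTBF + MTTR) is a standard approximation and does not come from the slides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record layout; the field names are assumptions for this sketch.
    started: datetime       # when the failure began (or was first detected)
    acknowledged: datetime  # when a responder acknowledged the page
    resolved: datetime      # when service was restored


def _mean(deltas):
    """Average a non-empty list of timedeltas."""
    if not deltas:
        raise ValueError("need at least one interval")
    return sum(deltas, timedelta()) / len(deltas)


def mean_time_to_response(incidents):
    """Mean time from failure start to acknowledgement."""
    return _mean([i.acknowledged - i.started for i in incidents])


def mttr(incidents):
    """Mean Time To Resolution: failure start to service restored."""
    return _mean([i.resolved - i.started for i in incidents])


def mtbf(incidents):
    """Mean Time Between Failures, measured here as uptime from one
    incident's resolution to the next incident's start (needs >= 2 incidents)."""
    ordered = sorted(incidents, key=lambda i: i.started)
    gaps = [nxt.started - prev.resolved for prev, nxt in zip(ordered, ordered[1:])]
    return _mean(gaps)


def availability(mtbf_value, mttr_value):
    """Steady-state availability approximation: MTBF / (MTBF + MTTR)."""
    # timedelta / timedelta yields a float in the range (0, 1].
    return mtbf_value / (mtbf_value + mttr_value)
```

Teams differ on whether MTTR is measured from the start of the failure or from the moment it is detected. Whichever convention is chosen, it has to stay fixed if the numbers are to be compared against an SLA over time.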

Description:
Each severity level should have its own standard operating procedure (SOP).
DevOps model:
• Devs need to own the systems they write.
• Getting paged
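To make "one SOP per severity level" concrete, here is a hypothetical Python sketch. The severity names, paging rules, and escalation times are illustrative assumptions, not the author's or PagerDuty's actual policy. The `missing_sops` and `sop_for` helpers are also hypothetical. They flag any severity level that lacks its own procedure and refuse to fall back silently when one is missing.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Hypothetical severity scale; real organizations define their own.
    SEV1 = 1  # critical: customer-facing outage
    SEV2 = 2  # major: degraded service
    SEV3 = 3  # minor: no immediate customer impact


@dataclass(frozen=True)
class SOP:
    page_immediately: bool       # page the on-call person now, or wait for business hours
    escalate_after_minutes: int  # escalate if the page is not acknowledged in time
    notify: str                  # who is told about the incident


# Hypothetical SOP table: one entry per severity level.
SOPS = {
    Severity.SEV1: SOP(True, 15, "on-call engineer and team lead"),
    Severity.SEV2: SOP(True, 30, "on-call engineer"),
    Severity.SEV3: SOP(False, 24 * 60, "owning team's ticket queue"),
}


def missing_sops(table):
    """Return the severity levels that have no SOP defined in `table`."""
    return [sev for sev in Severity if sev not in table]


def sop_for(severity, table=SOPS):
    """Look up the SOP for a severity level; fail loudly if none exists."""
    try:
        return table[severity]
    except KeyError:
        raise LookupError(f"no SOP defined for {severity.name}") from None
```

Running `missing_sops(SOPS)` as a check whenever the table changes keeps a newly added severity level from shipping without a procedure.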