Intro Monitoring is important you can’t manage what you haven’t measured Monitoring is hard just because it’s up doesn’t mean it’s working Distributed systems are complex Reasoning Gestalt The whole is greater than the sum of it’s parts No single tool will fit the needs of everyone Better to build a cohesive whole out of orthogonal pieces Critical elements If it moves graph it, if it’s important alert on it. Metrics Alerting Tools Sensu Pagerduty Graphite Gdash/Graphiti Logstash -?