PagerDuty commissioned a study covering more than 10,000 companies in over 100 different segments.

Remember, memes are another way of evolving across generations. This happens in the world of Snow Crash, but it can happen in your organization as well.

If we don’t treat every outage or alert as something to learn from or improve on, we run the risk of the Normalization of Deviance effect: we start to treat alerts and degradations as acceptable. Our standards suffer. We let things slip through the cracks.

Let’s make sure that we are setting the proper expectations. We don’t want to just expect five 9’s of reliability because “well, five is better than four.” Why do you need
five? Have you tied your metrics to a business outcome?
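To make the "why five?" question concrete, it can help to turn each availability target into the downtime it actually allows. The snippet below is a minimal sketch, not something from the talk: the helper name `downtime_budget_minutes` is hypothetical, and it assumes a 365-day year with availability measured over wall-clock time.

```python
# Hypothetical helper: yearly downtime budget implied by an availability target.
# Assumes a 365-day year (leap years ignored) and wall-clock availability.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(nines: int) -> float:
    """Minutes of downtime per year allowed by a target of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {downtime_budget_minutes(n):.2f} minutes/year")
```

Under these assumptions, four 9’s allows a little under an hour of downtime a year and five 9’s allows about five minutes. Whether that difference is worth the extra on-call load is exactly the business-outcome question to ask.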

Likewise, your speed metrics shouldn’t be “faster than last month.” And beware of inaccurate extrapolation. You might have data suggesting that if your page load time
increases by a second, conversion drops by 50 percent. But that doesn’t mean that if you reduce load time by a second, conversion will increase by 50 percent.

Correlation doesn’t always equal causation, and the same numbers don’t move the dials in both directions.

Don’t over-design systems. Resume-driven development is almost always a recipe for on-call disasters.

At the heart of every complex resilient system is the hubris that someone believed they could predict everything that could go wrong. Fate, and the internet, laughs

Ask how the on-call engineer is feeling during stand-ups. Give them the opportunity to mention that they might be burning out.

Volunteer to help as an incident commander. (What’s that? Maybe we should have them!)

You want to get all the right people on the call as soon as you need to…but you also want to get them OFF of the call as soon as possible.

These might seem obvious, but if they’re so obvious, I assume you’ve done them already?

Richard Dawkins described memes as being a form of cultural propagation, which is a way for people to transmit social memories and cultural ideas to each other. Not unlike the way that DNA and life will spread from location to location, a meme idea will also travel from mind to mind.

Changing the mindset of any organization to a more humane approach to ops - including awareness of alert fatigue, burnout risk, and proactive vs. reactive approaches - can seem impossible.

In this talk, I will discuss how the very DNA of an organization can evolve through the use of actionable communications from all levels - management, strategy, and practitioners. The “virus” of humane ops will infect your organization, providing a more sustainable approach to on-call, incident resolution, post-mortems, and more. There will also be copious references to the Neal Stephenson classic novel, Snow Crash.

After this talk, you will have ideas of practical approaches to effect change in your organization, regardless of your level of influence. While not every group will use the same “viruses”, you will take away a good understanding of where to get started as Patient Zero.

Resources

The following resources were mentioned during the presentation or are useful additional information.