Blameless PostMortems and a Just Culture

Original by John Allspaw

Failure happens. This is a foregone conclusion when working with complex systems.

But what about those failures that come from the actions of individuals? What do you do with those careless humans? Fire them? Restrict them? This would be the traditional view of “human error”, the “Bad Apple Theory”: get rid of the bad apples, and you’ll get rid of the human error.

A better approach may be to view mistakes and lapses with a perspective of learning.

A Blameless Post-Mortem

Make an effort to balance safety and accountability by investigating mistakes with a focus on the situation and decision-making process.

Allow engineers, without fear of punishment, to talk about:

what actions they took at what time,

what effects they observed,

expectations they had,

assumptions they had made,

and their understanding of the timeline of events as they occurred.

If we “blame” engineers when things go wrong, we reduce trust and increase the likelihood that future incidents will be covered up, that management will be less informed, and that more incidents will occur.

Instead, assume positive intent. Any action taken made sense to the person at the time they took it.

A Second Story

The idea of digging deeper into the circumstances and environment that an engineer found themselves in is called looking for the “Second Story”.

For example:

Rather than seeing the engineer as the cause of failure, see human error as the effect of systemic vulnerabilities.

Rather than saying what people should have done, understand why it made sense for them to do what they did.

Rather than telling people to be more careful, constantly seek out the system’s vulnerabilities to enhance safety.

Allowing Engineers to Own Their Own Stories

When engineers feel safe discussing mistakes they have made, they are often actually enthusiastic about helping the rest of the company avoid the same error in the future. Since they are now, in many ways, the expert, get them involved in coming up with remediation items.

How to enable a healthy Post-Mortem culture

So, finally, some ways to enable a healthy Post-Mortem culture:

Encourage learning by having blameless Post-Mortems on outages and accidents

The goal is to understand how an accident could have happened, in order to better equip ourselves to prevent it from happening in the future

Seek out Second Stories, gather details from multiple perspectives on failures, and don’t punish people for making mistakes

Instead of punishing engineers, give them the requisite authority to improve safety by allowing them to give detailed accounts of their contributions to failures

Enable and encourage people who do make mistakes to be the experts on educating the rest of the organization on how not to make them in the future