Meltdown: Why Our Systems Fail and What We Can Do About It

WE ARE IN the golden age of meltdowns, write Chris Clearfield and Andras Tilcsik. “More and more of our systems are in the danger zone, but our ability to manage them hasn’t quite caught up. The result: things fall apart.”

As systems become more complex, we are more vulnerable to unexpected system failures. In Meltdown, the authors examine a fatal D.C. Metro train accident, the Three Mile Island disaster, the collapse of Enron, the 2012 meltdown of Knight Capital, the Flint water crisis, and the 2017 Oscars mix-up, among other meltdowns, and discover that while these failures stem from very different problems, their underlying causes are surprisingly similar. The stories told here offer a compelling look behind the scenes at why failures occur in today’s many complex systems.

The authors build on sociologist Charles Perrow’s theory that as a system’s complexity and “tight coupling” (a lack of slack, or margin, between its different parts) increase, so does the chance of a meltdown. In other words, these failures are driven by “the connections between the different parts, rather than the parts themselves.”

Some systems are linear, and in these systems the source of a breakdown is obvious. But as systems become complex, as at a nuclear power plant, the parts of the system interact in hidden and unexpected ways. Because these systems are more like a web, when they break down, it is difficult to figure out exactly what is wrong. Worse still, it is almost impossible to predict where they will go wrong and all of the possible consequences of even a small failure somewhere in the system.

As more and more of our systems become complex and tightly coupled, what do we do? How do we keep up?

Oddly enough, safety features are not the answer. They become part of the system and thereby add to its complexity. And when something goes wrong, we tend to add even more safety features to the system. Alarms and warnings are a case in point: “It’s like the old fable: cry wolf every eight minutes, and soon people will tune you out. Worse, when something does happen, constant alerts make it hard to sort out the important from the trivial.”

There are ways to make complex systems more transparent. One example is the premortem. Imagine that at some point in the future your project has failed. Write down all of the reasons why you think it happened. A 1989 study showed that premortems, or prospective hindsight, boost our ability to identify reasons why an outcome might occur, so we can deal with potential problems before they arise.

We also should encourage feedback and sharing of failures and near-misses. “By openly sharing stories of failures and near failures—without blame or revenge—we can create a culture in which people view errors as an opportunity to learn rather than as the impetus for a witch hunt.”

Encourage dissent with a more open leadership style. People in power tend to dismiss others’ opinions. Leaders should speak last. You have to work on the culture. Ironically, the authors note, introducing anonymous feedback actually highlights the dangers of speaking up.

Bring in outsiders and add diversity of thought. Outsiders will see things we don’t and are more willing to ask uncomfortable questions. In a more diverse environment, we also tend to be more vigilant and to question more. When we are around people just like us, we tend to trust their judgment, which can lead to too much conformity. “Diversity is like a speed bump. It’s a nuisance, but it snaps us out of our comfort zone and makes it hard to barrel ahead without thinking. It saves us from ourselves.”

Transparent design matters. We need to see what is going on under the hood. Being able to see the state of a system by simply looking at it can be an important safeguard.

These are just a sampling of the ways we can learn to manage complex systems. This doesn’t mean we should take fewer risks. On the contrary, these solutions—structured decision tools, diverse teams, and norms that encourage healthy skepticism and dissent—“tend to fuel, rather than squelch, innovation and productivity. Adopting these solutions is a win-win.”

We can make our systems more forgiving of our mistakes by thinking critically and clearly about our own systems. How many things have to go right at the same time for this to work? Can we simplify it? How can we add margin?
