Why 'Plan B' often works out badly
Bob Sullivan, Red Tape Chronicles, MSNBC
Engineers used to talk about guarding against the “single point of failure” when designing critical systems like aircraft control systems or nuclear power plants. But rarely does one mistake or event cause a catastrophe. As we’ve seen in Japan, disaster is usually a function of multiple mistakes and a string of bad luck, often called an “event cascade” or “propagating failures.”

In Japan’s case, early reports indicate that an earthquake knocked out power to the nuclear plant’s cooling system, and then the tsunami knocked out the backup generators. The third tier of protection, backup batteries, was designed to provide only a few hours’ coverage, enough time to get the generators repaired. But this backup to the backup didn’t account for how long generator repairs would take under duress, such as when Japan’s infrastructure had been decimated by an earthquake.

... Emergency drills and stress tests aside, Neumann said, there is no good way to simulate a real emergency and its unpredictable consequences. Making matters worse is the ever-increasing interconnectedness of systems, which leads to cascading failures, and the fact that preventative maintenance is a dying art.

“People just wait to fix things when they are broken,” he said.

... "Designing fault tolerant mechanics can more than double the complexity of a system," Neumann said, "and that can make the likelihood of failure much greater." It also adds to the likelihood that a backup system will be neglected by busy engineers.
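Neumann's warning that fault tolerance can backfire shows up even in a textbook series/parallel reliability calculation. The sketch below assumes independent failures and uses invented probabilities; `series_reliability` and `redundant_pair_reliability` are hypothetical helpers, not from any library. It compares a single unit against two redundant units that both depend on an added failover switch, the kind of extra machinery that raises complexity.

```python
from math import prod


def series_reliability(failure_probs):
    """Probability that every component in a series chain works.

    Assumes independent failures: the chain fails if any single component fails.
    """
    return prod(1.0 - p for p in failure_probs)


def redundant_pair_reliability(p_unit, p_switch):
    """Two identical units in parallel, behind one failover switch.

    The pair survives if at least one unit works (1 - p_unit**2), but the
    switch sits in series with the pair, so if it fails the subsystem fails.
    """
    return (1.0 - p_unit ** 2) * (1.0 - p_switch)


p_unit = 0.01  # hypothetical failure probability of a single unit
simple = series_reliability([p_unit])
for p_switch in (0.001, 0.01, 0.05):  # hypothetical switch failure probabilities
    redundant = redundant_pair_reliability(p_unit, p_switch)
    print(f"switch p={p_switch}: simple={simple:.4f} redundant={redundant:.4f}")
```

Under these assumptions the redundant design beats the single unit only while the switch fails less often than p_unit / (1 + p_unit), i.e. slightly less often than one unit does; at equal failure rates the added machinery makes the system worse. The model also ignores common-cause failures, such as one tsunami disabling both units at once.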

... Then there's the key problem of interconnectedness, which makes circumstances ripe for an event cascade. The more systems are integrated, the more a problem in one can spread to another.

... One terrible irony of risk management is that the better you do, the more your techniques will come under attack, Kabay said. The longer we go without a dangerous nuclear event, the more safety engineers are accused of overspending.

... And then there's the fundamental problem of what Kabay calls a "disjunction" between the people who decide how much money should be spent on safety measures, and the people who suffer the consequences of those choices. Often, a detached group of distant stockholders wants to save money, but it's the neighbors who will suffer if there's a radioactivity leak.

"Many times the managers who make the decisions know they won't be around when there's consequences," he said. The only way to fix the disjunction problem is with regulations and laws designed to tie consequences back to the decision-makers, through fines and criminal liability, so that they share in the risk.
(21 March 2011)

Why Were We Unprepared for Japan?
Harold L. Sirkin, Business Week
As demonstrated by Japan's recent disaster trifecta—an earthquake and tsunami quickly followed by a nuclear crisis—corporations too often find themselves unprepared when low-probability events shock their supply chains.

They're caught without a Plan B because of two fallacies: 1) the belief that because no one can predict the future, they should operate under the assumption that things will more or less stay the same; and 2) the notion that a supply chain represents a cost rather than an investment. Moreover, in the case of this particular crisis, many companies likely figured they didn't need a contingency plan since they buy so little from Japan. They forgot that the Chinese suppliers with whom they do business depend on goods and services from Japan.
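Forgetting that Chinese suppliers depend on Japan amounts to checking only the first tier of a dependency graph. A minimal sketch, assuming the supply network is known and stored as a "buys from" mapping (the function names and every firm in the example are hypothetical), walks all tiers with a breadth-first search and reports which regions a firm is exposed to.

```python
from collections import deque


def upstream_suppliers(firm, supplies):
    """All direct and indirect suppliers of `firm`, found by breadth-first search.

    `supplies` maps each firm to the list of firms it buys from.
    """
    seen = set()
    queue = deque(supplies.get(firm, []))
    while queue:
        supplier = queue.popleft()
        if supplier in seen:  # guard against cycles and shared suppliers
            continue
        seen.add(supplier)
        queue.extend(supplies.get(supplier, []))
    return seen


def regional_exposure(firm, supplies, location):
    """Regions a firm buys from directly versus regions it depends on at any tier."""
    direct = {location[s] for s in supplies.get(firm, [])}
    all_tiers = {location[s] for s in upstream_suppliers(firm, supplies)}
    return direct, all_tiers


# Hypothetical network; every firm name and link is invented for illustration.
supplies = {
    "assembler_us": ["board_maker_cn", "casing_plant_us"],
    "board_maker_cn": ["chip_fab_jp"],
    "casing_plant_us": [],
    "chip_fab_jp": [],
}
location = {
    "assembler_us": "US",
    "board_maker_cn": "China",
    "casing_plant_us": "US",
    "chip_fab_jp": "Japan",
}

direct, all_tiers = regional_exposure("assembler_us", supplies, location)
print("direct suppliers in:", sorted(direct))
print("all tiers reach:", sorted(all_tiers))
```

A first-tier audit of this toy network finds only China and the US; following the links one tier further surfaces the dependence on Japan.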

The larger supply-chain problem stems from companies' focus on minimizing short-term costs rather than maximizing flexibility to meet future needs. This leads them to build static supply chains rather than dynamic ones. Such supply chains may save money today, but they carry hidden costs that can rise precipitously in the face of unforeseen events.

The crisis in Japan should force the global business community to see that traditional, static supply chains are a relic of the past. Companies should ready themselves to identify problems and react swiftly when a low-probability crisis occurs.

How? They can start by diversifying their supply bases.
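Diversification pays off mainly when the extra suppliers do not share the same shocks. A minimal sketch, assuming one regional shock per region and otherwise independent failures (a simplification, with invented probabilities and hypothetical function names), compares buying from k suppliers in one region against k suppliers spread over k regions.

```python
def p_all_fail_one_region(k, p_local, p_region):
    """Chance of losing all k suppliers when they share a single region.

    One regional shock (probability p_region) takes out every supplier at once;
    otherwise each fails independently with probability p_local.
    """
    return p_region + (1 - p_region) * p_local ** k


def p_all_fail_spread(k, p_local, p_region):
    """Chance of losing all k suppliers when each sits in its own region.

    Regional shocks are assumed independent of one another, so each supplier
    fails independently with the same overall probability q.
    """
    q = p_region + (1 - p_region) * p_local
    return q ** k


# Hypothetical annual probabilities, chosen only to illustrate the comparison.
k, p_local, p_region = 3, 0.05, 0.01
print(f"same region: {p_all_fail_one_region(k, p_local, p_region):.5f}")
print(f"spread out:  {p_all_fail_spread(k, p_local, p_region):.5f}")
```

With these numbers, spreading three suppliers across regions cuts the chance of losing all of them from about 0.01 to about 0.0002. Piling more suppliers into one region barely helps, because the shared shock dominates: the first function never falls below p_region however large k grows. The assumption that regions fail independently is itself optimistic, as the Chinese suppliers' dependence on Japan shows.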

... Companies can start making their supply chains more stable via localization.
(21 March 2011)

... The term “black swan” was coined and popularized by Nassim Nicholas Taleb, a New York University professor of risk engineering and author of “The Black Swan: The Impact of the Highly Improbable.”

People debate what qualifies as a black swan. Most alleged black swans turn out to have obvious precursors and warning signs — the Sept. 11 attacks included. Nothing comes out of the blue, truly.

... Disaster preparation requires a careful calibration of risk and a strong sense of what’s a reasonable level of caution. Society cannot protect itself from everything that conceivably could go wrong. Even with nuclear power, where safeguards are piled on top of safeguards, there is a point at which the operation becomes too expensive for anyone to attempt.

... “I think many of our systems do not operate as if things could go wrong,” Hunter said. “They operate as if everything will go right.”

Scientists have put together what they call a Probabilistic Seismic Hazard Assessment that seeks to map the probability of a certain amount of shaking in any given time window. But “probabilistic” isn’t the same thing as deterministic.

“The problem is that, by design, you leave yourself open to the low-probability, high-impact event,” said Susan Hough, a U.S. Geological Survey seismologist who has written about the difficulty of predicting earthquakes. In Boston, for example, she said, “hazard maps say the hazard is low, and rightly so, but the potential risk could be enormous. Seismologists know this, but I don’t think the point is widely appreciated outside of the scientific community.”
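Hazard maps of this kind usually state their results as the chance of exceeding some level of shaking within a time window, for example a 2 percent chance in 50 years. Under the Poisson occurrence assumption commonly used in such assessments, that figure converts to an average return period and back. A minimal sketch, with hypothetical function names:

```python
import math


def prob_exceedance(annual_rate, years):
    """Chance of at least one exceedance within `years`, assuming Poisson occurrences."""
    return 1.0 - math.exp(-annual_rate * years)


def return_period(prob, years):
    """Average return period (in years) implied by `prob` of exceedance over `years`."""
    return -years / math.log(1.0 - prob)


rp = return_period(0.02, 50)  # a 2 percent chance in 50 years
print(f"return period: {rp:.0f} years")
print(f"chance in any one year: {prob_exceedance(1 / rp, 1):.5f}")
print(f"chance over 500 years: {prob_exceedance(1 / rp, 500):.3f}")
```

The output describes how often shaking is expected on average, not when it will come: a 2-percent-in-50-years site works out to roughly a 2,475-year return period, yet still carries a small chance of being hit next year. That is the gap between probabilistic and deterministic that Hough describes.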

... The disaster experts have a buzzword: resilience. You can’t stop the disaster from happening — the very nature of a black swan is that it catches you off-guard — but you can increase the speed and grace with which society bounces back.

“Think of resilience in terms of the old Timex commercial,” said Jack Hayes, director of the National Earthquake Hazards Reduction Program. “It can take a licking and keep on ticking.”
(x April 2011)

Calculating calamity: Japan's nuclear accident and the "antifragile" alternative
Kurt Cobb, Energy Bulletin
Nassim Nicholas Taleb, the famed student of risk and probability and author of The Black Swan, tells us that in 2003 Japan's nuclear safety agency set a goal that fatalities resulting from radiation exposure among civilians living near any nuclear installation in Japan should be no more than one every million years. Eight years after that goal was adopted, it looks likely to be exceeded, perhaps by quite a bit, especially now that radiation is showing up in food and water near the stricken Fukushima Dai-ichi plant. (Keep in mind that "fatalities" refers not just to immediate deaths but also to excess cancer deaths due to radiation exposure, which can take years or even decades to show up.)

Taleb writes that it is irresponsible to ask people to rely on the calculation of small probabilities for man-made systems, since these probabilities are almost impossible to calculate with any accuracy. (To read his reasoning, see entry 142 in the notebook section of his website, entitled "Time to understand a few facts about small probabilities [criminal stupidity of statistical science].") Natural systems that have operated for eons may lend themselves more easily to the calculation of such probabilities. But man-made systems have a relatively short history to draw from, especially the nuclear infrastructure, which is no more than 60 years old. Calculations for man-made systems that predict an incident once every million years should be dismissed on their face as useless.
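The short-history objection can be made concrete with a standard statistical result sometimes called the rule of three: if zero events are observed over T years of exposure, a Poisson model can only rule out, at 95 percent confidence, rates above roughly 3/T. The sketch below generously treats the record as event-free, which it was not, to show that even in that best case the data cannot support a once-in-a-million-years claim. The function names are hypothetical, and the 15,000-year pooled figure is an assumed round number for illustration, not a sourced statistic.

```python
import math


def rate_upper_bound(exposure_years, confidence=0.95):
    """Largest annual event rate still consistent with observing zero events.

    Under a Poisson model, P(zero events in T years) = exp(-rate * T). Rates with
    exp(-rate * T) < 1 - confidence are rejected, leaving rate <= -ln(1 - confidence) / T.
    """
    return -math.log(1.0 - confidence) / exposure_years


def years_needed(target_rate, confidence=0.95):
    """Event-free exposure required before `target_rate` can be claimed at `confidence`."""
    return -math.log(1.0 - confidence) / target_rate


print(f"60 event-free years bound the rate at {rate_upper_bound(60):.3f} per year")
print(f"15,000 event-free years (assumed pooled figure): {rate_upper_bound(15_000):.1e} per year")
print(f"years needed for 1e-6 per year: {years_needed(1e-6):,.0f}")
```

Sixty clean years only rule out rates above about 0.05 per year, and backing a one-in-a-million-per-year claim would take roughly three million event-free years. This bound covers sampling uncertainty alone; Taleb's further concern, that the model itself may be wrong, only widens the gap.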

Furthermore, he notes, models used to calculate such risk tend to underestimate small probabilities. What's worse, the consequences are almost always wildly underestimated as well. Beyond this, if people are told that a harmful event has a small chance of happening, say, 1 in 1,000, they tend to dismiss it, even if that event might have severe consequences. This is because they don't understand that risk is the product of probability and severity.
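Treating risk as probability multiplied by severity is simple arithmetic, but it can reverse the intuitive ranking of two events. In the sketch below, all numbers are invented and in arbitrary loss units, and `expected_loss` is a hypothetical helper.

```python
def expected_loss(probability, severity):
    """Risk in the sense used above: probability of the event times the harm it causes."""
    return probability * severity


# Invented events: (annual probability, loss if it happens) in arbitrary units.
events = {
    "frequent minor outage": (0.5, 10),
    "1-in-1,000 severe accident": (0.001, 1_000_000),
}

ranked = sorted(events.items(), key=lambda item: expected_loss(*item[1]), reverse=True)
for name, (p, severity) in ranked:
    print(f"{name}: expected loss {expected_loss(p, severity):,.1f}")
```

The "dismissable" 1-in-1,000 event carries 200 times the expected loss of the frequent one. Taleb's point about underestimation compounds this: if the probability is low by a factor of a and the severity by a factor of b, the computed risk is low by a factor of a times b.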

... It is the nature of complex societies to continually underestimate risks. What we tend to do is to assign a probability to a possible harmful event and think that by assigning that probability we have understood the event and its consequences. It is a kind of statistical incantation that is no more useful than shouting at the rain. But because it comes wrapped inside a pseudo-scientific package, we are induced to believe it. If important men and women with PhDs have calculated the numbers, they must be reliable, right?

... So what should we do? Normally, we say we should try to make our systems more robust, that is, harder to destroy or cripple under extreme conditions. This seems altogether reasonable. But what if there is another choice? What if it is possible to build systems that thrive when subjected to large variations? Taleb points to such a possibility in an article entitled "Antifragility or The Property Of Disorder-Loving Systems." The text is difficult unless you've read his other work extensively. But look at the chart, and you will begin to get an idea of what he means by antifragility.
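One common way to formalize the fragile, robust and antifragile distinction, associated with Taleb's writing on convexity, is by the shape of a system's response to a stressor: a concave response loses more from a swing than it gains, a convex one gains more than it loses, and a linear one is indifferent. The toy response curves below are assumptions chosen for illustration, not a reproduction of Taleb's chart, and `gain_from_variation` is a hypothetical helper.

```python
def gain_from_variation(response, x, swing):
    """Average outcome under a symmetric swing of +/- `swing` around x,
    minus the outcome with no swing. Positive: variation helps; negative: it hurts."""
    return (response(x - swing) + response(x + swing)) / 2 - response(x)


# Toy response curves (assumptions for illustration only).
def fragile(x):
    return -x ** 2  # concave: for positive x, each extra unit of stress costs more


def robust(x):
    return x  # linear: swings average out


def antifragile(x):
    return x ** 2  # convex: for positive x, each extra unit of stress pays more


for name, f in (("fragile", fragile), ("robust", robust), ("antifragile", antifragile)):
    print(name, [gain_from_variation(f, 1.0, s) for s in (0.5, 1.0, 2.0)])
```

For these quadratics the gap is exactly minus or plus swing squared, so doubling the size of the swings quadruples both the harm to the fragile system and the benefit to the antifragile one, while the robust system is unaffected.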

The relocalization movement should take note that as serious a thinker as Taleb has characterized a decentralized, artisan-based culture as one that is antifragile. It might be useful to figure out how to explain this advantage to interested audiences who are watching the complex systems of modern society crumble around them.
(20 March 2011)