Petroski on Engineering: The Normalization of Deviance

Last spring and summer, while oil gushed into the Gulf of Mexico, much of the news coverage following the fatal explosion on the drilling rig Deepwater Horizon focused on the blowout preventer located a mile below the surface. As its name denotes, the device's function was to prevent exactly the kind of blowout that did occur. It did not work properly because some pipe from the runaway well was forced upwards into the preventer and jammed the mechanism.

A pre-accident survey had found that, over a 25-year period, blowout preventers on about 15,000 other wells had to be activated in an emergency only 11 times. Unfortunately, in five of those cases, the preventer failed, as it did in the Gulf. This 45 percent historical failure rate did not jibe with the 0.07 failure rate claimed during the government-mandated testing of blowout preventers.

Even as lax oversight and testing procedures were being called into question, the oil industry was using this low failure rate to argue for less frequent testing of the complex system of valves and rams that were the last line of defense against a blowout. It was estimated that reducing testing requirements could save oil companies almost $200 million per year.

A blowout preventer is also an expensive piece of equipment to maintain, with an estimated cost of $700 per minute incurred during the time that drilling had to be stopped while the device was disconnected, hauled to the surface, repaired, lowered back down, and reattached to the wellhead. The economics of the situation clearly argued against a conservative maintenance regimen and promoted a culture of risk-taking.
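The survey figures quoted earlier (11 emergency activations, 5 failures) are a small sample, so it is fair to ask how much weight the 45 percent figure can bear. The Python sketch below is one rough way to check: it computes the observed rate and an approximate 95 percent Wilson score interval. The function name `wilson_interval` and the choice of interval are assumptions made for illustration; nothing here comes from the survey itself.

```python
import math

def wilson_interval(failures, trials, z=1.96):
    """Approximate Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence (an illustrative choice).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = failures / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    ) / denom
    return center - half_width, center + half_width

# Figures from the survey described in the text.
activations = 11
failures = 5

observed_rate = failures / activations
low, high = wilson_interval(failures, activations)

print(f"observed failure rate: {observed_rate:.0%}")
print(f"approximate 95% interval: {low:.0%} to {high:.0%}")
```

Under these assumptions the interval is wide, running from roughly one-fifth to nearly three-quarters, yet even its lower end sits far above the failure rate claimed from testing. A small sample alone does not explain the discrepancy.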

In the case of the oil company BP, whose Gulf operations were directed out of Houston, the culture that developed around deepwater drilling operations was not unlike that of another Houston-based technology. At the outset of the space shuttle program, the total-failure rate of shuttles was estimated by engineers to be 1 percent and by managers to be 0.001 percent. The Challenger accident proved the actual failure rate then to date to be 4 percent, and after the Columbia accident, it still stood at close to 2 percent. Repeated negative experiences with eroding O-rings and shedding insulation were not heeded as warnings. They were taken as signs of the robustness of the space vehicle and promoted a fault-tolerant culture that allowed for what has been called a "normalization of deviance."

Normalized deviance has also plagued the oil drilling industry, where at least some companies have allegedly let the financial bottom line dominate decision-making. Just as NASA managers were emboldened by two dozen successful shuttle flights before the accident with Challenger and, after the hiatus, another 87 successful missions before the disintegration of Columbia, so the low incidence of needing to call upon the blowout preventer in an emergency promoted a sense of bravado in the operation of offshore oil rigs.
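The shuttle percentages cited above follow from simple flight counts. As a rough check, the Python sketch below redoes the arithmetic, taking "two dozen" to mean exactly 24 successful flights before Challenger. That reading, like every variable name here, is an assumption made for illustration.

```python
# Flight counts as described in the text: two dozen successes, then Challenger;
# after the hiatus, 87 more successes, then Columbia.
successes_before_challenger = 24   # "two dozen", read literally (assumption)
flights_through_challenger = successes_before_challenger + 1
rate_after_challenger = 1 / flights_through_challenger

successes_before_columbia = 87
flights_through_columbia = (
    flights_through_challenger + successes_before_columbia + 1
)
rate_after_columbia = 2 / flights_through_columbia

# Pre-program estimates quoted in the text, converted to fractions.
engineer_estimate = 0.01      # 1 percent
manager_estimate = 0.00001    # 0.001 percent

print(f"after Challenger: 1 loss in {flights_through_challenger} flights "
      f"= {rate_after_challenger:.1%}")
print(f"after Columbia:   2 losses in {flights_through_columbia} flights "
      f"= {rate_after_columbia:.1%}")
print(f"observed rate vs. engineers' estimate: "
      f"{rate_after_columbia / engineer_estimate:.1f}x")
print(f"observed rate vs. managers' estimate:  "
      f"{rate_after_columbia / manager_estimate:.0f}x")
```

On these counts the record after Columbia is within a factor of two of the engineers' estimate and more than a thousand times worse than the managers'.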

Just reading Professor Petroski's post reminded me of watching those heart-wrenching images of oil gushing into the Gulf, and I'm glad it did. Truth is, once disasters like the BP oil spill or Japan's Fukushima are behind us (or at least out of sight in the media), the general public tends to forget and move on, which lets the corporate conglomerates get away with the human failure that Petroski describes--the finger pointing and internal jockeying over where to place blame. Seems to me that dollars could have been well spent solving the mechanical problem--that is, redesigning or reengineering the blowout preventer to operate more effectively, even though it was a complex piece of machinery. Probably would have been far less painful to the bottom line than the PR and environmental recovery effort that befell them after the disaster.

Excellent analysis, and the Challenger example spotlights the psychological aspect of the "normalization of deviance" culture, which works its way into the engineering mindset in situations where the failure rate has previously been so low that it's easy (or easier) to persuade the engineers responsible for ensuring safety that, since things have been OK for so long, this time should be no different. In any life situation, there's pressure to conform to the group, and that's exploited in situations such as those described here. That's why, when the disastrous consequences come, they seem to be outliers, but in reality they're not and are to be expected.

It is interesting to draw parallels between the Space Shuttle and oil drilling. While deepwater drilling is much more complex than most other drilling, the Shuttle is something altogether different and more complex. In the early days of rocket development, there were many failures. Then, expendables became very reliable, although there are still occasional failures. The thing that differentiates the Shuttle Program is that it involves manned flight and that it was an attempt to present space flight as a routine, repeatable activity like airline travel. It most decidedly is not. Between the high cost and high visibility of the program, failures are magnified. We accept far more danger when we drive a car.

More people died in the Deepwater Horizon accident than in the Challenger accident. In addition, there was significantly more environmental damage in the oil rig disaster than in the Shuttle accident.

Another excellent article by Professor Petroski. In a couple of other recent threads on this site there has been some discussion of groupthink, and the kind of treatment which engineers who challenge it can expect.

When I worked in quality, I often encountered the argument, "We've accepted this out-of-spec condition before and everything worked out ok, so we might as well accept it now." My response was always, "If you're playing Russian roulette and you pull the trigger and no bullet comes out, does that mean no bullet will come out the next time you pull the trigger?"
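The Russian roulette analogy can be put in numbers. The Python sketch below assumes a six-chamber revolver holding one round, with the cylinder spun before every pull so that each pull is an independent trial. Those conditions, and the function name `prob_at_least_one_failure`, are illustrative assumptions rather than anything stated in the comment.

```python
from fractions import Fraction

def prob_at_least_one_failure(p_fail, trials):
    """Chance of one or more failures in `trials` independent attempts,
    each failing with probability `p_fail`."""
    return 1 - (1 - p_fail) ** trials

# Assumed setup: one round in six chambers, cylinder re-spun before each pull.
p_fail = Fraction(1, 6)

for pulls in (1, 2, 6, 12, 24):
    p_any = prob_at_least_one_failure(p_fail, pulls)
    print(f"{pulls:2d} pulls: chance of at least one shot = {float(p_any):.1%}")

# A run of empty chambers does not change the odds on the next pull:
# with independent trials it is still p_fail every time.
print(f"chance on any single pull, whatever came before: {float(p_fail):.1%}")
```

The point of the sketch is that a string of safe outcomes leaves the per-trial risk untouched, while the chance of eventually hitting a failure keeps climbing with every additional trial.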

Excellent point, Dave. I should note that I spoke with Roger Boisjoly after the Challenger disaster. (He was the one engineer who resisted going ahead with the launch, and lost his job as a result.) I also attended the first Washington, D.C. hearing of the Rogers Commission. That's the group where the late physicist Richard Feynman famously dipped an O-ring in ice water to show how brittle it became. I could go on; it was a fascinating experience.

The Columbia 'accident' may have been preventable; I think that was argued in the book "Comm Check". The concerns of several engineers and groups, if acted on, could have led to the damage being detected.

The Challenger 'incident' was preventable. I think that was argued in the book "The Challenger Launch Decision". The Shuttle operational limits were something like 40F to 99F. So when ice was observed on the vehicle, the engineers' recommendations against launch were well founded.

Before that was Apollo 1, when engineers argued against a 100% oxygen test, on top of many poor design features.

In each case, the advice of the engineers (experts) was ignored or overruled. I had much more respect for NASA before reading these books.

Nice article. Seems to me that if the blowout preventer's actual performance included a real-world 45 percent failure rate, even while tests indicated a 0.07 percent failure rate, this would be grounds to call a foul and look into whether the blowout preventer system was adequate protection against catastrophe. Is this an example of regulators asleep at the wheel?

Thanks for a great article. I agree with Rob, you'd think that it's the scarier real-world numbers that would be paid attention to, not what is supposedly the norm based on a few tests.

But the numbers also need to be related to actual people and actual harm, not thought about abstractly. If the statistical likelihood of something occurring is greater than zero and that occurrence has fatal results, then that risk is too high. For example, I once took a prescription medication for allergies that started getting bad press for fatal heart attacks. When discussing this with my doctor he said "but the risk is only 2%." Uh, right, but what if I'm in that 2%? No thanks.

Good point, Ann. What is an acceptable risk if the result is a fatality? I think there are some areas where we accept risk readily. One is driving, mentioned in an earlier comment. Most of us accept that risk on a daily basis. Another, also mentioned in an earlier comment, is exploration. Our current space program is amazingly safe compared to earlier human exploration. Throughout history, we've always accepted high risk for exploration. I agree with you on allergies. No risk of fatality is acceptable to reduce allergies, partly because there are so many alternatives with no risk of fatalities.

I wasn't implying that we should live in a perfect world with zero risk of bad things happening. I'm no Pollyanna. And yes, I accept the risk of fatality when driving. But that's me choosing to do so (to some extent; driving is pretty well required much of the time, especially in non-urban areas). I do not choose to drive a car that's potentially fatal, or use a medical device or prescription medicine that could kill me. If I knew about those possible failures ahead of time, I might be able to make different choices, either a different car or medical device or none of the above. If I don't know, then something's wrong. Why should there be so many different electronic doodads, whether automotive or medical, or medications, for example, that require so much time and energy being regulated, all in the name of consumer choice? It looks to me like commercial interests have trumped all others in this regard.

Ann said: I do not choose to drive a car that's potentially fatal, or use a medical device or prescription medicine that could kill me. If I knew about those possible failures ahead of time, I might be able to make different choices, either a different car or medical device or none of the above.

Bad news - You have chosen to never get medical care in any hospital in the world. Any medical device has the capability to fail and most in a hospital 'can' lead to death. Endoscopes and surgical devices can become contaminated, monitoring systems can fail, cath labs can shut down in the middle of a procedure, patient lifts can drop patients (actually, all of that and much much more has happened).

And unless you're a multi-millionaire, you don't even get to choose which medical device you get, let alone access to the information you'd need to make the decision. And that would assume useful information exists - it basically doesn't.

I think that's a major point - I never saw any mention of post-design failure consideration in engineering school. And that's a main reason I got my degree; before then, I dealt with the results of post-design equipment failure and saw how pathetic the designs were when it came to failure management and prevention.

streetrodder, I completely agree with your characterization of medical risks. Hospitals terrify me and I avoid them at (nearly) all costs, given the awful statistics on infections and often-fatal mistakes. And I'm very aware that the better stuff--doctors, procedures, medical devices--is available to the rich and not to me. Thanks for pointing out this big blemish on the American ideal of equal access to quality health care. It doesn't exist.

These examples are rather spectacular and easily draw our focus. I'd like to offer a slightly different perspective. Part of my current job is to manage medical equipment recalls for a VERY large healthcare organization. Why would an engineer do this? Because someone needs to understand the technology, its failure modes and how those failures will affect patients (and I'm one as well).

I review over 2000 identified medical device failures/hazards per year. It's simply too many to effectively track each one to its final resolution. I have to manage the risk - I need to triage the issues for impact and for where I can be most effective. As much as I'd like to track every one down and ensure all of our hospitals can manage the problem, it's simply not possible. That means there's a chance that a patient will be injured or die because I didn't follow up on the 'right' issue. It's terrifying, but that's simply the way it is. No company can perform perfect risk elimination. In the real world, we have to perform risk management; we have to focus limited resources where we think they will do the most good. And sometimes, we get it wrong.

No engineering endeavor has zero risk. For that matter, all of life has risk. All risks need to be "appropriately" balanced. Most Americans want cheap energy (many want oil and its downstream products such as gasoline) and low foreign dependence; however, everyone wants zero risk as well. It cannot be at both extremes. Clearly, nobody wants a disaster, but how do you stop one step before disaster or otherwise mitigate the risk enough to prevent it?

I believe the low reported blowout device failure rates are based upon testing the device to work as designed. The 45% failure rate includes the fact that many horrific blowouts will not be safely controlled by a device that "meets specification". Some blowouts are no doubt horrific and practically impossible to control safely. This indicates that either A) the specification/testing/DFMEA needs to be reviewed, B) the higher reality of risk needs to be accepted, or C) we need to accept abandoning that high-risk activity. I would advocate a combination of A and B with some mitigation.

We daily accept the risk of driving our car. Some of us daily accept the higher risk of riding a motorcycle. These risks can be magnified or mitigated by the way we drive (e.g., driving like a daredevil or driving defensively), and mitigated by car safety improvements. All of life is a risk. If I only walk, run, or ride a bicycle to avoid driving a car, I may get hit by a car or have a heart attack, AND I will need to limit my travel, which could risk reducing my income and access to extended family . . . another type of risk trade-off.

Good points, David. There is risk in all areas of energy production. I would guess that coal mining comes with a higher risk of death by injury or illness than oil. I've always heard that the energy source with the lowest risk of harm is nuclear energy. Not sure if that has changed since the growth of wind, solar and geothermal.

I would agree with the professor about the issue of the blowout preventer. While the drilling industry culture may be a major factor, what the issue argues for is a "digital twin," that is, instrumenting the blowout preventer with robust sensors so that its complete state is known through a digital virtual equivalent. I use the very example of the blowout preventer in my new book, "Virtually Perfect."

Where I have a problem is with drawing the equivalence to NASA based on geography. Hindsight is an amazing capability. Every launch has pessimistic engineers who would advise not to launch. There is little penalty for predicting disaster and being wrong.

There has been much written about the shuttle disasters and the causes that led up to them. All of it is hindsight. There may have been some misguided and even bad decisions, but there was never a sense of recklessness. Launching people into space is an inherently risky business. We hold it to a much higher standard than we do commercial air travel. When something happens, Congress is delighted to search for scapegoats.

I don't know where the professor got his statistics about the estimates of expected failure rate made by engineers and managers before the start of the space program. That anyone could have taken seriously a probability computed to three decimal places at that stage is a triumph of statistics over common sense.

Drawing an equivalence between the oil industry, with a 45% failure rate, and NASA, with a 2% failure rate on a task orders of magnitude more difficult and with far more unknowns, is unfair. I think we should marvel at NASA's successes.

Professor, good job on highlighting the problem of the oil industry regarding blowout preventers. Bad job on comparing that to NASA.

@Michael Grieves: I'm pretty sure Professor Petroski got the story about the differing Space Shuttle reliability estimates by engineers and managers from Richard Feynman's appendix to the Rogers Commission report. Feynman concludes: "For a successful technology, reality must take precedence over public relations, for nature cannot be fooled."

My sense of the Rogers Commission hearings is purely subjective, I admit, and based on my limited experience attending the opening hearing and talking to some of the participants. Still, I believe that without the presence of Feynman, the report would have been pretty much a whitewash or, more precisely, a useless exercise. Feynman's input gave it some meat. (Incidentally, he was not well at the time and would die two years later; not really relevant, but an interesting side point.)

Many of us believe that it is long past the time to eliminate the Industrial & Government Exemption for Professional Engineering Licensing. Looking at the many failures and disasters caused by non-technical persons overriding an engineering decision, the problem has a solution. If a P.E. had to "sign off" on the MN bridge, the Challenger, the Gulf spill and many others, without override, they would not have happened.

I have a letter from Ford Motor Company, concerning a Ford Explorer, which states: "We know all about your problem. It's an uncorrectable built-in Design Flaw, which you're going to have to live with". And they pinned the problem on Firestone. If an engineer had had responsibility, not Marketing, the problem could have been eliminated.

Once again we see the "method of selected data" at work, producing wildly different statistical results. And, of course, the bottom line rules decisions! In many industries it always does, sometimes with terrible results, other times with fairly expensive safety systems. Several of my designs could have been much less expensive if safety had not been a consideration, but OUR "bottom line decision" was that it was far cheaper to provide the safety system than it was to kill people. Our customers did appreciate the safety systems, even when they understood the cost.

For the oil companies, a working blowout preventer can be "cheap insurance" against an event that does not happen if nobody makes a mistake, and no other equipment fails.

My suggestion for a cure for the type of decisions that led to the BP blowout is to raise the fine to a point well above the cost of assuring that the blowout preventer will function as needed. My understanding is that BP was quite aware that the device was not able to function correctly, but they chose to let that slide. Then, when they made a number of other cost-cutting bad choices, the well did blow. The worst is that they had the information that warned them that it was likely and they chose to go ahead anyway.

Perhaps a ten- or twenty-year ban on BP doing anything in the Gulf of Mexico would convince others to be a bit more careful. Yes, "that would certainly be a harsh lesson, but fools will not learn any other way." The quote is not original with me.

A very good article. Many of us have seen situations with similar 'cost centric decisions' - except in most cases, human lives are not on the line.

The "This 45 percent historical failure rate did not jibe with the 0.07 failure rate claimed" statement looks like a component failure rate vs. a system failure rate. Since there were 'only 5 system failures', I am sure that Management/PR folks have contrived reasons for discounting them - that way, there is no need [in their own legendary minds] to investigate the failures. The contemporary process for dealing with major failures/tragedies is [1] find someone to blame and [2] release a 'things to do/fix list' that sounds impressive. That way, things like root cause analysis do not get in the way of "progress."

Great article and follow-up dialog. From my perspective, there is a basic human behavior behind the Deepwater Horizon blowout - complacency. As humans, we tend to increasingly disregard or downplay risks as our distance from the last personal exposure to the risk increases. Is it a coincidence that the last major blowout in the Gulf (Ixtoc I in 1979) happened 32 years ago, prior to the professional careers of most decision makers involved with Deepwater Horizon? How many parents have experienced whooping cough or polio? Look at the resistance to vaccinating children against these diseases that has been growing in recent years. Same situation with the lessons associated with the Great Depression and excessive leverage.

To me, the response is to recognize this human tendency, and replace the searing pain of an actual event (like a blowout) with an equally painful consequence of failing an inspection, or being found to have a deficient safety program. This needs to be sufficiently painful (fines exceeding $500M, decision makers going to jail, debarment from operating for an extended period) that maintaining a safe operation becomes embedded in the culture of an industry and remains immune to being a factor in a cost-benefit analysis.

Then we will see industry investing in safety, be it technology, equipment, or people since the economic survival of a company and personal liberty of its leadership are at stake.

@ Stephan B. The reason that many parents are avoiding immunizations is primarily not complacency but an irrational fear, fanned by an ignorant media, that somehow these immunizations cause other problems, such as autism. At least that is what I have read is the cause of the children not being vaccinated. One option that might change their actions would be to forbid unvaccinated kids from registering at or attending public schools.

Of course there probably are also a few who are just too lazy to do it, but I am not certain that laziness equates to apathy.

You're right, William, immunizations save lives. The statistics are clear. While tons of parents are convinced their kids have autism because of immunizations, the data from a number of studies says otherwise. As for schools refusing children who have not had immunizations, it's certainly common in private schools.

Rob, I would say that it is more like dozens of parents who blame immunizations for autism. Of course it may be related to the hungry lawyers eager to sue companies that have lots of money. I do smell a cause and effect there. Besides, how could anyone ever prove it one way or the other? Urban myths are that way. MAD Magazine had an article about it a few months back. (About urban myths, that is.)

Yes, maybe it was just dozens of parents, William. Their stories are compelling. One day their kid is normal. They take the child for an immunization and boom, their child is autistic. The only change was the immunization. Then they find out there is mercury in the immunization, and since mercury is a neurotoxin . . . well, their conclusion makes sense -- except scientific studies say it's not true.

Rob, I have heard that assertion, and the counterarguments. The fix, clearly, would be to remove the mercury from the vaccine. That may be difficult, and my chemistry skills are not adequate to understand exactly what the mercury is there for. It could be a mechanism something like the peanut allergy, where two things work with each other and cause a problem. Not quite like a catalyst-type reaction, but a similar effect. My thinking is that if the correct facts were gathered and correlated, a mechanism causing the problem would be discovered, and that would be a big step towards prevention. Cures are a different story altogether.

Hi William. By 2001 the preservative Thimerosal, which is made from mercury, was taken out of most vaccines used in North America. It's been replaced by non-mercury compounds. I'm not sure what they use now.

I concur with your recommendation of punishment. Too often when punishments are monetized, the offender(s) can easily pay or avoid the fines, effectively avoiding punishment. Using a punishment directly related to the crime (i.e., taking away BP's livelihood in US territory) will cause the needed management purge that will allow them to learn from the mistake, or fold and let better companies take their leases.
