Avoiding Human Error in Design

Note: The views expressed in this article are those of the author and do not necessarily represent those of his or her employer, GxP Lifeline, its editor or MasterControl Inc.


Any enterprise engaged in activities with the potential to harm the public or its employees should be encouraged, if not required, to develop and implement an integrated risk management, safety management, quality management and environmental management system for the prevention of events with intolerable effects. Such a management system would establish various techniques by which to analyze the safety and quality of the designs of hardware and processes.

One such analytical technique is Failure Mode and Effects Analysis, which is particularly useful in analyzing the safety and quality of the design of a hardware item, for example a blowout preventer. A short and simplistic description of the analytical method is as follows. Each characteristic of the component is identified. For each characteristic, each mode of potential, credible failure is identified. For each credible mode of failure, the adverse effects of such failure are assessed. If any effect is intolerable, the design of the characteristic must be changed to eliminate the credible failure mode. If the design can’t be changed to eliminate the credible failure mode, something must be established to mitigate the effect of the failure – preferably something in the design, rather than in an operational procedure. (Care must be taken to identify credible failure modes that can exist due to the interaction of two or more characteristics in given states.)
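The screening logic just described can be sketched in a few lines of Python. This is only an illustration, not a standard FMEA worksheet: the class names, the open_items helper and the blowout preventer entries are all invented for the example, and a real analysis would also have to cover failure modes arising from interactions between characteristics.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FailureMode:
    description: str
    effect: str
    tolerable: bool                    # judgment made by the analysts
    eliminated_by_design: bool = False
    mitigations: List[str] = field(default_factory=list)


@dataclass
class Characteristic:
    name: str
    failure_modes: List[FailureMode] = field(default_factory=list)


def open_items(characteristics: List[Characteristic]) -> List[Tuple[str, str]]:
    """Return (characteristic, failure mode) pairs whose intolerable
    effect has been neither designed out nor mitigated."""
    items = []
    for c in characteristics:
        for fm in c.failure_modes:
            if fm.tolerable or fm.eliminated_by_design or fm.mitigations:
                continue
            items.append((c.name, fm.description))
    return items


# Hypothetical worksheet entries, for illustration only.
worksheet = [
    Characteristic("shear ram", [
        FailureMode("fails to close on command", "well is not sealed",
                    tolerable=False),
    ]),
    Characteristic("hydraulic supply", [
        FailureMode("loss of pressure", "rams cannot be actuated",
                    tolerable=False,
                    mitigations=["redundant accumulator bank"]),
    ]),
    Characteristic("position indicator", [
        FailureMode("display lags actual position", "operator delay",
                    tolerable=True),
    ]),
]

for name, mode in open_items(worksheet):
    print(f"Design change or mitigation needed: {name} / {mode}")
```

Anything the helper returns is a failure mode for which the design still has to change or a mitigation still has to be established.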

As an example, before it was taken out of service, Therac-25, a medical device which could dispense a high or low dose of radiation to a cancer patient, repeatedly failed in a non-fail-safe state, causing massive radiation overdoses in six known accidents at various medical facilities, several of them fatal. The device was designed with a steel plate which was intended to move into place between the radiation source and the patient, thereby limiting the dosage. Repeatedly, the plate did not move into place when there was an error in the computer keystrokes. The manufacturer of Therac-25 did not perform an adequate Failure Mode and Effects Analysis to identify this failure mode and its effect. (Unfortunately, at the time that this was happening, in many facilities, the operator and patient were not in visual and oral contact with one another, as is the case today.)

Another such analytical technique is Hazard-Barrier-Effects Analysis, which is particularly useful in analyzing the safety and quality of the design of a process, for example a process for positioning and installing a blowout preventer. Again, a brief and simplistic description of the analytical method is as follows. Each task in the process is identified in sequence. For each task, each of the six “M”s that may be operative in the task is identified. (The six “M”s are [1] man, [2] machine, [3] material, [4] method, [5] measurement, and [6] mother-nature or man-made environment.) For each “M”, any hazard related to the “M” is identified and its potential adverse effect assessed. For intolerable effects, the process design must be changed to eliminate the potential hazard. If the hazard cannot be eliminated, multiple barriers must be established for the prevention of human error that can activate the hazard, as well as multiple barriers for the mitigation of the intolerable adverse effects of the hazard. (Not to get too technical but, again, care must be taken to identify hazards that can arise from the interaction of “M”s.)

As an example, at an electricity generation plant, in order to replace the insulation in a generator, the stator bars have to be removed. One process to remove the stator bars was as follows:

Use a prying bar to raise the turbine end of the stator bar out of its slot.

Use a chain hoist to further raise the turbine end of the stator bar out of its slot.

Insert a wedge under the stator bar. (The weight and return flexing force of the stator bar holds the wedge in place.)

Position a “tugger”, with a wire rope, at the opposite end of the stator bar.

Attach a sling to the end of the tugger rope.

Wrap the sling around the notch in the back of the wedge.

Increase tension on the tugger rope/sling assembly, such that the wedge is pulled toward the tugger, underneath the bar, thereby still further raising the stator bar out of its slot.

When the stator bar is sufficiently out of its slot, remove the bar by hand.

Remove the wedge from the slot by hand.

When a wedge stuck in the slot and could not be removed by hand or by hammering, the tugger was used to dislodge the wedge. The sling was wrapped around the notch in the back of the stuck wedge and tension was applied to the rope/sling assembly. The wedge broke and became a missile striking one of the crew members in the chest. He was hospitalized and fortunately survived his serious injury.

In this case, for this task, it was not recognized that the Material (the wedge) could be hazardous to the Man and, erroneously, no action was taken to prevent the hazard from being activated. The wedge wasn’t inspected for cracks prior to its use, even though a crack could induce breakage and turn the wedge into a missile. There was no limit on the tension that could be applied to the rope/sling assembly, even though excessive tension could break the wedge and cause it to become a missile. There was no cordoned-off “safe standing” zone outside of the path of a potential flying wedge.

In this case, it was also not recognized that the Machine (the tugger and rope/sling assembly) could be hazardous to the Man. If the rope/sling assembly became dislodged from the notch in the wedge, or broke under tension, it could whiplash. In addition to the things noted above, there was no systematic periodic inspection of the rope/sling assembly.

There were other hazards as well. The utility management did not require that the process be subjected to Hazard-Barrier-Effects Analysis – task-by-task, M-by-M, hazard-by-hazard.
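To make the task-by-task, M-by-M, hazard-by-hazard review concrete, here is a minimal Python sketch applied to the stuck-wedge task as it was actually performed. The Hazard class, the unprotected helper and the reading of “multiple barriers” as at least two (MIN_BARRIERS) are assumptions made for the illustration. They are not part of any formal definition of Hazard-Barrier-Effects Analysis.

```python
from dataclasses import dataclass, field
from typing import List

# The six "M"s; "environment" stands for mother nature or man-made environment.
SIX_MS = ("man", "machine", "material", "method", "measurement", "environment")

# Assumption: "multiple barriers" is read here as at least two of each kind.
MIN_BARRIERS = 2


@dataclass
class Hazard:
    task: str
    m: str
    description: str
    intolerable: bool
    prevention_barriers: List[str] = field(default_factory=list)
    mitigation_barriers: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.m not in SIX_MS:
            raise ValueError(f"unknown M: {self.m!r}")


def unprotected(hazards: List[Hazard]) -> List[Hazard]:
    """Return intolerable hazards that lack multiple prevention barriers
    or multiple mitigation barriers."""
    return [
        h for h in hazards
        if h.intolerable and (
            len(h.prevention_barriers) < MIN_BARRIERS
            or len(h.mitigation_barriers) < MIN_BARRIERS
        )
    ]


TASK = "dislodge stuck wedge with tugger"

# The task as performed: no barriers were in place for either hazard.
hazards = [
    Hazard(TASK, "material", "wedge breaks and becomes a missile",
           intolerable=True),
    Hazard(TASK, "machine", "rope/sling assembly slips or breaks and whiplashes",
           intolerable=True),
]

for h in unprotected(hazards):
    print(f"[{h.m}] {h.description}: barriers missing")
```

The barriers named above (crack inspection, a tension limit, a cordoned-off safe standing zone, periodic rope/sling inspection) would be recorded in the two barrier lists, and the review would be repeated until the helper returns nothing for the task.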

A very powerful analytical technique is Probabilistic Risk Analysis or Probabilistic Safety Analysis. This analytical technique is used to determine the ultimate effects or outcomes, called “end states,” and the probability of each, given some undesired initiating occurrence. For example, given the loss of the primary power source on a drilling rig, what are the possible outcomes or end states and what is the probability of each? To answer such questions, event trees and fault trees are used. The event tree shows the hardware systems that would come into play to respond to the undesired initiating occurrence and, based on the success or failure of each responding hardware system, an end state is arrived at. The path from the initiating occurrence through each responding hardware system, with either its successful or failed response, to the end state is called a “sequence.” Each sequence leads to an end state. Then a fault tree can be used to determine the probability of success or failure of each responding hardware system. Given the probability of success or failure of each responding hardware system, the probability of each end state can be determined. If an undesired end state has an unacceptably high probability, the design must be changed to lower the probability of that end state to an acceptable level.
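The arithmetic behind this can be illustrated with a small Python sketch for the loss-of-primary-power example. Everything specific in it is hypothetical: the two responding systems (a backup generator and a battery bank), their fault-tree structure, the basic-event probabilities and the end-state names are invented, and all events are assumed to be independent, which a real analysis would have to justify.

```python
import itertools
import math
from collections import defaultdict


def and_gate(*probs: float) -> float:
    """Fault-tree AND gate: fails only if all independent inputs fail."""
    return math.prod(probs)


def or_gate(*probs: float) -> float:
    """Fault-tree OR gate: fails if any independent input fails."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def end_state_probabilities(failure_probs, classify):
    """Walk every sequence of the event tree and sum sequence
    probabilities by end state. `classify` maps a dict of
    {system name: succeeded?} to an end-state label."""
    names = list(failure_probs)
    totals = defaultdict(float)
    for outcome in itertools.product((True, False), repeat=len(names)):
        succeeded = dict(zip(names, outcome))
        p = math.prod(
            (1.0 - failure_probs[n]) if ok else failure_probs[n]
            for n, ok in succeeded.items()
        )
        totals[classify(succeeded)] += p
    return dict(totals)


# Hypothetical fault trees for the two responding systems.
failure_probs = {
    # Generator fails if it fails to start OR its fuel supply fails.
    "backup_generator": or_gate(0.02, 0.01),
    # Battery bank fails only if both redundant strings fail.
    "battery_bank": and_gate(0.1, 0.1),
}


def classify(succeeded):
    if succeeded["backup_generator"]:
        return "full power restored"
    if succeeded["battery_bank"]:
        return "emergency power only"
    return "blackout"


for state, p in end_state_probabilities(failure_probs, classify).items():
    print(f"{state}: {p:.6f}")
```

With these made-up numbers the blackout end state has a probability of about 0.0003 per loss-of-power occurrence. If that were judged unacceptably high, the design question becomes which basic event to improve, or which additional responding system to add, to bring it down.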

Of course, in addition to the management system elements that address the safety and quality of design, there must be management system elements to assure conformance to design.

Does your enterprise have personnel qualified to establish and implement such an integrated risk management, safety management, quality management and environmental management system with adequate logic, rigor and consistency? Is your enterprise experiencing repeated events with intolerable adverse effects? In law, there is a Latin term, “res ipsa loquitur,” meaning “the thing speaks for itself.” In the examples given above, the results themselves speak, indicating the inadequacies in the management systems and analyses. Do your repeated events speak for themselves?

When decision-makers fail to recognize the need for such a management system (knowledge-based error), or when they implement a faulty management system (cognition-based error), or when they recognize the need but choose not to satisfy it (value-based error), they’re making human error. We must recognize that this is human error in the design process, upstream of the process implementation and upstream of the initiating errors that occur on the shop floor or in the field (the reflexive-based error, error-inducing condition-based error, skill-based error and lapse-based error).

Ben Marguglio is a consultant for process improvement and the presenter of the acclaimed “Human Error Prevention” and “Root Cause Analysis” Seminars. Formerly, he was a multi-site corporation executive. He’s a Fellow (since 1973) of the American Society for Quality (ASQ) and is certified by ASQ as a Quality Engineer, Reliability Engineer, Manager of Quality / Organizational Excellence and Quality Auditor. He is the author of over 150 management and technical papers and presentations and three books, the most recent being Human Error Prevention. For additional information about Marguglio’s consulting and “Human Error Prevention” and “Root Cause Analysis” seminars, call 1-845-265-0123 or visit www.HighTechnologySeminars.com.