There is also the fact that a failure often has more than one root cause. Carpet bombing may be the best way to improve the system as a whole. Often you need to do three things: get the part to print, make the design better so the tolerances can be larger, and improve the tool so the part doesn't vary as much. Anyone can design a part that has no tolerances and has to be made to print +/- 0.000001. But the truly good design engineer makes a part that can be made to +/- 3 mm or bigger.
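One way to put a number on the tolerance point is the standard process capability index, Cpk = min(USL - mean, mean - LSL) / (3 * sigma). The sketch below is only an illustration: the measurements, the nominal value, and the two tolerance bands are made-up numbers, and the function name `cpk` is mine, not something from the article.

```python
from statistics import mean, stdev

def cpk(measurements, nominal, tol):
    """Process capability index for a symmetric tolerance of nominal +/- tol."""
    mu = mean(measurements)
    sigma = stdev(measurements)  # sample standard deviation
    usl, lsl = nominal + tol, nominal - tol
    # Distance from the mean to the nearest spec limit, in units of 3 sigma.
    return min(usl - mu, mu - lsl) / (3 * sigma)

# Hypothetical measurements of one dimension, in mm (assumed data).
parts = [10.02, 9.97, 10.05, 9.99, 10.01, 9.96]

tight = cpk(parts, nominal=10.0, tol=0.05)  # design that demands +/- 0.05 mm
loose = cpk(parts, nominal=10.0, tol=0.5)   # design that tolerates +/- 0.5 mm
print(f"Cpk at +/-0.05 mm: {tight:.2f}")
print(f"Cpk at +/-0.5 mm:  {loose:.2f}")
```

Same tool, same parts, same variation: the only thing that changed is how much tolerance the design allows, and that alone moves the process from not capable to comfortably capable.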

I liked the article as well. Currently I find myself trying to get to the bottom of a lot of failures. I find your article interesting because some of the things you suggest not to do are exactly what we are doing. Our focus tends to start with understanding how big the problem is. That is not because we don't want to fix everything, but because we don't have unlimited resources and we want to get the most bang for the buck. We try to get data and group the failures by root cause, and then we focus on whether the part is to print. Quite often the failures are caused by the part not being to print. Once the part is to print and the variability is taken out of the system, the root cause failure of the design can be attacked and improved. However, if the parts are not capable and can't be made to print, it doesn't matter how good the design gets, because you will still have problems.
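The grouping step described above is essentially a Pareto tally. Here is a minimal sketch of it, assuming each failure has already been assigned a single root-cause label; the cause names and counts are invented for illustration, and `pareto` is a hypothetical helper name.

```python
from collections import Counter

def pareto(failure_causes):
    """Return (cause, count, cumulative_share) tuples, most frequent cause first."""
    counts = Counter(failure_causes)
    total = sum(counts.values())
    rows = []
    running = 0
    for cause, count in counts.most_common():
        running += count
        rows.append((cause, count, running / total))
    return rows

# Hypothetical failure log: one assigned root cause per returned part.
log = ["not to print"] * 6 + ["design margin"] * 3 + ["tool wear"] * 1

rows = pareto(log)
for cause, count, cumulative in rows:
    print(f"{cause:15s} {count:3d} {cumulative:6.0%}")
```

Sorting by count with a running cumulative share makes it easy to see which one or two causes account for most of the failures, which is where limited resources would go first.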

And yes I have been in all of the above situations. My favorite is getting a part in a box and being asked "why it broke?" Only the part is fully functional....

While all of this is sound advice, I personally still keep an open mind for problems that could be fixed quickly by one of these sinful actions. Countless times I have attached 100 probes and just measured data... and voilà, ten minutes later I know the solution.

It did bite me once, when the issue was not the design or a problematic part but rather EMI. See, probes can make the EMI issue go away...

Dave: No doubt this could be a case of finger pointing at its finest. I think the points you made are critical for engineering teams to sit back, take a deep breath and dive into the problem rather than attack it without a plan.

These principles also hold true for failure analysis in electronics. I worked for a semiconductor company for years as a product and test engineer and recognize most of these scenarios as having happened at one time or another. One of the most interesting places in a semiconductor plant is the F.A. lab, which is usually where customer returns are evaluated. And of course when parts start failing on the production line, the first thing everyone tries to blame is the test set - it never occurs to them that their process might have shifted...

@Beth: Thanks for your kind comments. You're right that none of this is rocket science; it's just rational thinking. However, when parts break, people understandably get upset. Emotions can run high, and there may be a tremendous amount of pressure. As Ann points out, under these conditions, even intelligent and highly educated individuals may start to behave irrationally. The most important thing is to stay calm and focused -- especially when others aren't.

@GlennA: Experts, in particular, are susceptible to the temptation to jump to conclusions. The more experience you have, the more likely it is that a given problem resembles one you have encountered before. But that doesn't necessarily mean it's the same problem! Sometimes experience can be just as blinding as ignorance.

@TJ McDermott: You're right that sometimes time constraints can force you into a "kitchen sink" response. However, in these cases, it may be a good idea to continue investigating even after the "kitchen sink" solution has been implemented in order to determine the real root cause. Who knows? Maybe you can make yourself look like a hero for a second time by coming up with a cost savings when you realize that 2/3 of the kitchen sink solution was unnecessary.

I agree with Beth. Dave, thanks for such a clear overview. The principles you discuss here seem simple and obvious in hindsight, yet somehow can be easily forgotten even by well educated and well trained pros. They parallel some of the basic electrical system troubleshooting principles I learned from one of my engineer buddies years ago, which I apply mostly to my multi-component stereo system.

Although I haven't been involved in formal failure analysis, I have often been called to troubleshoot problems. Often the most senior person involved 'declared' what the root cause was. After I finished my troubleshooting, I had often proven that the 'expert' was wrong. Seniority doesn't automatically mean that you know all of the intricacies.
