A Perfect Storm of Device Stress

We have a mature product that has been shipping for 15 years. There have been various revisions, but it has remained largely unchanged in terms of its functionality. Recently, a unit was purchased by a new customer. She discovered a sometimes-reproducible failure mode that had never been reported before.

This device has the ability to record and play back digitized human speech. It plays back messages in the same sequence in which they were recorded. We’ve sold thousands of these, hundreds from this particular production run, and we’ve never seen this problem before.

Being local, our customer stopped by our office with her suspect device. I could not reproduce her problem, so I assumed she was doing something wrong. It couldn’t have been the device’s fault. But I learned my lesson a long time ago -- you give the customer some respect and leeway when she is experiencing a problem. So I exchanged her device for a new one from the shelf. Much to my frustration, a few days later, she let us know she was experiencing the same issue with the new device.

I decided to go to her facility to troubleshoot the problem, still convinced she was doing something wrong. The failure was this: When playing back a message, sometimes the device would stop in the middle of the message, and reset. That is, the sequence was reset to message one. Watching her record and play back messages, I noticed several things: She spoke very loudly and very close to the mic; she has a high, piercing voice; the volume setting was at or near full volume; and the room in which we were had poor acoustics (echo chamber). I feared the problem might lie within the speech engine itself. If so, there might be little I could do about it.

After 30 years of troubleshooting one’s own designs, an engineer acquires an almost sixth sense about such problems. I don’t know how, but I came to believe that the messages were resetting because the embedded controller itself was physically being reset -- as in a cold start! The only way to reset the embedded controller would be with a power spike and a corresponding voltage drop on Vcc. But why would this happen?

I wasn’t pushing the limits on any of the components nor on any one spec. It was a combination of multiple components, all operating within spec, but each at the edge of min/max limits. The speakers from this production run drew more power than previous ones. The audio amp was pushing hard, and last but not least, the LDO regulator was operating at its minimum allowed value.
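A tolerance stack-up like this can be checked with a quick worst-case calculation. The sketch below is only an illustration: every component value, the `Spec` class, and the simplified current model (speaker current by Ohm's law plus a fixed controller draw) are hypothetical and are not taken from the actual design. It shows how a supply can have comfortable margin at nominal values yet fall short when each part sits at the unfavorable edge of its own spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """A datasheet-style parameter with nominal, minimum, and maximum values."""
    nominal: float
    minimum: float
    maximum: float


# Every number below is hypothetical, chosen only to illustrate the stack-up.
LDO_CURRENT_LIMIT_A = Spec(nominal=0.150, minimum=0.100, maximum=0.200)
SPEAKER_IMPEDANCE_OHM = Spec(nominal=8.0, minimum=6.4, maximum=9.6)
AMP_PEAK_OUTPUT_V = Spec(nominal=0.60, minimum=0.50, maximum=0.75)
CONTROLLER_CURRENT_A = Spec(nominal=0.020, minimum=0.015, maximum=0.025)


def peak_load_current(amp_peak_v: float, speaker_ohm: float, controller_a: float) -> float:
    """Simplified peak demand: speaker current (Ohm's law) plus controller current."""
    return amp_peak_v / speaker_ohm + controller_a


# Nominal case: every part at its typical value.
nominal_margin_a = LDO_CURRENT_LIMIT_A.nominal - peak_load_current(
    AMP_PEAK_OUTPUT_V.nominal,
    SPEAKER_IMPEDANCE_OHM.nominal,
    CONTROLLER_CURRENT_A.nominal,
)

# Worst case: weakest regulator, loudest amp, lowest-impedance speaker,
# hungriest controller. Each value is still inside its own spec.
worst_case_margin_a = LDO_CURRENT_LIMIT_A.minimum - peak_load_current(
    AMP_PEAK_OUTPUT_V.maximum,
    SPEAKER_IMPEDANCE_OHM.minimum,
    CONTROLLER_CURRENT_A.maximum,
)

print(f"nominal margin:    {nominal_margin_a * 1000:+.1f} mA")
print(f"worst-case margin: {worst_case_margin_a * 1000:+.1f} mA")
```

With these made-up numbers the nominal margin is positive and the worst-case margin is negative, which is the pattern described above: no single part is out of spec, but the combination is.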

Cranking the volume, and shouting into the mic added to the grief. But why did her voice cause a problem but not mine? Because her voice was at a higher frequency than mine. Higher frequencies contain more energy -- they require more energy to reproduce, resulting in greater current spikes. It was the perfect storm, with components both electronic and human. Additional bypass caps were marginally effective. The fix was to desolder the LDO regulator and replace it with a beefier one.

Troubleshooting a flawed design should never end with simply fixing the problem and moving on. The engineer (or engineering team) should critically evaluate why the flaw slipped through the initial design-review-debug process. Steps should be added to the design process to more thoroughly evaluate the next design. But, how on earth can one catch problems with so many tentacles? Sometimes you can’t. You just hope that your follow-up support is robust and responsive.

This entry was submitted by Jonathan Eckrich and edited by Rob Spiegel.

Jonathan Eckrich has been president of Adaptivation since 1998. Much of his job experience is with designing industrial (VME-based) computer systems. He holds a Master of Computer Engineering (1985) and a Bachelor of Computer Science (1982) from Iowa State University.

It seems that in this particular instance it was not just pitch or volume, but actually voice power. Strange but believable. But that is why we put in a greater margin. I have discovered that running at the limit always causes problems when you step just a bit over that limit. It happens almost every time, and so I don't design near the limit any more. Most of the problems that I have seen are with other engineers' designs, and I was able to learn from their problems. Cheaper and easier, that way.

Don't feel bad 270mag, many analog guys aren't aware of these things either. I suspect the reason your voice didn't have the same effect is that the fundamental frequency in a typical male voice is almost an octave lower than for females. This frequency, determined by vocal cords, sets the repetition rate of the voice waveform. The rest of the energy in the voice waveform is mostly due to resonances of the oral and nasal cavities, which are similar in males and females. Therefore, the female voice carries more total energy because the "packet" of resonances is repeated more often (higher "duty cycle", if you will). Maybe TMI, but I just thought I'd explain my reasoning ...
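The "duty cycle" argument above can be tried out with a toy model. This is only a rough sketch of that reasoning, not a real voice model: each pitch period is represented by a single decaying resonance burst that restarts at the fundamental frequency, and the resonance frequency, decay rate, and the 110 Hz and 210 Hz fundamentals are all assumed values picked for illustration.

```python
import math


def average_power(f0_hz: float,
                  resonance_hz: float = 1000.0,
                  decay_per_s: float = 400.0,
                  sample_rate_hz: int = 48000,
                  seconds: float = 1.0) -> float:
    """Mean-square value of a decaying resonance burst restarted every pitch period.

    All default parameters are illustrative assumptions.
    """
    period_s = 1.0 / f0_hz
    n_samples = int(sample_rate_hz * seconds)
    total = 0.0
    for i in range(n_samples):
        t = i / sample_rate_hz
        t_in_period = t % period_s  # time since the burst last restarted
        sample = (math.exp(-decay_per_s * t_in_period)
                  * math.sin(2.0 * math.pi * resonance_hz * t_in_period))
        total += sample * sample
    return total / n_samples


# Assumed fundamentals, roughly an octave apart as the comment suggests.
low_voice_power = average_power(f0_hz=110.0)
high_voice_power = average_power(f0_hz=210.0)

print(f"power ratio (high / low): {high_voice_power / low_voice_power:.2f}")
```

In this toy model the identical burst simply occurs more often per second at the higher fundamental, so the average power rises with it. Whether that effect is what actually tripped the regulator is a separate question.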

I was hoping someone who knows analog more than this digital guy would comment. To me, analog stuff is black magic.

I shouted into the mic the same as our high-pitched customer, but I couldn't produce the same results. There were some very dynamic things happening involving frequency, amplitude, dynamic impedance, etc. ... ouch, my brain hurts.

In some cases, thorough troubleshooting of a bad design would ultimately mean re-designing it, so troubleshooting, as in this case, can end up as a last phase of operational test. But as the author says, how do you reproduce all possible user scenarios?

I'd submit that it probably wasn't the frequency so much as the amplitude ("loud") that bit you. I'm guessing the power amplifier driving the speaker is a class AB design, where DC power draw is related to output amplitude. This will become particularly high if the output waveform clips ... as it would when very loud. Further, the impedance of most dynamic speakers (again I'm assuming that's the case) rises with frequency, so it would actually take less power to drive. In any case, it's always a design mistake not to anticipate worst-case conditions ... in this case an audio signal that becomes square-wave drive to the speaker. The power amplifier DC draw will become quite high ... roughly half the total supply voltage divided by the speaker's rated impedance (which, by the way, is defined by EIA as the first minimum in the impedance curve above the low-frequency cone resonance). - Bill Whitlock, chief engineer, Jensen Transformers, www.jensen-transformers.com
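The rule of thumb in the comment above (supply current under square-wave drive is roughly half the total supply voltage divided by the speaker's rated impedance) is easy to put numbers on. The sketch below just encodes that estimate. The function name and the 5 V supply and 8 ohm speaker in the example are hypothetical and are not details of the device in the article.

```python
def clipped_supply_current_a(total_supply_v: float, speaker_impedance_ohm: float) -> float:
    """Rough DC supply current when the amplifier output is clipped to a square wave.

    Encodes the rule of thumb: (total supply voltage / 2) / rated speaker impedance.
    """
    if speaker_impedance_ohm <= 0:
        raise ValueError("speaker impedance must be positive")
    return (total_supply_v / 2.0) / speaker_impedance_ohm


# Hypothetical example values: a single 5 V rail and an 8 ohm speaker.
example_current_a = clipped_supply_current_a(total_supply_v=5.0, speaker_impedance_ohm=8.0)
print(f"estimated supply current when clipping: {example_current_a * 1000:.1f} mA")
```

Comparing an estimate like this against the regulator's guaranteed minimum output current, rather than its typical rating, is one way to anticipate the worst case described.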

It is funny how the spec sheet can look so good and you ignore it. I have overdriven LN326N +/- volt power regulator ICs in the past, as well as others. Sometimes your numbers don't add up. They only operate at 100 mA, which seems like so much until you stuff a board and keep the regulator on the edge.

Or, you buy tantalum caps from eBay only to find they short all the time -- but only after everything else is operating on the edge, so you don't know right away where the short is.

Or the optical pathway seems perfect, but the power levels are way down unexpectedly. You first blame the electronics, only to find many of the lenses are uncoated and you lost 4% per surface, and nothing but having them replaced will do.
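The 4% per surface figure compounds quickly. As a back-of-the-envelope sketch, the code below multiplies that loss across every air-glass surface. The three-lens count is a hypothetical example, and the model ignores absorption and multiple reflections.

```python
def transmitted_fraction(num_lenses: int, loss_per_surface: float = 0.04) -> float:
    """Fraction of light left after passing through uncoated lenses.

    Each lens has two air-glass surfaces, and each surface is assumed to
    lose the same fraction of the light reaching it.
    """
    surfaces = 2 * num_lenses
    return (1.0 - loss_per_surface) ** surfaces


# Hypothetical optical path with three uncoated lenses (six surfaces).
remaining = transmitted_fraction(num_lenses=3)
print(f"light remaining: {remaining * 100:.1f}%")
```

Even a short path of three lenses loses more than a fifth of the light this way, which is the kind of shortfall that is easy to blame on the electronics first.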

In my very first job out of college, I had to debug a design of an engineer who had been laid off. I struggled with it for a week. Finally my supervisor decided to tackle it himself. He fixed it in a couple of hours. In my embarrassment, all I could do was apologize. Fortunately, he was an understanding boss, and knew that troubleshooting skills are acquired through years of experience.

This reminds me of the early television remote controls that used audio frequencies (by striking tone bars) to turn the TV on and off, and channel up/down. It turned out you could do the same thing by rattling your housekeys, although with less predictable results. I bet the engineers didn't foresee that "feature"!

It is a lot less embarrassing to find these unusual conditions before your customers do, but they collectively have unlimited time and resources to stress your product in ways you could never dream up! The best we can do is learn from these experiences and do better next time.

"After 30 years of troubleshooting one's own designs, an engineer acquires an almost sixth sense about such problems."

This is as good as saying experience makes man perfect. In our work we come across many debugging problems. We need to think from a different angle to come up with the solution. We learn many things in debugging and finding the solutions.
