The Least-Obvious Signals Sometimes Bite You

When is a digital circuit not a digital circuit? When somebody forgot that ultimately everything is an analog circuit, and things like L, R, and C combine in the most insidious ways to derail a design.

Fortunately, some digital things just happen to scream, "I'm inherently analog!" For example, an LVDS interface bus will do that, particularly as you go from circuit design to layout and have to start considering the dielectric constant of the fiberglass material out of which you make the PCB and the widths of all the copper traces you put down.

Other digital circuits can be more subtle, however -- particularly if they're perceived to be "low speed," or they're deceptively single-ended signals. Those can be easy to overlook as being truly "analog" at their root. Open-drain signals are particularly sneaky in this regard.

Unless some specification calls out a legitimate pull-up value for you (as may be the case if you're dealing with an open-drain clock or data line on a bused serial communications system, on a cable with real-world L's and C's staring you in the face), it can be mighty tempting to say, "Just slap a 10K resistor down there and be done with it, since we have lots of those on the board already and there'd be one less reel to load on the pick-and-place machine when the boards are built." Often you can get away with this. But sometimes you can't.

This was a case we encountered some years back with the Net186 design, a popular processor and networking chip evaluation card put out by AMD.

AMD's idea was simple: They wanted to penetrate the nascent Internet-connected device market using an Ethernet controller chip (the Am79C961A) they made for PC-ISA cards and an 80186 microprocessor variant (the Am186ES) they made for embedded control systems. And for just about all of the Net186 cards they ever shipped, the system performed quite admirably.

Then again, the bulk of their users were implementing designs where the Ethernet frames sent to the controller were small and the DMA system never got that heavily stressed. Or their customers were using the core of the reference design as a starting point for their own designs, and enough things were getting changed before anything went to market that little things such as the values chosen for pull-up resistors never caught up with them.

Where the Net186 card began to break down -- and first came to our attention, since we had done some of the software tools and written a few drivers for the system -- was when users began to beat hard on the Ethernet controller chip and send large frames through it (or send a lot of them back-to-back).

To understand where this breakdown occurred, you have to understand that the Am79C961A was designed to be a chip on a PC-ISA card looking back at a host ISA bus. The bus of an early x86 processor and an ISA bus are similar, but not identical, so a few of the signals the controller wanted (and had to send) were subtly different from those going to and coming from the processor. As a consequence, between the Am79C961A and the Am186ES sat a small 22V10 PAL implementing the glue logic to make one work with the other.

Nice Sherlock, Eric. It brings up an interesting point. If this AMD product worked for most of its intended use, is that enough? Or, should the product be designed to hold up to a seldom-used difficult application?

Thanks, Rob. To answer your question, yes, in 99% of all applications, I'm sure this AMD product worked fine. It was, after all, an evaluation board for a MAC-PHY chipset and an embedded x86 processor, so it never went into anything (at least of which I'm aware) that didn't have another engineer as the targeted end user. And eventually, once we'd found this little bug, AMD published an errata sheet for it, and word went out pretty quickly among their FAEs to drop that 10K resistor down to more like 1K, at which point everything worked fine in all circumstances.

For that one customer (and he happened also to be a customer of ours) who found the problem first, it was an insidious bug to chase, though. It landed in our lap mainly because we were the designer of an in-circuit emulator for the Am186ES processor. And because the emulator added just enough capacitance to the line on the part that wasn't pulled hard enough, it looked for all the world like a bug with the emulator, since it was the addition of the emulator that caused DMA transfers to stop working properly. Then the fickle finger got pointed at the driver routines for the MAC/PHY chip (once the behavior had also been seen on non-emulator-equipped boards), since it looked for all the world like a software bug. After all, the bad behavior only arose on particular combinations of reads and writes to the bus and/or particular combinations of 8-bit versus 16-bit activities. And these failures were all so consistent that the customer's immediate reaction was to rule out "marginal hardware." Either way, since we wrote some of the driver routines, made the RTOS he was running, and designed the ICE, we were on the hook for it until proven otherwise.
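To see why a bit of added capacitance can tip a weakly pulled-up line over the edge, here is a rough sketch of the RC charging time for an open-drain signal. Everything numeric in it other than the 10K and 1K resistor values is an assumption made for illustration: the 30 pF of board capacitance, the 20 pF added by an emulator pod, the 5 V rail, and the 2.0 V logic-high threshold are hypothetical, not measurements from the Net186 board.

```python
import math

def rise_time_to_threshold(r_ohms, c_farads, v_supply=5.0, v_threshold=2.0):
    """Time for a line pulled up through R to charge C from 0 V to v_threshold.

    Uses the standard RC charging curve v(t) = Vs * (1 - exp(-t / (R * C))).
    The 5 V rail and 2.0 V threshold defaults are assumed, illustrative values.
    """
    return -r_ohms * c_farads * math.log(1.0 - v_threshold / v_supply)

BOARD_C = 30e-12     # hypothetical trace + pin capacitance, in farads
EMULATOR_C = 20e-12  # hypothetical extra load from an emulator pod, in farads

for r_ohms in (10_000, 1_000):
    for label, c in (("board only", BOARD_C),
                     ("with emulator", BOARD_C + EMULATOR_C)):
        t_ns = rise_time_to_threshold(r_ohms, c) * 1e9
        print(f"{r_ohms:>6} ohm, {label:<13}: {t_ns:6.1f} ns to threshold")
```

Whatever the real capacitance was, the rise time scales linearly with R, so going from 10K to 1K buys a factor of ten in margin on this simple model.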

Fortunately for us, this wasn't the first time we'd encountered a situation where a marginal board design failed on emulator attachment. (If memory serves, I described something similar years ago in another Sherlock Ohms piece.) So we knew where to start looking for trouble, and we had an excellent FAE supporting us on the inside at the time as well. (Chip Freitag is among AMD's sharpest applications people, although these days he's mostly writing firmware for them and doing a lot less directly with customers.)

In any case, this was one of those "1%" sorts of problems, and probably most of the people who used DMA transfers on these boards never knew this bug was down there.

It's an interesting case study, however, in how people use component values that "don't matter much." If you look at a design I've done, it's readily identifiable by the number of 4.7K resistors in it for all of those "doesn't make much difference" components.

Partly that's because I cut my teeth in electronics as a precocious Cub Scout in the early 1970s who got so turned on by the Cub Scout crystal radio project that for the next half dozen years every penny of his 50-cent-a-week allowance plus whatever money he could make mowing lawns (when he got a little older) went straight to the Radio Shack store that finally opened within the bicycling radius of a nine- or ten-year-old. And not having much money, I always bought the "Surprise Pack" offering -- which was all of Radio Shack's surplus and tended to be well-stocked in 4.7K resistors.

Add to that the fact that the only real "coach" I had in electronics in the early days was Forrest Mims III, who wrote all of the $1.25 Radio Shack books on how to build op-amp or digital circuits, and *he* used a lot of 4.7K resistors.

4.7K was also a good value for the kid with bad astigmatism who found 1K resistors (with brown and black color bands next to each other) a lot harder to read than 4.7K (where it's hard to miss the combination of yellow, violet, and red).

It was only when I got to college and started to learn electronics in any structured setting that I realized that 2*pi*4.7 equals something very nearly 30, so with common values of capacitors of the 3.3x10^x F variety, you can get to some very round powers-of-ten numbers for RC time constants.
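One way to read that arithmetic, assuming the 4.7 is in kilohms and the 2*pi factor comes from the corner-frequency formula f = 1/(2*pi*R*C), is that a 4.7K resistor paired with a 3.3-series capacitor lands very close to a power-of-ten corner frequency. The specific capacitor values below are picked for illustration only.

```python
import math

def corner_frequency_hz(r_ohms, c_farads):
    """-3 dB corner frequency of a first-order RC network: 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

R_OHMS = 4_700.0
print(f"2*pi*4.7 = {2.0 * math.pi * 4.7:.2f}")  # the 'very nearly 30' figure

# Illustrative 3.3 x 10^x F capacitor values (assumed, not from any design).
for c_farads, name in ((3.3e-6, "3.3 uF"), (3.3e-9, "3.3 nF"), (3.3e-12, "3.3 pF")):
    f_hz = corner_frequency_hz(R_OHMS, c_farads)
    print(f"4.7K with {name}: corner frequency is about {f_hz:.3g} Hz")
```

Each result comes out within a few percent of a round power of ten, which is well inside the tolerance of ordinary resistors and capacitors.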

So for all I know, somewhere out there in the world, there may be somebody looking at one of my designs today, seeing 4.7K ohms someplace, and thinking, "Did the guy who designed this run the numbers, or did he just eyeball it and say, '4.7K sounds about right?' "

I'm betting the guy at AMD who designed the Net186 board *didn't* buy a whole lot of Radio Shack "Surprise Packs" or build a whole lot of circuits from the Forrest Mims III "Engineer's Notebooks" as a Cub Scout, either.

And to come back around and answer the question you asked -- I'd have designed this board to have a 1K resistor on that line from the beginning. There were other 1K parts on the board, so it's not like anybody would have had to load another reel on a pick-and-place robot to get them down there. And at 5V, the difference in current consumption for the board between 10K and 1K would have been 4.5mA, which is less than half the current that ran through the smallest LED on the board (and a whole lot less than an embedded x86 processor draws).
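The 4.5mA figure is just Ohm's law applied twice. A quick check, assuming the worst case where the open-drain line is held fully low so the entire 5V appears across the pull-up:

```python
def pullup_current_ma(v_supply, r_ohms):
    """Current through a pull-up, in mA, while the line is held at 0 V."""
    return v_supply / r_ohms * 1000.0

i_10k = pullup_current_ma(5.0, 10_000)  # weak pull-up
i_1k = pullup_current_ma(5.0, 1_000)    # stiffer pull-up
extra_ma = i_1k - i_10k

print(f"10K: {i_10k:.1f} mA, 1K: {i_1k:.1f} mA, difference: {extra_ma:.1f} mA")
```

Since the line is not low all the time, the average extra draw would be smaller than this worst-case number.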

Philosophically, on an eval board, I would tend to design for the guy in the 1% category, simply because most times it doesn't cost that much extra, it's a design that's not going to mass manufacture, and when that 1% guy does come along, getting him back on track can take 100% of your applications engineering time. The 20 hours or so we spent chasing this problem down (and it was insidious, since it wasn't failing consistently, and it changed behavior depending on whether the first instruction after DMA release caused a read or a write from/to the processor bus) would have bought a whole lot of 0603 resistors.

Evaluation boards are a special subset of product, and should ALWAYS work, for ALL potential cases, ALWAYS. At least, IMHO.

The perception is that if the chip manufacturer cannot manage to get their own product to work (a simple demo board), how could I possibly get it to work in my own product? Eval boards are seen as "golden" designs and are often copied verbatim into OEM designs, so a flaky eval board is an excellent way to lose design wins (and possibly not just that one chip, but for a whole manufacturer's portfolio).

Kudos for a good sleuth story. Sounds like the issue was handled well and AMD did the right thing passing the errata along. As a hardware engineer, I'm used to debugging SW, so it was nice to read a story from the other (SW) perspective.
