The U.S. military's growing dependence on the proper functioning of computers has caused it to specify that each computer chip it purchases must first pass reliability testing. After all, it doesn't want to risk having a dud guiding the warhead aimed at an enemy or falsely signaling the approach of nonexistent hostile missiles. However, detecting "sick" chips is difficult. In fact, according to researchers, some of the very tests meant to diagnose congenital abnormalities in a chip's integrated circuitry may themselves induce damage that will cause an apparently healthy chip to fail prematurely.

Clemson (S.C.) University electrical engineer Michael Bridgwood is exploring this problem, particularly as it is likely to affect the "next generation" of chips -- those whose components will be on the order of a micron or less in depth, width or height. (There are about 25,000 microns to the inch.) These Very Large Scale Integration (VLSI) devices, now under development, might depend on thin-film insulators only 100 angstroms (1/100 micron) thick to keep densely packed chips from developing short circuits.

Bridgwood has demonstrated that such thin insulators can be ruptured by a short current impulse having a voltage only a fraction of the 3,500 to 4,000 volts needed to create a detectable static-electricity shock in humans. Such a pulse could be delivered unintentionally and unknowingly to a chip during its manufacturing through contact with either charged humans or equipment.
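A rough back-of-the-envelope calculation suggests why such thin films are so fragile. The sketch below assumes a dielectric breakdown field for silicon dioxide of roughly 10 megavolts per centimeter, a commonly cited figure that does not come from the article itself:

```python
# Rough estimate of the voltage needed to rupture a 100-angstrom oxide.
# The breakdown field below (~10 MV/cm for silicon dioxide) is a typical
# literature value, assumed here for illustration.

OXIDE_THICKNESS_CM = 100e-8      # 100 angstroms expressed in centimeters
BREAKDOWN_FIELD_V_PER_CM = 1e7   # ~10 MV/cm, assumed breakdown field

# Voltage at which the field across the oxide reaches breakdown strength:
breakdown_voltage = BREAKDOWN_FIELD_V_PER_CM * OXIDE_THICKNESS_CM
print(f"Estimated breakdown voltage: ~{breakdown_voltage:.0f} V")
```

On these assumptions a pulse of only about 10 volts can rupture the film, hundreds of times below the voltage a person can even feel as a static shock.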

Although most commercially available chips are also potentially susceptible to this type of damage, future chips will be far more so, owing to their smaller size.

There exist a few chips, however -- mostly ones commissioned for military or space applications -- whose component devices are already so small that they are vulnerable to degradation not only by routine operation but also by the simple tests meant to establish that they work. Bridgwood's studies are focusing on devices such as these.

"Under an electron microscope, [such] a chip hit with about 80 volts looks as if it's had a bomb dropped on it," Bridgwood says. However, he points out, roughly 95 percent of the cratering holes he induces do not cause a permanent electrical short circuit. So chips with this type of damage would pass all electrical tests designed to measure reliability, he says.

That means a device shipped to unsuspecting users as a tested, fully functional and healthy chip could in fact be damaged. And the severity of the damage would likely lead to early chip failure, he says. How the cratering damage causes such latent, premature failure is still an issue of some conjecture, Bridgwood notes. One hypothesis he's studying is that metallic shrapnel from the cratering event might -- under the influence of electric currents and the heating they cause -- migrate through the newly created flaw to cause a short circuit.

Bridgwood's work has demonstrated that electrical testing of circuits, a standard reliability test, can itself cause damage that reduces chip life -- if the chip uses thin oxide films as insulators between conducting components. Ironically, this test, which intentionally passes a short electrical pulse through chip circuits to simulate normal operational use, is intended only to detect whether newly fabricated chips are functional.

Here the problem is one of "charge trapping," Bridgwood explains. Oxide films on the order of 100 to 200 angstroms thick are not foolproof insulators, he points out; there is always a probability that they will allow some current to "tunnel" through them. If it does, some of the associated charge can become trapped in the oxide, changing its characteristics, Bridgwood says, "and eating into the lifetime of the device."

The oxide films on many of today's Erasable Programmable Read-Only Memories (EPROMs, which are frequently used in industrial computer-controlled systems) are thin enough to make these chips susceptible to charge trapping. While this charge trapping will also occur during normal operation of susceptible devices, Bridgwood points out that "you can get significant wearout due [just] to the initial testing." And as other devices are made increasingly smaller, the thickness of their insulators will also decrease -- moving them into the EPROMs' regime. Bridgwood says that means smaller, more densely packed "next generation" chips will be increasingly vulnerable to this invisible and life-threatening damage.

The picture that's emerging, says Billy Livesay, a chip-reliability engineer at Georgia Institute of Technology in Atlanta, is that when chips get small enough, any useful test may itself prove destructive to healthy chips.

It's all a factor of scale, he says; with micron-scale VLSI devices, some of the individual components packaged into a chip are only 20 to 100 atoms thick. And on this scale the tensile and electrical integrity of materials can degrade more readily, because the natural motion of atoms exerts a proportionately greater effect.

Though present-generation devices -- often designated LSI, for Large Scale Integration -- are generally less vulnerable to electrostatic-discharge damage, they are far from immune. Manufacturers have been struggling for years to build protective circuits into every chip they can. These circuits are designed to shunt electrical impulses -- such as those associated with static electricity -- away from vulnerable components. "But I personally have seen protective circuits fail," says Art C. Trigonis, at NASA's Jet Propulsion Laboratory in Pasadena, Calif.

Moreover, he points out, "When you introduce a protective circuit, you introduce capacitances [the ability of materials to store charge] -- and they won't allow the device to operate up to the speed for which it was designed." As a result, many chip designers forgo the potential benefit of a protective circuit in exchange for the added speed or space its omission frees up.

At a recent symposium, Trigonis reported on a number of electrostatic discharge failures among a type of LSI transistor that had been installed in equipment being readied for use aboard the Infrared Astronomical Satellite (IRAS) spacecraft. "These devices were already installed in the hardware," he says, "and in the process of preliminary electrical testing and handling they were failing." His investigation eventually showed that the equipment designers had inadvertently chosen a particularly sensitive device -- one that was 704 times more sensitive to electrostatic discharge than a different but functionally equivalent transistor that ultimately replaced it.

Had this problem developed somewhere else, it might not have been diagnosed as such. Trigonis's lab is one of only a handful worldwide that's equipped not only to identify the failure site on a damaged chip -- even if it's internal -- but also to identify whether the failure was caused by electrostatic discharge or a manufacturing defect.

Eighty percent of Trigonis' analyses are conducted on integrated-circuit chips or discrete electronic components bound for NASA projects; the rest are for devices used in programs funded by the Departments of Energy and Defense. Admittedly, Trigonis says, "Most of the devices are state-of-the-art," meaning they are among the smallest, fastest or most densely packed--and therefore the most vulnerable. "But what we're finding," he told SCIENCE NEWS, "is that electrostatic discharge is getting to be a major problem as far as [device] failures are concerned."

To some extent, the problem can be minimized by ensuring that chip fabrication, packaging and testing take place under ultra-clean, static-free conditions. However, most manufacturers and chip examiners believe they are doing that already--and yet electrostatic-discharge damage still occurs.

Sometimes it results from accidents: Even the most skilled technicians occasionally pick up a big charge by unconsciously carrying plastic into an antistatic workstation. Then there's the issue of semiconductor purity. Any incidental flaw can render a relatively tolerant chip more sensitive. Finally, as manufacturers try to limit static buildup and contamination in chip handling through total automation, they are finding that even equipment can acquire charge buildups that eventually result in discharge damage to chips.

The shape and energy distribution of a pulse discharged by equipment differ from those of a pulse given off by a charged human, and its delivery is quicker, Bridgwood's work shows. As he focuses on modeling this "charged-device" situation, Bridgwood is attempting to characterize how a pulse gets distributed through a circuit element -- in this case, a capacitor: how quickly damage occurs, where the damage site is most likely to develop and what factors, such as shape or materials composition, will slow an entering pulse.

"Our hope is that by redesigning the shapes of components and devices in integrated circuits," Bridgwood says, "you can spread the energy concentration--to limit or eliminate damage."

One of the techniques NASA has used to get around the problem of potentially unreliable chips is to install multiple, identical devices in critical systems. Then if one or two fail, there will still be several operating backups. Where these devices are involved in decision-making -- such as deciding if it's hot enough to turn on a cooler -- the system takes a poll of the installed devices, hoping that this democracy will lead to accuracy when there is a diversity of opinion.
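The polling scheme described above amounts to a simple majority vote among redundant devices. Here is a minimal sketch; the threshold and sensor readings are illustrative inventions, not values from any actual NASA system:

```python
# Minimal sketch of majority-vote redundancy: several identical
# devices "vote" on a decision, so one or two failed chips cannot
# corrupt the outcome. All numbers below are hypothetical.

from collections import Counter

def majority_vote(votes):
    """Return the decision reported by the majority of devices."""
    decision, _count = Counter(votes).most_common(1)[0]
    return decision

# Three temperature sensors decide whether to switch on a cooler;
# one has failed and reports a spurious reading.
readings = (34.1, 33.8, -999.0)
votes = [temp > 30.0 for temp in readings]  # [True, True, False]
print("Turn cooler on:", majority_vote(votes))  # prints True
```

The healthy majority outvotes the failed sensor, so the system still makes the correct call.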

The Defense Department has begun in-house retesting of chips where reliability is especially critical, Livesay says, to verify its suppliers' own reliability-testing programs. "It's a shame that they have to do this rescreening," he says, because it undoubtedly further degrades the useful lifetime of the surviving parts. "But what it buys in increased reliability may well be worth it."

The farther a weak chip gets embedded into a large system, the more costly it becomes to replace. It may cost several hundred dollars to diagnose and replace a failed 50-cent part, once it has been soldered into a printed-circuit board. Livesay estimates, based on military and industry data, that costs can escalate to $10,000 if the damaged part isn't caught until after it's been installed in an airplane or missile.

The big problem, Livesay and others agree, is how to ensure that a functional chip is not harboring a life-threatening flaw. Many of the flaws are too small to view with regular microscopes. And thoroughly scanning a complex chip with an electron microscope could take several days, Livesay says. What's more, he notes, electron bombardment of small-scale chips during scanning-electron micrography can itself damage some of their more sensitive devices.

In the old days, engineers got around the problem by making all of their components--including insulators--big. But the drive for speedier computing has led to a reduction in the size of chips and their component devices, pushing them into increasingly vulnerable regimes.

Explains Bridgwood, "Ten years ago, we never described a device as wearing out. Once you got it going, you assumed it would keep going -- virtually forever, as far as anyone was interested." But as devices get smaller, their wearout time gets shorter -- and therefore more obvious. In fact, a consensus seems to be growing that this mushrooming reliability problem will turn out to be -- far more than any physical manufacturing constraints -- the dominant factor limiting the smallness of chip components.

COPYRIGHT 1985 Science Service, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.