Finding Faulty Auto Chips

The next wave of automotive chips for assisted and autonomous driving is fueling the development of new approaches in a critical field called outlier detection.

KLA-Tencor, Optimal+, Mentor, a Siemens Business, and others are entering or expanding their efforts in the outlier detection market or related fields. Used in various industries for several years, outlier detection is one of many technologies used to achieve a major goal in the automotive arena: zero-defect quality in chip production. Generally, outlier detection uses hardware and statistical screening algorithms to locate so-called outliers. In simple terms, outliers are chips that pass the required tests but still demonstrate abnormal characteristics. These chips may degrade the performance of a system or could fail.

Outliers or faulty chips arise for several reasons, including latent reliability defects. These defects do not show up when a device is shipped, so the chip can end up in a system, where the defect is later activated in the field.

To help catch these and other problems in chips, the industry uses various outlier detection methods, such as part average testing (PAT). In PAT, a wafer undergoes an electrical test. Then, using a combination of hardware and PAT algorithms, the outliers or faulty dies that reside outside a certain test spec are detected. The outliers are then removed.

Fig. 1: Graphical representation of part average test limits and outliers. Source: Automotive Electronics Council

In its most basic form, though, PAT falls short for the demanding requirements in automotive. “There is an exponential growth in semis in automotive, in general, and in other types of mission-critical devices, as well,” said Michael Schuldenfrei, CTO at Optimal+. “That’s pushing up the requirements for quality and reliability. Outlier detection as a topic, such as PAT or part average testing and all of those kinds of techniques, has been around for dozens of years. But in many cases, they are not very effective or they come at a very high cost in terms of preventing escapes.”

Escapes refer to faulty parts that leave the fab or test floor. So over the years, outlier detection specialists have developed new and more advanced techniques to prevent escapes and other issues for chips. For example, outlier detection is usually conducted in test. But in a new twist, KLA-Tencor has developed a technology for use in the fab.

Still, the industry faces some major challenges. Here are a few of them:

More leading-edge chips are being used in cars, which require new and advanced outlier detection algorithms.

Outlier detection technology will need to keep up with the trends in assisted and autonomous driving.

Nvidia and other IC makers are entering the automotive market, meaning vendors with no experience in outlier detection will need to move up the learning curve.

There are other challenges in the growing segment. Besides automotive, outlier detection is also used in the medical and other fields. In total, the size of the commercial outlier software business ranges from $25 million to $50 million per year, according to Mentor, a Siemens Business. “This likely represents only one-third of the actual software in use, since many large IDMs have built their own proprietary tools,” said Bertrand Renaud, general manager of the Quantix group at Mentor. KLA-Tencor, Mentor, Optimal+, yieldWerx and others compete in the arena.

Auto chip trends
In 2018, the automotive market may experience a slowdown. Light vehicle sales are estimated to reach 95.9 million units in 2018, up 1.5% over 2017, according to IHS Markit. That compares to 2.4% growth in 2017, according to the firm.

How that applies to semiconductors isn’t entirely clear. The automotive chip business represents only about 10% of the overall semiconductor market. But that doesn’t tell the whole story, as the electronic content continues to increase per car from $312 in 2013 to $460 by 2022, a 7.1% growth rate, according to IHS Markit.

“Up from a few hundred large design rule controllers and other components a decade ago, a modern vehicle may now contain more than 3,500 semiconductors that represent a continuously rising percentage of the overall costs,” Rob Cappel, senior director of marketing at KLA-Tencor, said in a blog.

An advanced car has more than 7,000 chips. In advanced vehicles, OEMs also are incorporating 14nm and 10nm devices, with 7nm in R&D.

In automotive, though, there are at least two constants: reliability and quality. For commercial chips, there is a certain tolerance for defects. In automotive chips, there is no tolerance for defects or failures.

This is not a new phenomenon. “In one historic example, look at anti-lock brakes,” said Ben Rathsack, senior member of the technical staff at TEL. “The reliability requirements for automotive are always higher because of the safety elements.”

So, automotive device makers and foundries must adhere to various quality standards, such as AEC-Q100, which defines failure-mechanism-based stress test qualification for integrated circuits.

Reliability becomes even more important for advanced driver assistance systems (ADAS) and self-driving cars. ADAS involves various safety features in a car, such as automatic emergency braking, lane detection and rear object warning.

For example, NXP recently announced a high-resolution radar chip for use in automotive applications. The chip, dubbed the MR3003 Radar Transceiver, is a 77GHz radar device. Based on a silicon germanium (SiGe) process, the device is developed for front or corner radar applications in automated driving, where high resolution and long-range capabilities are needed.

Capable of tracking thousands of targets simultaneously, this radar technology enables real-time sensing of the surrounding environment, essential for L4/L5 autonomous driving. “These types of applications are demanding on us and the chips themselves. We take great care to design upfront the safety protocols and hooks that are inside the systems to allow the sensor and car to be able to self-diagnose in certain situations,” said Patrick Morgan, vice president and general manager of the ADAS modem product lines at NXP, the world’s largest automotive chipmaker, in a recent interview. “When we ship the chips, we take a lot of effort to guarantee every spec. It’s absolutely a zero-tolerance type of mindset. Safety doesn’t tolerate mistakes.”

Kamal Khouri, vice president and general manager for ADAS technology at NXP, added: “Everything that we do here has to meet very, very strict automotive safety and reliability standards. There is a lot of work that goes to make sure everything that we introduce is safe and reliable.”

This is critical. For example, Audi has some 7,000 chips in a premium car, according to data from Optimal+. Hypothetically, if Audi encounters a 1 part per million (ppm) failure rate per device, the carmaker would have 7 failures for every 1,000 cars, according to the firm. Then, if Audi makes 4,000 cars per day, it would encounter 1 failure every hour.
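The arithmetic behind that hypothetical can be checked with a short calculation. This is a minimal sketch using only the figures quoted above; it assumes chip failures are independent and that production is spread evenly over a 24-hour day, neither of which the source states.

```python
# Worked version of the hypothetical in the text; the three inputs come from the article.
CHIPS_PER_CAR = 7_000        # chips in a premium car
FAILURE_RATE_PPM = 1         # hypothetical per-device failure rate
CARS_PER_DAY = 4_000         # hypothetical daily production


def expected_failures_per_car(chips: int, ppm: float) -> float:
    """Expected number of failing chips in one car, assuming independent failures."""
    return chips * ppm / 1_000_000


per_car = expected_failures_per_car(CHIPS_PER_CAR, FAILURE_RATE_PPM)
per_1000_cars = per_car * 1_000
per_day = per_car * CARS_PER_DAY
per_hour = per_day / 24      # assumption: output spread evenly over 24 hours

print(f"{per_1000_cars:.0f} failures per 1,000 cars")
print(f"{per_day:.0f} failures per day, about {per_hour:.1f} per hour")
```

At 28 expected failures per day, the rate works out to slightly more than one per hour, which matches the rounded figure in the text.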

So the automotive industry is striving for zero-defect and other quality programs, but it’s difficult to achieve these goals because systems, chips and even software are becoming more complex.

In its new vehicle dependability study, J.D. Power measured the number of problems experienced per 100 vehicles by owners of 2015 model-year vehicles during the past 12 months. Overall vehicle dependability improved 9% in 2017, but there are issues with various electronic systems. For example, audio/communications/entertainment/navigation systems remain the most troublesome category for owners, receiving the highest number of complaints, according to the survey. Built-in voice recognition and Bluetooth connectivity were the biggest problems.

Those and other issues may linger with the newer models. That’s why outlier detection is critical. In outlier detection, the process starts with an electrical test after a wafer is processed in a fab. Then the wafer is sent to the test group for evaluation.

This still only addresses some of the potential problems. “It’s impossible to test every pathway. So the coverage may be incomplete. Even then, you are running so many different tests. Sometimes, the answer you get is ambiguous. But we know the current methods aren’t good enough,” said Jay Rathert, senior director of strategic collaboration at KLA-Tencor.

In addition, test may or may not find the dreaded latent reliability defects. “A latent reliability defect is one that leaves the fab, or escapes the fab, but it somehow gets activated in the field through the environment, whether it’s vibration, humidity, current, electromigration or heat. This could become a short over time,” Rathert said.

Fig. 2: Random defects. Source: KLA-Tencor

Regardless, there is a new idea: Why not catch the problems before they leave the fab?

In the fab
According to UC Berkeley, a theoretical fab with 50,000 wafer starts per month requires the following equipment:

50 scanners/steppers plus wafer tracks

10 high-current and 8 medium-current ion implanters

40 etch machines

30 CVD tools

In addition, 300mm fabs are automated plants that use an assortment of automated material handling systems and wafer transport mechanisms. A wafer is processed in a fab in a step-by-step flow using various equipment. An advanced logic process could have from 600 to 1,000 steps or more, while a mature technology has fewer.

In advanced nodes, the equipment must process smaller and more exact features. And at each node, the defects are becoming smaller and harder to find.

Each application has a different set of defect requirements. Generally, consumer OEMs have less stringent defect control requirements. In automotive, though, chipmakers must implement tighter controls in the fab processes and employ continuous defect improvement programs.

“There are certain prerequisites (in automotive),” said Wenchi Ting, associate vice president at UMC. “You must have a well-managed factory and well-maintained tools. On top of that, you need a robust quality system and quality mindset, which will enable you to receive those certifications required for making automotive products. This is quite complicated. In automotive, it starts with the design of the process and planning the factory. Then it extends to when you actually produce a chip.”

In the fab, inspection systems are used to locate defects on the wafer. Generally, chipmakers don’t necessarily inspect every wafer in the fab. It takes too long and is expensive. Instead, they may sample certain die or parts of the chip.

For consumer chips, the process is straightforward. “When we develop a technology, we qualify it,” Ting said. “Usually, the sample size is limited.”

Automotive requirements are different. “You have to run a tremendous amount of samples in order to see a failure rate. It’s a very costly process,” he said. “People are thinking very hard how to achieve that goal while containing their cost. There are a lot of challenges on every front.”

All of this amounts to time and money. Then, if the chip meets spec after the inspection and other processes, the wafer moves from the fab to test.

From there the burden shifts to the test group. Seeking to help the test group, KLA-Tencor has devised a technology to catch the problems in the fab. The technology, dubbed In-line Parts Average Testing (I-PAT), leverages the concepts of PAT. But unlike PAT and its variants, which are conducted in test, I-PAT is performed in the fab.

I-PAT doesn’t necessarily compete with traditional third-party outlier suppliers. The goal is to complement them by adding more data to the mix. Generally, you still need to perform traditional outlier detection.

The technology from KLA-Tencor involves both hardware and a data analysis package. In simple terms, inspection data is fed into a computer modeling program. It crunches the data and looks at dies on a wafer map, and then it looks for outlier defect populations across multiple inspection steps in the fab.

In one simple example, the technology will show a wafer map with five layers on a device, such as the active area, gate, contact, metal 1 and metal 2. Let’s say there are 800 defects on the metal 1 layer. The computer then randomly selects 10 of those die on the wafer. Using various I-PAT algorithms, the system determines that 9 out of 10 are latent reliability defects.

This process is repeated several times. “You play the game over and over,” said David Price, senior director of marketing at KLA-Tencor. “By playing this over and over again, you can see how the statistical nature of the defectivity drives your ability to find the die that are most likely to contain the reliability defects.”

I-PAT could be used to cull problematic dies. In addition, the data could be combined with other outlier methods to improve the pass/fail decision for a die. “You will be able to reduce the amount of overkill and underkill that you are making in your traditional PAT method,” Price said.
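The general idea of looking for outlier defect populations across several inspection steps can be illustrated with a toy example. The sketch below is not KLA-Tencor's algorithm, which the article does not describe in detail. It simply adds up hypothetical per-die defect counts over the five layers named above and flags dies whose total reaches an assumed threshold. The die coordinates, counts, function names and threshold are all invented for illustration.

```python
from collections import Counter


def stack_defects(inspections):
    """Sum defect counts per die across all inspected layers."""
    totals = Counter()
    for layer_counts in inspections.values():
        totals.update(layer_counts)   # Counter.update adds counts per key
    return totals


def flag_outlier_dies(totals, threshold):
    """Return dies whose stacked defect count meets the (assumed) threshold."""
    return sorted(die for die, count in totals.items() if count >= threshold)


# Hypothetical inspection results: {layer: {(x, y) die position: defect count}}
inspections = {
    "active":  {(0, 0): 1, (1, 2): 2},
    "gate":    {(1, 2): 1},
    "contact": {(2, 1): 1, (1, 2): 3},
    "metal1":  {(1, 2): 2, (0, 1): 1},
    "metal2":  {(2, 2): 1},
}

totals = stack_defects(inspections)
suspect_dies = flag_outlier_dies(totals, threshold=5)
print(suspect_dies)
```

In this toy data, die (1, 2) collects defects on four of the five layers, so it stands out even though no single layer would necessarily condemn it. A production system would presumably weigh defect type, size and location rather than use a raw count.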

Fab to test
From there, the wafer moves from the fab to the test flow, where it undergoes wafer sort, final test and sometimes a system-level test process.

Inspection and test provide a dizzying amount of data. But how do you know if a part still has a latent reliability defect or other issues?

That’s why automotive OEMs want their suppliers to perform traditional outlier detection during test. “PAT binning at wafer sort is done as an offline process on a server after the entire wafer has been tested,” Mentor’s Renaud said. “PAT binning at final test is performed in-line on the tester, after each part is tested, although a server manages the recipes and controls the calibration processes.”

Generally, outlier detection technology takes electrical data from the fab and then crunches the numbers. KLA-Tencor’s new technology would provide more data to the mix. “We are already able to collect inspection data from machines like KLA’s,” Optimal+’s Schuldenfrei said. “Combining that all together and using that is obviously going to increase the accuracy even more.”

PAT, the most basic form of outlier detection, is supposed to capture a die that falls outside a pass-fail test limit. The test limits can be set in either a static (SPAT) or dynamic (DPAT) mode.

In SPAT, the test limits are based on a set number of batches. In DPAT, the limits are generally recalculated for each wafer tested. In both modes, an algorithm compares each device against the limits, and the device simply passes or fails.
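The difference between the two modes can be shown with a small example. This is a minimal sketch, not a vendor implementation. It assumes limits of the form median plus or minus six robust sigma, with robust sigma estimated as the interquartile range divided by 1.35, a form commonly associated with part average testing guidelines. The readings, function names and the factor of six are assumptions for illustration.

```python
import statistics


def robust_limits(values, k=6.0):
    """Return (lower, upper) as median +/- k * robust sigma (IQR / 1.35)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    center = statistics.median(values)
    sigma = (q3 - q1) / 1.35
    return center - k * sigma, center + k * sigma


def outside(values, limits):
    """Return the readings that fall outside the (lower, upper) limits."""
    lower, upper = limits
    return [v for v in values if not lower <= v <= upper]


# Hypothetical readings for one parametric test (arbitrary units).
history = [1.00, 1.02, 0.98, 1.05, 0.95, 1.01, 0.99, 1.03, 0.97, 1.04, 0.96, 1.00]
wafer = [0.96, 0.97, 0.96, 0.95, 0.97, 0.96, 0.98, 0.96, 0.97, 1.04]

static_limits = robust_limits(history)   # SPAT: fixed, derived from earlier batches
dynamic_limits = robust_limits(wafer)    # DPAT: recomputed for this wafer

spat_fails = outside(wafer, static_limits)
dpat_fails = outside(wafer, dynamic_limits)
print("SPAT fails:", spat_fails)
print("DPAT fails:", dpat_fails)
```

With these invented numbers, the 1.04 reading sits comfortably inside the static limits built from the wider historical spread, but it falls outside the tighter limits computed from its own wafer, so only the dynamic mode rejects it.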

In some cases, though, these algorithms may fall short. Some parts may reside away from the rest of the distribution, but well within the limits of the spec. Other parts may be extreme outliers, which are well away from the normal distribution. “That can seriously skew the entire population. Then, you end up missing outliers that are closer to the center of the population results,” Optimal+’s Schuldenfrei said.
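The skew effect described in that quote is easy to reproduce. The sketch below compares limits built from the mean and standard deviation with limits built from the median and interquartile range on the same invented data. The readings and the choice of three sigma are assumptions made for illustration, not values from any of the vendors mentioned.

```python
import statistics


def classical_limits(values, k=3.0):
    """Limits from mean and sample standard deviation (sensitive to extremes)."""
    mean = statistics.mean(values)
    sigma = statistics.stdev(values)
    return mean - k * sigma, mean + k * sigma


def robust_limits(values, k=3.0):
    """Limits from median and IQR / 1.35 (largely insensitive to extremes)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    center = statistics.median(values)
    sigma = (q3 - q1) / 1.35
    return center - k * sigma, center + k * sigma


def outside(values, limits):
    """Return the readings that fall outside the (lower, upper) limits."""
    lower, upper = limits
    return [v for v in values if not lower <= v <= upper]


# Hypothetical readings: a tight population, one mild outlier (1.15), one extreme (5.00).
readings = [1.00, 1.01, 0.99, 1.02, 0.98, 1.00, 1.01,
            0.99, 1.00, 1.02, 0.98, 1.01, 1.15, 5.00]

classical_flagged = outside(readings, classical_limits(readings))
robust_flagged = outside(readings, robust_limits(readings))
print("mean/stdev limits flag:", classical_flagged)
print("median/IQR limits flag:", robust_flagged)
```

The single extreme reading inflates the standard deviation so much that the mild outlier hides inside the classical limits, while the quartile-based limits still catch both.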

Outlier detection specialists have incorporated programs to solve these and other issues. But over the years, the devices have become more complex, thereby requiring more advanced outlier detection techniques. “Customers are demanding more and more sophisticated algorithms to identify real outliers without causing unnecessary yield loss,” Mentor’s Renaud said. “Sophisticated automatic shape detection is required to recognize non-Gaussian distributions.”

There are several types of complex outlier detection algorithms, based on geographical, multi-variate and other schemes. Many of these algorithms can be combined with one another, as well as with DPAT and SPAT.

One advanced type, geographical PAT (GPAT), judges the quality of a die based on its geographical proximity to other die on the wafer.

Fig. 5: Wafer map after outlier detection with GPAT. Source: Optimal+

A more complex version of this is called good-die/bad-neighborhood (GDBN). GDBN is based on the idea that defects tend to congregate in certain locations on a wafer. In simple terms, good die might be removed in areas with higher defects.
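A bare-bones version of the good-die/bad-neighborhood idea might look like the following. This is a sketch under simple assumptions: the wafer map is a rectangular grid of pass ("P") and fail ("F") results, the neighborhood is the eight adjacent die, and a passing die is rejected when at least three neighbors failed. The map, the function name and the threshold are hypothetical, and commercial tools likely use weighted or larger neighborhoods.

```python
def gdbn(wafer_map, min_bad_neighbors=3):
    """Return (row, col) of passing die with too many failing adjacent die."""
    rows, cols = len(wafer_map), len(wafer_map[0])
    flagged = []
    for r in range(rows):
        for c in range(cols):
            if wafer_map[r][c] != "P":
                continue  # only passing die are candidates for removal
            bad = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and wafer_map[nr][nc] == "F":
                        bad += 1
            if bad >= min_bad_neighbors:
                flagged.append((r, c))
    return flagged


# Hypothetical wafer region: three failing die clustered together.
wafer_map = [
    "PPPPP",
    "PFFPP",
    "PFPPP",
    "PPPPP",
]

bad_neighborhood_dies = gdbn(wafer_map)
print(bad_neighborhood_dies)
```

Here the passing die at row 2, column 2 touches all three failing die, so it is removed even though it passed every test. Lowering the threshold trades more yield loss for fewer potential escapes.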

Another technology is called the nearest neighbor residual (NNR). “Nearest neighbor residual is looking at each value for each test in each die. It’s not just in the context of the wafer overall, but also in the context of its neighbors,” Optimal+’s Schuldenfrei said.
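One way to picture a nearest neighbor residual is to subtract the median of the adjacent die from each die's own reading. The sketch below does exactly that on an invented grid with a smooth left-to-right gradient. The grid, the fixed residual threshold and the function name are assumptions for illustration and are not taken from Optimal+ or any other vendor.

```python
import statistics


def neighbor_residuals(grid):
    """Return {(row, col): value minus the median of the up-to-8 adjacent die}."""
    rows, cols = len(grid), len(grid[0])
    residuals = {}
    for r in range(rows):
        for c in range(cols):
            neighbors = [
                grid[nr][nc]
                for nr in range(max(r - 1, 0), min(r + 2, rows))
                for nc in range(max(c - 1, 0), min(c + 2, cols))
                if (nr, nc) != (r, c)
            ]
            residuals[(r, c)] = grid[r][c] - statistics.median(neighbors)
    return residuals


# Hypothetical 5x5 wafer region: readings rise by 0.02 per column.
grid = [[round(1.00 + 0.02 * c, 2) for c in range(5)] for _ in range(5)]
grid[2][1] = 1.07   # inside the wafer-wide range (1.00 to 1.08) but unlike its neighbors

THRESHOLD = 0.03    # hypothetical residual limit
residuals = neighbor_residuals(grid)
flagged = [die for die, res in residuals.items() if abs(res) > THRESHOLD]
print(flagged)
```

The altered die reads 1.07, which a wafer-level limit would accept because other die legitimately reach 1.08. Its residual against its immediate neighbors is about 0.05, though, so the local comparison singles it out.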

There are other methods as well, such as multi-variate techniques. “Geospatial algorithms examine failure patterns on the wafer for reticle defects and clusters of failing die. Meanwhile, multi-variate algorithms measure the correlation between multiple tests instead of considering just one test at a time,” Mentor’s Renaud said. “RMA analysis is also key to improving outlier detection by virtually re-binning field returns with different PAT algorithms to determine if the failed part could have been detected as an outlier earlier.”
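The multi-variate idea of checking the correlation between tests, rather than one test at a time, can be sketched with two measurements that normally track each other. The example below fits a straight line between two hypothetical tests and flags die whose residual from that line is unusually large, using a median-absolute-deviation scale. The test names, readings and factor of three are invented, and real tools may use other multi-variate statistics.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def correlation_outliers(xs, ys, k=3.0):
    """Return indices whose residual from the fitted line is more than
    k robust sigmas (1.4826 * MAD) away from the median residual."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    center = statistics.median(residuals)
    mad = statistics.median([abs(r - center) for r in residuals])
    limit = k * 1.4826 * mad
    return [i for i, r in enumerate(residuals) if abs(r - center) > limit]


# Hypothetical per-die readings for two tests that normally move together.
test_a = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9]
test_b = [2.01, 2.19, 2.41, 2.60, 2.79, 3.01, 3.20, 3.39, 2.20, 3.80]

suspects = correlation_outliers(test_a, test_b)
print(suspects)
```

The die at index 8 has unremarkable values on each test taken alone, since both sit inside the range seen on other die. Only the pairing of a high test_a reading with a low test_b reading gives it away.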

What’s next?
Going forward, ADAS and autonomous driving will propel the need for more detection technology. “It will become significantly more of an issue as the car becomes more autonomous,” Optimal+’s Schuldenfrei said.

Additionally, artificial intelligence and machine learning will enter the mix. “With all of the new processing power and capabilities around machine learning and AI, we believe that those will become more involved in running outlier detection,” Schuldenfrei said.

Finally, bringing all of the data together is perhaps the biggest challenge. “Imagine taking data from a chip and correlating it to board data across multiple and different companies,” he said. “You will need to share data in order to achieve better outlier detection.”