The big data market is in a state of upheaval as companies begin shifting their data strategies from “nothing” or “everything” in the cloud to a strategic mix, squeezing out middle-market players and changing what gets shared, how that data is used, and how best to secure it.

This has broad implications for the whole semiconductor supply chain, because in many cases it paves the way for more data to move freely between different vendors, no matter where they sit in that chain. That can go a long way toward improving the quality of chips and systems, reducing the cost of design and manufacturing, and shedding light on supply chain constraints. It also opens up many more opportunities for data analysis to help offset rising concerns about liability in markets such as automotive, medical and mil/aero.

“For years, the Fortune 500 to the Global 5,000 were reticent about moving to the cloud, but all of a sudden in the last 12 to 18 months there has been a massive shift to the cloud,” said Michael Schuldenfrei, corporate technology fellow at Optimal Plus. “This is where the industry is going. Even for sensitive applications, such as test data, the whole thing is running in the cloud. There are more and more organizations placing bets on the cloud, and you’re even seeing this for the first time in semiconductor manufacturing.”

This shift is creating a fair amount of disruption along the way. Cloudera signaled big changes were afoot earlier this year when it merged with Hortonworks, reducing the Big Three in the Hadoop distributed storage and processing market down to two big players. Since then, Cloudera’s CEO stepped down in the wake of a fiscal Q1 (ended April 30) loss of $103.8 million, and MapR—the other remaining giant—sent a WARN Act notice to the state of California about layoffs. All of these moves have been generating buzz about what’s changing in the big data world, and ultimately how that will impact the cloud, the edge, and the semiconductor market that fuels all of this data processing and storage.

“Hadoop is in an extremely difficult situation because it is being replaced by cloud technologies,” said Schuldenfrei. “The edge is supplemental to the cloud. The cloud players will continue to grow. But if you go back three to five years and ask someone if they were going to deploy any solution in the cloud, they would have said, ‘No way. We don’t trust the security.'”

That was then. Companies have since begun rethinking what to keep local, what to ship to the cloud, and how data analysis can benefit their business.

“The big bang involves analytics from lots of sources,” said John Kibarian, CEO of PDF Solutions. “The reality is that it’s more economical than not doing it.”

Improving yield

This is particularly evident when it comes to yield, and that makes it a relatively easy sell to chipmakers because there is a strong value proposition for investing in analysis as it affects yield. This should be a slam-dunk at the most advanced nodes. The irony is that many of the companies working in this space are using a succession of test chips to learn what they’re doing wrong rather than relying heavily on data.

An iterative approach certainly works, but it’s more expensive and time consuming. Nevertheless, it may be the most complete approach at the most advanced nodes because that manufacturing process data is considered proprietary. Foundries share more data than in the past, but they still keep some data under wraps for a couple of reasons. First, it’s considered highly competitive. And second, that data is something of a work in progress because it needs to be refined as the process matures and as new processes are developed from that data.

“The amount of data available varies between different customers and applications,” said Ram Peltinov, Patterning Control Division head at Applied Materials. “Some are more advanced in making sense of the data, but most customers try to keep that proprietary. In R&D, there is a lot of information. At production, you usually end up with the ones that make sense.”

Data also varies greatly from one fab to the next, and from one process and sub-process to the next. To make sense of that requires deep knowledge of what to look for in that data, and that requires domain expertise.

“It varies from fab to fab because they deploy different technology,” said Subodh Kulkarni, CEO of CyberOptics. “That makes it difficult to correlate with other fabs. It’s not apples-to-apples, so it depends on which part of the data you’re looking at. At the sensor level, that requires a tremendous amount of domain expertise to turn raw data into useful data.”

This is harder than it might first appear on multiple levels. For one thing, the sensors themselves need to be calibrated. “Whenever you sense change, that has to be calibrated across a spectrum of materials,” said Kulkarni. “With scatterometry, you’re looking at reflected light and inferring information about the layer itself. We are usually reflecting distortion, which could be subtle surface structures and different silicon structures. And all of this has to be done differently, because silicon tends to oxidize almost immediately.”
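Per-material calibration of this kind often comes down to fitting a mapping from raw sensor readings to a known physical quantity on reference samples. The sketch below is a deliberately simplified illustration of that idea, assuming a linear relationship between raw reflectance signal and layer thickness; the data, units, and linear model are hypothetical, not any real scatterometry calibration procedure.

```python
def fit_calibration(known_thickness, raw_signal):
    """Least-squares line mapping raw sensor signal -> layer thickness (nm).

    In practice each material (and its oxidation state) would get its own
    fit -- the per-material calibration described above. Values are made up.
    """
    n = len(raw_signal)
    mx = sum(raw_signal) / n
    my = sum(known_thickness) / n
    sxx = sum((x - mx) ** 2 for x in raw_signal)
    sxy = sum((x - mx) * (y - my) for x, y in zip(raw_signal, known_thickness))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept  # thickness ~ slope * signal + intercept

# Reference wafers with known thickness (nm) vs. raw reflectance units
slope, intercept = fit_calibration([10.0, 20.0, 30.0], [1.0, 2.0, 3.0])
```

Once fitted, the line converts live readings into thickness estimates; a separate fit per material is what makes the calibration burden grow with the number of materials in a stack.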

Also, there is so much data, and so many different types of data, that trying to make sense of it remotely doesn’t work. Some of this is used by the manufacturing equipment rather than by people trying to analyze it.

“We’ve always had error bars for probability of confidence to any measurement that’s made on the wafer,” said Chet Lenox, director of process control solutions at KLA. “But on the analytics side, the way we analyze the raw data coming off a metrology tool is changing dramatically with all of the machine learning algorithms that are available, as well as the increased computing power that we have available on the tools. We’re a little bit different from a Facebook or Google with massive data center-based analysis. We need a number, whether it’s metrology or inspection, coming off the tool right now. Any analytics done on the sensor data has to be done basically in real-time. Otherwise, there’s just not a lot of value. We’re still seeing a revolution in the way that data is processed in order to give you a final critical dimension or overlay offsets or film thickness, or whatever that may be. That also has to be real-time. You need the data now, as opposed to running wafers to the end of line and seeing what happens months from now.”
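The core reduction Lenox describes is turning many raw readings into a single number with an error bar, at the tool, in real time. The sketch below shows the simplest version of that reduction, assuming repeated readings of one measurement site and a normal-approximation 95% interval; the sample values are hypothetical, not real metrology output.

```python
import math

def measure_with_error_bar(samples):
    """Reduce repeated raw readings of one site to (estimate, error bar).

    A minimal stand-in for the real-time reduction described above: the
    tool reports one number plus a confidence interval instead of streaming
    raw sensor data. The 1.96 factor is a ~95% normal interval.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # sample variance
    stderr = math.sqrt(var / n)
    return mean, 1.96 * stderr

cd_nm = [12.1, 11.9, 12.0, 12.0]  # hypothetical critical-dimension readings
est, err = measure_with_error_bar(cd_nm)
```

The machine-learning reductions Lenox mentions replace this simple averaging with far more elaborate models, but the contract is the same: raw data in, one actionable number and its confidence out, before the wafer moves on.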

How this filters down to chipmakers is inconsistent. This is partly due to the fact that some of the companies working at the leading edge are relative newcomers to making chips. It’s also partly because the cost of respins can be spread across an entire system by a large systems company. Numerous industry sources report an increase in respins over the past couple of years.

“Engineering solutions are coming to application knowledge, not to physics knowledge,” said Jack Harding, president and CEO of eSilicon. “That, to me, is very dangerous because you can’t predict the behaviors in the field under extraordinary circumstances. So when people say, ‘We’ve got it working,’ the first thing I want to know is whether it’s working because they did trial-and-error and it didn’t blow up, or because they’ve done the math and the simulation and they’ve convinced themselves they have a solution to a problem they haven’t identified.”

It’s at the more established nodes where this data is starting to be analyzed on a much broader scale.

“You can use it to make sure a part is working, but you also can use it to understand which dies pass and which dies fail,” said André van de Geijn, business development manager at yieldHUB. “If you have low yield, you want to take a deeper dive into a system and understand which test is failing and why.”
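The deeper dive van de Geijn describes usually starts with a Pareto of failing tests: count which test each failing die hit, then attack the biggest bucket first. The sketch below illustrates that idea on a toy lot; the record shape and test names are hypothetical, not any real ATE log format.

```python
from collections import Counter

def failure_pareto(die_results):
    """Rank failing tests by how many dies each one caught.

    die_results: list of (die_id, failed_test) tuples, with failed_test
    set to None for passing dies. Fields are illustrative only.
    """
    fails = Counter(t for _, t in die_results if t is not None)
    passing = sum(1 for _, t in die_results if t is None)
    yield_pct = 100.0 * passing / len(die_results)
    return yield_pct, fails.most_common()

# Toy lot: six dies, two failing the same (hypothetical) continuity test
lot = [("d1", None), ("d2", "continuity"), ("d3", None),
       ("d4", "continuity"), ("d5", "leakage"), ("d6", None)]
yield_pct, ranked = failure_pareto(lot)
```

On this toy data the lot yields 50% and the continuity test tops the ranking, which is the signal that tells an engineer where to take the deeper dive.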

That includes automotive chips for everything except AI for assisted and autonomous driving, which are being developed at 7nm today. But even there, data is being shared on a more limited basis. The joint working agreements being hammered out by Ford and Volkswagen, and by BMW and Mercedes, seek to combine efforts in R&D in order to speed time to market for self-driving vehicles.

Improving reliability

All of this has a big impact on reliability. While German automakers are demanding zero defects for 18 years, there is also a push to make chips in other devices more reliable. The current standard in smart phones is now four years rather than two, and in some industrial applications it has risen as high as 20 years.

“The analytics are growing massively because all of the chip and system companies are managing a much more complex supply chain,” said PDF’s Kibarian. “Traceability is a requirement. People want to know what set of tools that was assembled on and what is the exposure for other people who may have seen a similar problem…When you have an autonomous car that crashes, everyone wants to know if they have exposure.”

The key to avoiding problems is to analyze all available data, particularly in critical areas.

“This is like using Google Earth,” said Max Odendahl, CEO of Silexica. “It’s not good enough if you see a hotspot and you don’t know what’s going on. You need to tie this back to the original source code. We could run the same model twice, compare it to two different iterations, but one time something ran in 2 milliseconds and another time it ran in 25 milliseconds. Then you need to figure out what went wrong. So we need to do static analysis, dynamic analysis and semantic analysis to really understand what is the root cause and where did it come from, and how did it affect the system.”
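Comparing two runs of the same model, as in Odendahl's 2ms-versus-25ms example, typically means diffing per-function timings and flagging whatever diverged. The sketch below shows that comparison in its simplest form, assuming per-function millisecond totals from two runs; the function names and threshold are hypothetical, and this is not Silexica's actual analysis.

```python
def runtime_regressions(base_ms, new_ms, factor=2.0):
    """Flag functions whose measured time grew by more than `factor`.

    base_ms / new_ms map function name -> milliseconds for two runs of
    the same workload. A simplified stand-in for the dynamic-analysis
    comparison described above.
    """
    flagged = []
    for fn, t0 in base_ms.items():
        t1 = new_ms.get(fn, t0)
        if t0 > 0 and t1 / t0 > factor:
            flagged.append((fn, t0, t1))
    # Worst absolute regression first
    return sorted(flagged, key=lambda r: r[2] - r[1], reverse=True)

run_a = {"decode": 2.0, "filter": 1.0, "encode": 3.0}
run_b = {"decode": 25.0, "filter": 1.1, "encode": 3.2}
regressions = runtime_regressions(run_a, run_b)
```

This only isolates *where* the time went; the static and semantic analyses Odendahl mentions are what tie that hotspot back to the source-code decision that caused it.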

That also requires gathering much more data than in the past and analyzing it with some insight into what can go wrong, and what really matters if it does go wrong.

“So now you have sub-object analysis,” Odendahl said. “You need to understand how area is being accessed. That will make a huge difference on how you’re going to stream data to your DSP or GPU. You might be able to see there’s a bottleneck in the computation, but you may be blind to the reason. So there’s high-level stuff, where you’re synchronizing more than you’re sending data or you’re sending too much data to another system, versus you’re filling your tasks and sending too little data. There are various different layers, and you need to figure out where is the bottleneck. If it’s not in a critical path, you may not care, providing it’s an optimized loop. But to find that out, you need that system view.”

And the data required to create that system view now needs to stick around significantly longer than in the past.

“There needs to be continuity in that data,” said John O’Donnell, yieldHUB’s CEO. “New customers want 15 years of data storage, which is a big change from the previous one or two years of data currently being stored. You’ll see this with machine learning and AI next. But the challenge here is that some data can be very messy. A lot of chips are tested and retested and re-screened, so you need to reassemble data according to the picture of each batch being manufactured.”
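The reassembly problem O'Donnell describes comes from retests: the same die can appear several times in the log, and only the final screening result should represent it in the batch picture. The sketch below shows the simplest version of that collapse, assuming an ordered test log where later entries supersede earlier ones; the field names are illustrative, not a real test-data schema.

```python
def final_results(test_log):
    """Collapse a messy retest log to one final record per (batch, die).

    test_log is an ordered list of dicts; a later entry for the same die
    supersedes an earlier one, mirroring retest/re-screen flows. Field
    names here are hypothetical.
    """
    latest = {}
    for rec in test_log:
        latest[(rec["batch"], rec["die"])] = rec["result"]
    return latest

log = [
    {"batch": "B1", "die": 7, "result": "fail"},
    {"batch": "B1", "die": 8, "result": "pass"},
    {"batch": "B1", "die": 7, "result": "pass"},  # retest supersedes the fail
]
clean = final_results(log)
```

Keeping this cleanup reproducible matters more once the retention requirement stretches to 15 years: the raw log must survive, but analytics need the reassembled per-batch view.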

Improving security

On the positive side, more data opens up more options to use it in different ways. One of the newer approaches is to use data analytics for real-time security by monitoring data traffic on a chip.

“You can export that data, and obviously more data requires faster pipes,” said Rupert Baines, CEO of UltraSoC. “So we’ve got PCIe, CCIX, high-speed Ethernet that we can use to take data off chip. Increasingly, we’re also using an approach that does not take it off-chip. If you have lots and lots of processors on a chip, why not use those processors to do the analytics on-chip and locally? So you route the analytics to a subsystem, and rather than using an expensive I/O, you run the analytics locally. The advantage of that is you can be doing it in-operation. You’re sweating your assets. We have customers laying out their chips, and they’re putting in processors purely for this task. They’re using it for safety and security applications, so the analytics are being used to identify failures, potential hacks and malware, and they’re doing that live and dynamically within the chip and observing traffic patterns as they flow past. They can then react incredibly quickly because they’re within the same chip. You’re not sending traffic somewhere else.”

That needs to be combined with a better understanding of the data being generated within a device.

“You need local smarts and local filtering to dramatically reduce the data volumes. If you were just to do dumb sampling of signals from a 2GHz clock with a 64-bit bus, you’re up at 100 gigabits per second on one single trace,” Baines said. “You’re talking terabytes or petabytes very quickly. So it’s absolutely essential to have intelligent local filtering in order to turn screeds of data into high-value, intelligent signals. You need that local, on-chip, statistics gathering. That’s averages, peak, mean, best-case, worst-case. You need anomaly detection. You don’t have to filter that locally, but you do have to be able to do the abstraction.”
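The abstraction Baines describes amounts to keeping a running summary per signal instead of the raw samples: count, mean, best and worst case, plus an anomaly count against expected bounds. The sketch below illustrates that reduction in plain Python, purely as a software analogy for the on-chip statistics gathering; the thresholds and sample values are arbitrary examples.

```python
class SignalSummary:
    """Running summary of a signal: count, mean, min/max, anomaly count.

    A software analogy for on-chip statistics gathering: only this summary
    (never the raw samples) would be exported. Bounds are example values.
    """
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi       # expected range for anomaly check
        self.n = 0
        self.total = 0.0
        self.min = float("inf")
        self.max = float("-inf")
        self.anomalies = 0

    def add(self, x):
        self.n += 1
        self.total += x
        self.min = min(self.min, x)
        self.max = max(self.max, x)
        if not (self.lo <= x <= self.hi):
            self.anomalies += 1          # out-of-range sample

    @property
    def mean(self):
        return self.total / self.n if self.n else 0.0

summary = SignalSummary(lo=0.0, hi=10.0)
for sample in [1.0, 2.0, 3.0, 42.0]:     # 42.0 is outside the expected range
    summary.add(sample)
```

However many samples stream through, the exported state stays a handful of numbers, which is exactly the data-volume reduction that makes local filtering worthwhile.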

Conclusion

There are many pieces in the data flow. Some of that data will be processed in the cloud, some will be processed at the edge, and some will be processed directly on a chip or a specific piece of equipment. The challenge, going forward, will be to make sense of all of that, which may be company- or even business-unit specific.

The data market is still evolving, and the analytics based on that data are still being defined. But all of this will have a profound impact on the semiconductor industry. The market gyrations today are just the beginning of what will eventually define the edge, the cloud, and the quality, performance, price and ultimately the interactions of all of the components that drive those systems.