Machine Learning Meets IC Design

Machine learning (ML) is one of the hot buzzwords these days, but even though EDA deals with big-data types of issues, the industry has made little progress incorporating ML techniques into its tools.

Many EDA problems and solutions are statistical in nature, which would suggest a natural fit. So why has EDA been so slow to adopt machine learning, while other areas such as vision recognition and search have embraced it so easily?

“You can smell a machine learning problem,” said Jeff Dyck, vice president of technical operations for Solido Design Automation. “We have a ton of data, but which methods can we apply to solve the problems? That is the hard part. You cannot open a textbook or take a course and apply those methods to solve all problems. Engineering problems require a different angle.”

Before diving too deeply into the places where ML is being adopted, let’s examine some of the issues.

Setting the stage
It helps to start with a classification of techniques. “In the broadest sense, we have rule-based techniques that we are all used to within EDA,” explained Ting Ku, senior director of engineering for NVIDIA. “Within that there is machine learning, and within that is deep learning. Rule-based is deterministic. There is no database involved, and there are defined features. For machine learning, the starting point is statistical and not deterministic, and it does involve a database because you have to learn from experience. With machine learning we may still have pre-defined features, and that is the differentiation between machine learning and deep learning. Everything is the same for deep learning except there are no pre-defined features. So the natural question would be, ‘What are features?’”

Figure 1: Taxonomy of techniques. Source: Semiconductor Engineering

Once you have the features and enough stored data, you have to do something with it. “It is impractical to search the entire design space,” said Anush Mohandass, vice president of marketing and business development at NetSpeed Systems. “It is also impractical to devise a polynomial time algorithm due to the highly nonlinear nature of the space. For such problems, machine learning—where past experiences in solving similar problems are used in the form of training data to learn and predict solutions for new similar problems—has shown tremendous promise.”

There are several ways in which the learning process can happen. These are generally called supervised, unsupervised and reinforcement learning. Most EDA applications are looking at supervised learning. “There are two types of supervised learning,” explained Eric Hall, CTO for E3 Data Science. “Regression is where we want to predict a numerical value and classification is where we want to predict one of a few outcomes. There are several machine learning algorithms that can solve these problems, but there is no magic bullet or free lunch.”
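The two kinds of supervised learning Hall describes can be illustrated with a minimal Python sketch. The data, units, and function names below are made up for illustration and do not come from any EDA tool; least-squares line fitting and a nearest-centroid rule are simply two of the simplest examples of regression and classification.

```python
# Illustrative sketch only: toy data and hypothetical names.

def fit_line(xs, ys):
    """Regression: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def nearest_centroid(train, labels, point):
    """Classification: pick the label whose training-set mean is closest to point."""
    groups = {}
    for x, label in zip(train, labels):
        groups.setdefault(label, []).append(x)
    centroids = {label: sum(v) / len(v) for label, v in groups.items()}
    return min(centroids, key=lambda label: abs(centroids[label] - point))

# Regression: predict a numerical value (made-up delay in ps versus load in fF).
loads = [1.0, 2.0, 3.0, 4.0]
delays = [12.0, 14.0, 16.0, 18.0]
slope, intercept = fit_line(loads, delays)
predicted_delay = slope * 5.0 + intercept

# Classification: predict one of a few outcomes (made-up slack in ps -> pass/fail).
slacks = [-30.0, -20.0, 15.0, 25.0]
outcomes = ["fail", "fail", "pass", "pass"]
predicted_outcome = nearest_centroid(slacks, outcomes, 10.0)
```

The regression returns a number and the classifier returns one label from a fixed set, which is the distinction being drawn in the quote.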

There are also other problems. “Deep learning techniques are excellent at finding those undiscovered features to model non-linearity, but they are a black box, hard to interpret, and can take a long time to train,” Hall added.

Training
Machine learning techniques are only as good as the data they are trained with. “ML is an iterative process,” said Ku. “You have decision algorithms, and they will create suggestions. The answer may not be correct, so you have to verify that. Once that is done, the data is included back into the database. That is when retraining takes place. The cycle continues. At some point, the hope is that all of the iterative cycles will make the model pretty accurate so that when a new case is seen, the prediction will be good.”
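The cycle Ku describes (suggest, verify, store, retrain) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `verify` stands in for whatever trusted check is available, and the "model" is a deliberately trivial nearest-neighbor lookup rather than anything a production tool would use.

```python
# Illustrative sketch only: hypothetical names, toy model.

def verify(x):
    """Stand-in for an expensive, trusted check such as a full simulation."""
    return x * x

def predict(database, x):
    """Toy model: return the verified answer of the closest stored case."""
    nearest = min(database, key=lambda known: abs(known - x))
    return database[nearest]

database = {0.0: verify(0.0)}  # experience so far: input -> verified answer
errors = []
for x in [4.0, 1.0, 2.4, 3.3, 3.2]:
    guess = predict(database, x)   # the model suggests an answer
    truth = verify(x)              # the suggestion is checked
    errors.append(abs(guess - truth))
    database[x] = truth            # verified data goes back into the database
    # Retraining is implicit: the next predict() call sees the larger database.
```

In this toy run the first prediction is far off because the database holds a single case, while the last one lands close because a similar case has already been verified and stored.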

In many cases, data may be available from previous designs, but is that enough? “Imagine 2000 SPICE simulators working in parallel to solve a problem for a chip we have never seen before on a manufacturing process we have never seen before,” said Solido’s Dyck. “We can gather some information about how things behaved in the past and use that to shape the models, but there is also real-time data. This is real-time machine learning and building models in real time.”

And real-time learning creates a host of other problems. “If something goes wrong in the streaming data, or you get incorrect answers that pollute the models, you need to filter or adapt it—and that is really hard,” he added. “We need automated recovery and repair. When something goes wrong you have to be able to debug the streaming data.”

But debugging an ML system is relatively uncharted territory. Few, if any, verification techniques are known.

There are other types of learning associated with an EDA flow, as well. “We need to be able to capture knowledge through the design implementation process,” said Sorin Dobre, senior director of technology at Qualcomm. “EDA has a great opportunity to expand the use of supervised and unsupervised machine learning solutions for design flow optimization. We have senior engineers with 20 years of experience who can guarantee good quality designs, but we need to help designers who are starting from scratch. We cannot wait five years to bring them up to full productivity.”

The job is getting harder for experienced designers, as well. “In the past, architects have designed interconnects relying on their experience and making key design decisions such as choice of topology and routing based on gut feeling,” said NetSpeed’s Mohandass. “However, this approach does not scale for heterogeneous systems where on-chip requirements are extremely diverse. Due to the complexity of interactions among various on-chip components, it is practically impossible to design a near-optimal and yet functionally and performance correct interconnect that considers all use cases.”

The dataset
Those use cases are part of the dataset, which isn’t always clear at the outset.

“Getting a good dataset can be a challenge,” said Harnhua Ng, CEO for Plunify. “The tools’ learning capabilities ensure that the more an engineering group uses them, the smarter the learning database becomes, accelerating the time to design closure.”

So can the techniques only be used by those with large, existing datasets or can EDA provide the initial training? “For many machine learning applications in EDA, algorithm-related parameter selection and training needs to occur completely within a design customer or foundry’s computing environment,” said David White, distinguished engineer at Cadence. “In these applications, the most challenging task is the creation of automated training and verification methodologies that can ensure the algorithms operate as expected for the targeted silicon technology. In some cases, the more advanced and sophisticated machine learning methods offer great accuracy but can be the most difficult to support in the field. During development one needs to weigh the tradeoffs in the selection of the correct algorithm and architecture, given the required accuracy, as well as the quantity of training data available and other support and use model related constraints.”

Mohandass provided one example of the dataset required for interconnect design. “The perfect interconnect strategy depends upon a very large number of SoC parameters, including floorplan, routing constraints, resources available, connectivity requirements, protocol level dependency, clock characteristics, process characteristics such as wire delay, power budgets, bandwidth and latency constraints, etc. The number of unique dimensions in the design strategies space grows to several hundreds, creating an excessively large design space.”

Infrastructure
There are several dimensions to this problem. “Machine learning can be implemented in EDA,” said Hasmukh Ranjan, corporate vice president and CIO of Synopsys. “But for maximum benefit, ML should be implemented both inside the tools themselves as well as around the tools, in flows.”

Qualcomm’s Dobre agreed: “It is not necessary for everything to be implemented in the EDA tools. You can have independent machine learning solutions that can drive the existing tools.”

Shiv Sikand, executive vice president at IC Manage, provided one example. “By analyzing billions of data points from previous tapeouts we can predict the impact of bugs, design complexity, human resources, licenses, and compute farm throughput on current projects. By identifying bottlenecks in semiconductor designs we can provide forward prediction and identify potential delays.”

The infrastructure on which we run tools may also need to be examined. “We also need to consider intelligent storage,” Sikand added. “By analyzing the data streams associated with file operations, machine learning techniques such as clustering and regression analysis allow continued improvements in the P2P networking and cache management to deliver even higher application performance.”

Dobre’s team is familiar with these problems, as well. “We have farms with tens of thousands of CPUs. When you look at the number of designs that have to be validated at the same time, how do you use those resources in an optimal fashion without having an explosion in resource requirements? Then there is data management. How do you deal with this much data in the design space and on the foundry side in an efficient manner and extract knowledge and information needed for the next design to reduce the learning cycle?”

The machine that will be running the ML algorithms adds yet another dimension. “Machine learning is going to reduce the time of design and simulation through existing complex algorithms,” said Sachin Garg, associate director for Markets and Markets. “EDA tools can suggest or take intelligent decisions to move this along further, but we require better hardware (CPU+GPU) to run such complex ML algorithms to make it more effective. Current GPUs offer enormous acceleration for parallel computing workloads and superior performance scaling generation-to-generation.”

Cadence’s White concurred: “Advances in massively parallel computing architectures open the door for what-if based optimization and verification to efficiently explore the design space and converge on the most promising decisions.”

Application areas
Success relies on being able to define the right set of features. “Consider variation design,” said Ku. “If you want to model a probability density function, you need attributes. Features are attributes that differentiate between one thing and another. For people it could be hair color, height, gender – those are features. For variation it could be PVT corners, the algorithm necessary to define the device variation, and the random variables of the devices. So features are the things that are important for a particular problem.”
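To make the idea of features concrete, here is a minimal Python sketch that turns one variation sample into a flat numeric vector a learning algorithm could consume. The corner names, argument names, and values are assumptions chosen for illustration; a real flow would define its own attributes.

```python
# Illustrative sketch only: hypothetical corner names and values.

PROCESS_CORNERS = ["ss", "tt", "ff"]  # assumed labels for slow, typical, fast

def to_features(process, voltage, temperature_c, device_offsets):
    """Encode one PVT point plus per-device random offsets as a feature vector."""
    # One-hot encode the categorical process corner.
    one_hot = [1.0 if process == corner else 0.0 for corner in PROCESS_CORNERS]
    # Voltage and temperature are already numeric; offsets are the random variables.
    return one_hot + [float(voltage), float(temperature_c)] + [float(v) for v in device_offsets]

sample = to_features("ff", 0.72, 125, [0.3, -1.1])
```

Each position in the vector is one attribute that distinguishes this sample from another, which is the sense in which Ku uses the word.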

At 10nm and 7nm we see a lot of process variation. “The foundry effort to bring up a new process technology is significant,” said Dobre. “It is required to consider library elements to be analog design even though it is in the digital space. You have to validate the design across multiple process corners. How do you get high quality without the resources needed exploding? Machine learning can bring improvements in productivity by 10X, reduction in characterization time by weeks, and the reduction in the number of resources. Machine learning is an effective method to identify patterns which are driving yield failures. We are seeing significant potential here with an economic benefit.”

EDA is stepping up to this problem. “For advanced-node design, there is an increase in uncertainty presented by new silicon technology and additional verification needs, and thus an increase in potential risks,” White said. “In conventional design flows, prior design and layout data is not leveraged efficiently to help guide the next design. Advances in analytics allow prior design data and trends to be examined (mined) and used to guide design decisions at the earliest stages of the design flow. These same methods can be used to discover and provide context that drives the training and development of machine learning engines. It is likely that such solutions will leverage large volumes of data and require hundreds of machine learning components that will need to be managed and verified. Once the data is properly contextualized, machine learning can be used to capture complex behavior providing analysis (e.g. parasitic, electrical, verification) with high accuracy AND fast performance.”

There are also areas of design where it can help. “We can use it for memory or logic gate power estimation or timing estimation,” said Hall. “This will reduce uncertainty and the padding that people apply and thus create a more competitive product.”

Another area where solutions are appearing is in routing. “In the context of interconnect designs, the first step is to identify the combination of design strategies in each dimension that leads to good solutions for a large variety of previous SoC designs,” Mohandass pointed out. “The next step is to use that information to learn patterns and predict which combinations of strategies will most likely lead to good designs.”

Similar techniques also apply to FPGA routing. “Complex FPGA designs with tricky timing and performance closure issues are excellent candidates for tools based on machine learning techniques,” added Plunify’s Ng. “Machine learning tools are able to analyze past compilation results to predict optimal synthesis/place-and-route parameters and placement locations out of quadrillions of possible solutions. They infer what tool parameters are best for a design using statistical modeling and machine learning to draw insights from the data to improve quality of results.”

Trusted results
But designs face a higher hurdle than many other applications of ML. “If at the end of the day there is risk of a respin or over-margining, people won’t adopt the solution,” explained Dyck. “Machine learning tools are big estimators. You can’t just ask them to trust it. So we need accuracy-aware modeling techniques. There are very few of these today – you have to invent them. We need active learning approaches that can incrementally find areas of interest and these are often around the worst cases. Show me where my chip is likely to fail and get lots of resolution in that area. So you want to direct experiments into those areas. Targeting problem areas is important.”
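The idea of directing experiments toward the worst cases can be sketched with a simple greedy refinement loop in Python. This is a stand-in for a real active learning method, not the approach Solido uses: `simulate` is a hypothetical placeholder for an expensive simulation that returns a margin (lower is worse), and the one-dimensional search space is chosen only to keep the example short.

```python
# Illustrative sketch only: hypothetical simulator, greedy refinement heuristic.

def simulate(x):
    """Stand-in for an expensive simulation; returns a margin, lower is worse."""
    return (x - 0.63) ** 2

def refine_around_worst(lo, hi, coarse_points=5, rounds=6):
    """Spend each new pair of simulations next to the worst sample seen so far."""
    step = (hi - lo) / (coarse_points - 1)
    # Start from a coarse, evenly spaced sweep of the range.
    samples = {lo + i * step: simulate(lo + i * step) for i in range(coarse_points)}
    for _ in range(rounds):
        worst = min(samples, key=samples.get)  # lowest margin found so far
        step /= 2.0                            # tighten resolution each round
        for x in (worst - step, worst + step):
            if lo <= x <= hi and x not in samples:
                samples[x] = simulate(x)
    return samples

samples = refine_around_worst(0.0, 1.0)
worst_x = min(samples, key=samples.get)
```

With at most 17 simulations the samples cluster near the lowest-margin region instead of being spread evenly, which is the "lots of resolution in that area" behavior described in the quote. A greedy scheme like this can miss a second, separate failure region, which is one reason accuracy-aware methods are harder than this sketch suggests.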

Dyck also pointed out another hurdle that EDA faces. “If you can’t prove that an answer is right, they will not accept it. So you need to design algorithms that are verifiable. You need to implement verification as part of the technology so that when you give an answer you show that it is correct at run time.”

Conclusion
ML has started to penetrate EDA and design flows. “Machine Learning has already started to play a major role in EDA,” observed Gupta. “It has further opportunities to provide disruptive technology breakthroughs to address semiconductor challenges.”

There is a long way to go, however. “Today, we have only seen the tip of the iceberg,” said Ku. “What needs to happen is that EDA needs to stop providing data. Data is nice, but what we really want is decisions. All you need to do is put a layer in between the data and the decision and the machine algorithm can learn about what the decision should be by learning from the data. EDA is in a perfect position to do this work.”

Small steps may be required if trust is to be maintained. “Artificial intelligence and machine learning can be what sets a company apart from its peers, but it also needs to be leveraged without compromising accuracy,” concluded Synopsys’ Ranjan.