Using More Verification Cores

Semiconductor Engineering sat down to talk about parallelization efforts within EDA with Andrea Casotto, chief scientist for Altair; Adam Sherer, product management group director in the System & Verification Group of Cadence; Harry Foster, chief scientist for Mentor, a Siemens Business; Vladislav Palfy, global manager for applications engineering at OneSpin; Vigyan Singhal, chief Oski for Oski Technology; and Bill Mullen, senior director for R&D at ANSYS. What follows are excerpts of that conversation. Parts one and two were published separately.

SE: Parallelization became important when the benefits of processor scaling started decreasing. It took the software industry a long time to learn how to utilize multiple cores successfully. How is EDA doing in comparative terms?

Casotto: Those were exciting times, and we started talking about how the software guys now have to get their job done because it was no longer the hardware guys pushing the innovation. But it didn’t happen because it is hard. There is also a responsibility from the universities that they are perhaps not fulfilling. We should expect that any guy coming out of a university would be capable of programming in parallel. But my experience is that programming in parallel is hard. We have had recent experiences where a person doing it for the first time thinks the answer is to put locks everywhere. That is the opposite of running in parallel. You have to get rid of all of the locks and compose the solution in a way where every piece can run without any lock. That is partly education, but we find that talent is rare.
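
Casotto's point about composing a solution so that every piece runs without a lock can be illustrated with a minimal sketch. It assumes a hypothetical task, counting values above a limit, and the function names are invented for illustration. Each worker owns a disjoint slice and a local counter, and the only coordination is the final combine step.

```python
from concurrent.futures import ThreadPoolExecutor

def count_over_limit(chunk, limit):
    # Works only on its own chunk and a local counter: no shared state, no lock.
    return sum(1 for value in chunk if value > limit)

def parallel_count(values, limit, workers=4):
    # Partition the input so every worker owns a disjoint slice.
    size = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_over_limit, chunks, [limit] * len(chunks))
        # The only point of coordination is this final combine step.
        return sum(partials)
```

Threads are used here only to keep the sketch short. In CPython a CPU-bound job would more likely use processes, but the decomposition is the same.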

Foster: It is not an easy problem. Way back we were working on compilers that could parallelize code. The research has been going on, but it is a tough problem.

Mullen: One of the hardest problems is this: when something goes wrong, how do you debug it?

Singhal: You need to be able to reproduce it.

Mullen: Right, but not all programs are deterministic. They could depend on the number of processors being used.

Singhal: As EDA tool developers, when a problem happens in the field and you can’t reproduce it…

Foster: You also have issues with fabrics that interfere with the determinism.
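
One common source of the behavior Mullen describes, results that depend on the number of processors, is floating-point accumulation order. The sketch below is an assumption-laden illustration rather than any vendor's method: it fixes the chunk size independently of the worker count and combines partial sums in a fixed order, so the result is the same for any number of workers. `CHUNK` and `deterministic_sum` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1000  # fixed chunk size, chosen independently of the worker count

def deterministic_sum(values, workers):
    chunks = [values[i:i + CHUNK] for i in range(0, len(values), CHUNK)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, not completion order.
        partials = list(pool.map(sum, chunks))
    # Combine in a fixed order so rounding happens identically on every run.
    total = 0.0
    for partial in partials:
        total += partial
    return total
```

If the chunk boundaries were instead derived from the worker count, the additions would be grouped differently and the rounded result could change with the machine it ran on.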

Mullen: Then we have to deal with customer networks and IT departments that could be very sub-optimal. They may have very fast processors and a terrible network, or they are all trying to write terabytes of data to a file system that is choked. It is not a balanced system, and that can make it challenging.

Sherer: A common request that we get is to audit a regression environment. When they say that they mean everything, including the infrastructure and how they balance workloads.

Mullen: It is highly variable. During a tapeout cycle they are running different workloads than at other times. It is constantly changing.

Sherer: Recently we had a case where a customer could run a particular regression. If he started on a Friday it would be done on Saturday. If he started on Monday it would be done on Friday. I think there is an onus on the universities to do better training as well. Most of what they do involves single compute environments and linear applications. Learning with things such as FPGA would help because it forces you to decouple things to get speed.

Foster: At the same time, concurrency is inherently difficult for humans to reason about.

Singhal: There is no substitute for thinking and if you can divide up the problem and decompose it you are better off. Once you do it enough times you learn, but you have to start by thinking about it, be able to decompose problems, abstract problems.

Mullen: Another challenge is that we have to make use of IT environments that are not tuned for EDA. It used to be that you could get a large machine with lots of memory, and that was great for executing monolithic executables. But now it is commodity x86 boxes with very low memory, and some of the problems we deal with require a terabyte of memory.

Singhal: Should we go back to selling machines designed for EDA tools? It means that licenses are not being used effectively.

SE: Are we stuck in old ways of thinking? Are there things we could learn from other industries?

Mullen: We are running 20,000 cores already. What more do we want? It is true that HSPICE is not faster, but the fact that we are running 7,000 instances of HSPICE is amazing.

Sherer: The Arm-based servers today do have higher core counts and we are showing scalability. The Qualcomm machine is 48 cores. HiSilicon has 32 cores, and we show scalability. Yes, there is trail-off because of the underlying design, but we show scalability.

Casotto: At the application level or at the workload level?

Sherer: This was single application, single run.

Casotto: So a multi-threaded application.

Sherer: Yes. It is a technology in progress for sure. Making a 7X to 10X generic statement is tough to do. There will always be designs that do not get that. I do not want to set that expectation. There is an environmental pressure, but we have to switch from packet-based designs to more neural networks that are inherently parallel.

Palfy: To be fair to ourselves, I do think EDA is looking into other industries for ideas—machine learning and big data analysis and new heuristics. More than ever we are looking to other industries to figure out how to deal with these problems.

Mullen: You have to be open to new ideas. We are using elastic computing. We can use more as more become available, and we can get better parallelism dynamics.

Sherer: I agree that we cannot discount the need for parallelism when you are up against signoff. If this is your second or third bump against the wall, you will consume all of the compute capacity that the world has to get it done in a day. The parallelism helps, even for fractional improvements. One of the challenges with SystemVerilog is that it doesn’t guarantee random stability when the design changes. So I can have 10,000 tests that are all running happily, and someone makes a small change in the RTL and then the witch hunt starts as to why it wasn’t found earlier. The problem is that the state space is so large that even going from 10,000 tests to 30,000 tests won’t guarantee that you would find it.
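
The general idea behind random stability can be sketched outside SystemVerilog. This is an analogy only and does not reproduce SystemVerilog's seeding rules. In this hypothetical Python harness, each test derives a private seed from its own name, so adding or reordering tests does not perturb the stimulus of the others. `derive_seed`, `stimulus`, and `run_regression` are invented names.

```python
import hashlib
import random

def derive_seed(base_seed, test_name):
    # Hash the name so each test's seed depends only on its own identity.
    digest = hashlib.sha256(f"{base_seed}:{test_name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def stimulus(base_seed, test_name, length=4):
    rng = random.Random(derive_seed(base_seed, test_name))  # private generator
    return [rng.randrange(256) for _ in range(length)]

def run_regression(base_seed, test_names):
    return {name: stimulus(base_seed, name) for name in test_names}
```

This keeps existing tests reproducible when the test list changes. It does nothing for the larger concern Sherer raises, that the state space is too big for any fixed set of tests to cover.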

SE: Looking at other industries, there are a lot of big data applications that appear similar to problems in EDA. If we could parallelize the data, could we be more effective? Consider debug. We have huge datafiles that are organized serially, and this makes it very difficult to treat it as a big data problem.

Casotto: You think there are programs that can do debugging rather than having an engineer looking at what the chip should be doing?

Mullen: You can enable it. We have a data model where we can bring in data from FSDB, from layout, power integrity, from many different sources. And the user can write MapReduce functions that take advantage of hundreds or thousands of cores and walk through the data and look at interactions between timing and layout. It is very powerful and extensible. Some users are scared to write in Python, so we have a way to help them with that. But it is there.
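
The shape of a user-written MapReduce function of the kind Mullen mentions might look like the sketch below. It is not the ANSYS API. The record layout (net name, timing slack, routing layer), the sample data, and all function names are assumptions made for illustration. The map step counts negative-slack nets per layer within each shard, and the reduce step merges the per-shard counts.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical records: (net name, timing slack in ns, routing layer).
RECORDS = [
    ("n1", -0.12, "M2"),
    ("n2", 0.30, "M2"),
    ("n3", -0.05, "M3"),
    ("n4", -0.40, "M2"),
]

def map_shard(shard):
    # Map: count negative-slack nets per layer within one shard.
    return Counter(layer for _, slack, layer in shard if slack < 0)

def merge(left, right):
    # Reduce: Counter addition merges the per-layer counts.
    return left + right

def violations_by_layer(records, shards=2):
    # Strided split so each shard is an independent slice of the data.
    parts = [records[i::shards] for i in range(shards)]
    with ThreadPoolExecutor(max_workers=shards) as pool:
        return reduce(merge, pool.map(map_shard, parts), Counter())
```

Because each shard is processed independently and the merge is associative, the same two functions would apply unchanged whether the data is split two ways or across thousands of workers.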

Foster: It is certainly very useful from a system perspective. We lacked insight into what is going on. What we have done, using data mining techniques, is to create low overhead probes that we can put in a fabric, and they start to mine data. You can see things that you could never see using traditional approaches, particularly in the area of performance, in the area of coherence — they just pop out and you can visualize it. This is new and emerging, and we are learning from other domains and applying those techniques to verification.

Casotto: That is visualizing your data.

Foster: It is more than visualization. It is data mining, as well.

Sherer: Yes, it is more than visualization. Looking at things such as coherency and performance – performance analysis requires hundreds of thousands of simulations to get sufficient data points, but you still have to aggregate that data to come up with a number or an assessment. But there is much more that we will end up doing with the data available. Yes, we do need to generate more. We do not come close to covering the state space of the design.

Palfy: And you can abstract away if you have prediction algorithms that help you choose the right prover, the right engine for the problem.

Mullen: Once you have the data you can put machine learning on top of it.

Sherer: I see more data, more storage, more compaction, more access needed over time.

SE: Within the EDA industry, we have developed debuggers that are intended for highly parallel hardware. We have created a lot of knowledge within the industry about how to handle them, how to deal with predictability, stability, repeatability — how to handle various issues. What can we teach the rest of the industry?

Casotto: That is where I am now. I see this as part of my job. I need to apply what I know in the field to automotive design and it seems to me as if I am looking back in time. They have fancy parallelism, but in terms of methodology, they are still at the pushing of rectangles phase. We should be able to bring more automation to the other domains.

Palfy: We went into automotive only to find customers doing fault propagation by hand. It was working for them up until a certain point. But now they need to be faster, and the ISO standards are forcing them to change. We are the big shots because we have the knowledge, because we built that up in other domains, and now we can apply it to something completely different.

Casotto: Parallelism and pipelining.

Singhal: The principles that work for us – especially in formal verification, abstraction, decomposing the problem – formal verification started in software and came to hardware and it worked very nicely. But now, someone mentioned Meltdown in automotive safety. There are so many applications where people are coming back to rethink the problem and the top-down approach, where you take a look at the system-level problem, break it down, build a sub-system, decompose. This is exactly what we have been doing in terms of practicing verification.