The Power of Cadence System Power Flow vs. Viewing from the Top

I feel that I must respond to the following blog published by Frank Schirrmeister. Virtual prototypes clearly have their value and their place in the SoC design flow (especially as platforms for software development), but they are hardly a substitute for hardware-assisted solutions, and you need to find a way to connect them to your implementation and verification flows; otherwise what you see may not be what you get. Let me start with some facts:

Both of the tools described in that blog as "products in question" (the Palladium accelerator/emulator and InCyte Chip Estimator) in fact have very fast ramp-up times and are being used in production by many customers. Designers can bring up InCyte Chip Estimator in less than an hour, and Palladium systems have shown again and again over the last 7 years that they can bring up new designs in less than a week, with thousands of successful projects taped out. Power information can be added to these tools easily by bringing Liberty files, a popular industry format, into the analysis.
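To make the idea of combining activity data with library data concrete, here is a minimal sketch of the textbook switching-power relation, P = 0.5 * C * Vdd^2 * (toggles per second), summed over nets. It is not how Palladium or InCyte compute power. The function name, the net names and the capacitance values are all hypothetical, and the sketch ignores the internal-power and leakage terms that a real Liberty-based analysis would also account for.

```python
def switching_power_watts(toggle_counts, load_caps_farads, vdd_volts, window_seconds):
    """Estimate switching power over an observation window.

    toggle_counts:     {net_name: number of transitions seen in the window}
    load_caps_farads:  {net_name: load capacitance in farads} (hypothetical values;
                       a real flow would derive these from library and parasitic data)
    vdd_volts:         supply voltage
    window_seconds:    length of the observation window
    """
    total = 0.0
    for net, toggles in toggle_counts.items():
        cap = load_caps_farads[net]
        toggle_rate = toggles / window_seconds  # transitions per second
        # Each transition charges or discharges the load: 0.5 * C * V^2 of energy.
        total += 0.5 * cap * vdd_volts ** 2 * toggle_rate
    return total


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from any real design.
    toggles = {"bus_a": 1_000_000, "bus_b": 2_000_000}
    caps = {"bus_a": 2e-15, "bus_b": 1e-15}
    print(switching_power_watts(toggles, caps, vdd_volts=1.0, window_seconds=1e-3))
```

The point of the sketch is only that the activity side (toggle counts from running real software) and the library side (capacitance and voltage) are separate inputs, which is why adding library data to an existing emulation run is a small step.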

Let me provide several examples we have seen recently.

Case A: A start-up company was looking for new funding. The company engaged with Cadence and within less than a couple of weeks brought their full SoC design into Palladium, allowing them to demonstrate it to their VCs. If this company had chosen a Virtual Platform solution, they would have been out of business before the full environment was up and running. If this company had wanted to estimate power consumption accurately pre-silicon with SW applications, only a single solution in the market would have allowed them to do it - Palladium Dynamic Power Analysis. Based on recent results, one of our customers confirmed that the dynamic power analysis switching results in Palladium were within 5% of the real silicon measurement in the lab. This is exactly what customers need: fast and accurate power analysis at an early phase of the design.

Case B: A large semiconductor company was trying to win a socket with their customer. In order to win, they had to prove to this customer the architectural performance improvements they had achieved with their new design. This analysis had to be done very early in the design process. They tried to simulate the environment using a virtual platform, but the results were not accurate enough, so their customer asked them to incorporate a cycle-accurate simulator into the environment. However, this environment would not run fast enough. So they ported the design into a Palladium system and showed the results to the end customer, helping them to win the desired socket. Palladium Dynamic Power Analysis can be easily added to any design to measure its power. Any new SystemC IP created for this design can be ported to the emulation system within days (and in some cases within hours) using a combination of the C-to-Silicon high-level synthesis tool and the Palladium Compiler.

Case C: Cisco recently evaluated InCyte Chip Estimator results (see article) and confirmed die area accuracy within 10% of real silicon. According to the information we have collected from 130 designs, the power estimation provided by InCyte Chip Estimator is accurate to within 30%. The input to the tool can include data extracted from dynamic simulation, emulation (or even silicon results) of IPs from previous designs, as well as statistical information.

In his blog, Frank said: "Now the accuracy is much better and real software can run on the RTL given sufficient hardware support, but the ability to make trade-offs is very, very limited. The amount of effort it took to first write the RTL, to then verify it and even bring it up on hardware, altogether is so prohibitively expensive that in most cases fundamental architecture changes are hopeless at this point." Although the number of customers running a full SoC with embedded software on commercial virtual platforms is growing, it will take some time until all legacy IPs are described at a high level of abstraction, and I predict that these platforms will continue to operate in parallel with, and as a hybrid solution to, RTL emulation platforms. As stated above, a big reason our customers use RTL emulation platforms is accuracy. While virtual platforms can offer a certain level of performance, eventually the need for accuracy becomes critical and cannot be overlooked, even for initial performance and power estimation analysis. Frank seems to forget in his statement above that the average bring-up time of a new virtual platform is 6-12 months, while the average bring-up time of many emulated designs is days. The power estimation flow is no exception.

There is no single tool which can solve all your problems. I agree - a virtual platform seems to be the "dream come true" solution, allowing users to make performance and architectural trade-offs as they run their applications and software together early in the design cycle. However, it does not come for free. Most customers do not have the models, the infrastructure or the people to build these platforms. Even if they get these models or experts to build the platform (from the EDA vendor or from their own company), it takes a long time, and if your design cycle is short, you may miss the mark and get the platform up and running only after your RTL is ready. Now, even if you build this platform successfully 9-12 months in advance, how do you know that your virtual platform represents your real design? How do you connect it to your verification and implementation environment and to realistic power information? Frank seems to overlook these things. To borrow the analogy from the story in the blog above, using a system-level platform that does not target the actual hardware for performance analysis and power trade-offs guarantees that the chameleon will become a snake and you will get bitten. This is something even Frank's smart 3-year-old daughter can understand! You must now create a methodology to correlate your system-level power analysis discoveries with the actual results at the RT level or even in silicon... Who will be doing this work while you are busy on your next architecture?

Cadence is instead taking a more pragmatic approach. Our solution is being used effectively today to solve these issues, with a focus on getting feedback from the "real" silicon. ESL is all about discovery, but also about connecting to the reality of what you are doing today... not just about creating another model that has to be maintained and synced up manually with your implementation. This connection must be there, and it should be updated automatically as new IPs are created. Our system power estimation/exploration flow includes the following steps:

Use InCyte Chip Estimator to quickly run a pre-RTL (pre-IP-selection) static power estimation for your SoC with 30% accuracy, decide upon your low-power techniques, and automatically generate a CPF file that carries these techniques and can be used as the input power information through the entire design and implementation flow.

Model your newly created IPs in SystemC and quickly map these to RTL (or gates) with C-to-Silicon Compiler.

Get initial estimated average power results for the SoC (with embedded software) and identify your "interesting" peak power windows using Palladium Dynamic Power Analysis.

Get an accurate estimate (+/-5%) of your SoC average and peak power using real stimulus from actual embedded SW with Palladium Dynamic Power Analysis, leveraging the accurate power analysis engine of RTL Compiler.
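The quantities named in the steps above (average power, peak power windows, and an accuracy figure such as +/-5%) can be illustrated with a small sketch. It assumes you already have a per-interval power trace as a plain list of numbers; the trace values and all function names here are hypothetical and are not part of any Cadence tool or its output format.

```python
def average_power(trace):
    """Mean power over the whole trace."""
    return sum(trace) / len(trace)


def peak_window(trace, width):
    """Return (start_index, mean_power) of the contiguous window of the
    given width with the highest mean power, using a sliding sum."""
    if width <= 0 or width > len(trace):
        raise ValueError("width must be between 1 and len(trace)")
    window_sum = sum(trace[:width])
    best_sum, best_start = window_sum, 0
    for end in range(width, len(trace)):
        # Slide the window one sample to the right.
        window_sum += trace[end] - trace[end - width]
        if window_sum > best_sum:
            best_sum, best_start = window_sum, end - width + 1
    return best_start, best_sum / width


def relative_error(estimate, measured):
    """Relative difference between an estimate and a measured reference."""
    return abs(estimate - measured) / abs(measured)


if __name__ == "__main__":
    # Illustrative trace only (arbitrary units), not real measurement data.
    trace = [1.0, 1.0, 4.0, 5.0, 1.0, 1.0]
    print(average_power(trace))
    print(peak_window(trace, width=2))
    print(relative_error(1.05, 1.0) <= 0.05 + 1e-9)
```

In this reading, a coarse first pass finds where the "interesting" windows are, and the detailed analysis is then spent only on those windows; an accuracy claim such as +/-5% corresponds to the relative error against a silicon measurement staying at or below 0.05.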

These steps can be done in a matter of days, instead of the weeks or even months that assembling virtual prototypes would require. A combination of static power analysis, high-level synthesis for newly created IPs, and dynamic power analysis with emulation can provide you with a good SoC power estimation and exploration flow (including hardware and software) even at an early phase of your design. As always, comments are welcome.

Comments

Anonymous

I think you are a bit pessimistic about virtual platforms... if they are derivatives of current designs and created at a sufficiently high level of abstraction, they can be created in a matter of days or weeks rather than months or years. A six-to-twelve-month span sounds like a pretty detailed clock-level model, which is not what you should start with. Go for software-timed or loosely-timed, as discussed by Jason Andrews at Cadence some time ago: www.cadence.com/.../is-host-code-execution-history.aspx

In the end, you need both. A VP to get software started early and ideas for whether a design is useful at all, and then hardware acceleration to check the final design against your detailed goals.

In many cases the "missing model syndrome" is what holds back virtual platforms. Even a fairly comprehensive TLM IP library doesn't cover the custom hardware being designed. It's hard to justify spending a lot of time creating models for the virtual platform when hardware engineers are busy creating RTL, software engineers are busy writing software, and verification engineers are busy finding bugs in both. Some companies are investing the time, but not enough. Emulation has been successful for many years because it uses a "design artifact" that everybody must create: RTL. As C-to-Silicon spreads, it will make SystemC models available as design artifacts, not as a separate project just to enable software execution. Making virtual platforms easier to create in a shorter time, with better linkage to the design process, is critical. There will always be pros and cons of virtual models vs. actual models (emulation and FPGA prototypes); both play a vital role and both are needed as a foundation for tasks such as verification and power analysis.

Hi Ran: Thanks for your post. It looks like we agree on a couple of things, especially that power analysis as early as possible is very important. The flow Cadence suggests works pretty well and I have seen users using it. One remaining issue is how well the synthesized results correlate to the actual implementation later. Please find more detail in my post at www.synopsysoc.org/viewfromtop. Our means are different but no solution is universal ...