Dr. Pranav Ashar, CTO. Dr. Ashar brings two decades of EDA expertise to Real Intent. He previously worked at NEC Labs (Princeton, NJ) developing formal verification technologies for VLSI design. He authored about 70 papers and co-authored a book titled "Sequential Logic Synthesis." He has 35 patents granted and pending, many of which have been licensed or part of business enablement. He holds a Ph.D. in EECS from UC Berkeley.

The King is Dead. Long Live the King!

Not long ago, functional simulation and static timing analysis were it for RTL verification. In fact, they were all that was needed because the inner loop of computation and data transfer on a chip was one synchronous block. As chip complexities grew and gate-level simulation became unviable, formal equivalence checking stepped in to pick up the slack, with orders-of-magnitude improvement in productivity when comparing gate and RTL representations. But the paradigm remained the same even as the methods changed: verification still needed to cover only the functional input space as comprehensively and efficiently as possible.

Then, somehow, things changed under the hood. Computation on a chip got fragmented out of necessity and with significant consequences. An illustrative example of this trend is the multicore chip by Tilera, Inc. shown here. It is a 64-core processor with a number of high-speed interfaces integrated on chip.

Tile64 Processor Block Diagram

For one, it has become impractical to send a signal from one end of the chip to the other in one clock cycle, or to distribute the same clock to all parts of the chip with manageable and predictable skew. It is also energy inefficient and practically impossible to keep raising the clock frequency. Higher performance can increasingly be achieved only with application-specific cores or on-chip parallelism in processors. As a result, computation is increasingly done in locally synchronous islands that communicate asynchronously with each other on chip. This was predicted some time ago, but is now truly coming home to roost in the form of heterogeneous and homogeneous multicore chips. With fine-grain fragmentation, communication bandwidths and latencies between the computation islands have come under design scrutiny, and protocols for transferring data and signaling between the islands are beginning to push the limits.

A second important change has been that energy and power optimization is now more aggressive than ever. Beyond parallelism-for-performance and custom cores, this trend has also brought once-arcane design techniques into the mainstream. Each island runs at its optimal frequency, and dynamic control of clocks, clock frequencies and Vdd is now par for the course.

Finally, chips are now true systems in that they integrate computation with real-world interfaces to peripherals, sensors, actuators, radios, and you name it. And, these interfaces must talk to the chip’s core logic at their own speeds and per their chosen protocols. Many of these interfaces are also pushing the performance limits of the core logic.

An apt analogy is that it is as if chips have transitioned from an orderly two-party political system to an Italian or Indian multi-party system in which the various parties must align with each other at periodic intervals to accomplish something and each party has its own chief whip to get the troops to toe the party line.

The implication of this trend for chip verification is that it has gotten messier: one can’t cleanly abstract timing from functional analysis any more, i.e., the functional space and the timing space must be explored together. Deterministic functional simulation with fixed clock frequencies and delays does not cover all failure modes, and static timing analysis neglects the dynamic and data-dependent nature of the interaction between clock domains in the presence of unrelated clocks and variability. We are still not in a world where we must timing-simulate everything, but the new complexity is daunting nevertheless.
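To make the coupling of function and timing concrete, here is a small Python sketch of a textbook clock-domain-crossing hazard. It is a toy model, not any tool's algorithm. The assumption (mine, for illustration) is that when a multi-bit value is sampled by an unrelated clock during a transition, each changing bit may independently be seen at its old or its new level. The function names are hypothetical.

```python
from itertools import product


def possible_samples(old: int, new: int, width: int) -> set[int]:
    """Values a receiver might capture while a bus moves from old to new.

    Toy assumption: every bit that changes can independently appear
    at its old or its new level at the sampling edge.
    """
    changing = [i for i in range(width) if (old ^ new) >> i & 1]
    results = set()
    for choice in product((0, 1), repeat=len(changing)):
        value = old
        for bit, take_new in zip(changing, choice):
            if take_new:
                value ^= 1 << bit  # this bit has already switched
        results.add(value)
    return results


def to_gray(n: int) -> int:
    """Standard binary-reflected Gray code."""
    return n ^ (n >> 1)


# A binary counter stepping 7 -> 8 flips all four bits at once.
binary_seen = possible_samples(7, 8, 4)

# The same step in Gray code flips exactly one bit.
gray_seen = possible_samples(to_gray(7), to_gray(8), 4)

print(sorted(binary_seen))
print(sorted(gray_seen))
```

Under this model the binary-coded step can be captured as any of the sixteen 4-bit values, while the Gray-coded step can only be captured as the old or the new value. A simulation with fixed delays exercises just one of those outcomes, which is why the failure is easy to miss.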

The New Signoff Solution

In order to mitigate this complexity, it is essential that the verification tool first decipher design intent to localize the analysis requirements. This exercise also helps make debug more precise. To be sure, this is harder as optimizations get more aggressive – the boundary between computation and interface blurs and designers resort to ever more innovative techniques. Real Intent was prescient in predicting the new verification paradigm many years ago. After much experimentation and interaction with design companies, we have demonstrated that automatic and reliable capture of design intent is indeed viable for clock domain crossings.

The design intent step triages the design, finds many types of bugs, and sets up local analysis tasks and models (potentially with special algebras to capture the timing and variability effects) for further formal analysis and simulation. I call this the verification 4-step: intent extraction, formal analysis, and simulation, all integrated into a systematic hierarchical approach to analysis and reporting for scalability.
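As a rough illustration of what a structural triage pass for clock domain crossings might look like, the Python sketch below walks a toy flop-level netlist, flags every flop-to-flop path that changes clock domain, and marks a crossing as synchronized only if it matches a plain two-flop synchronizer. This is not Real Intent's method. The `Flop` record, the `find_crossings` function, and the signal names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Flop:
    name: str
    clock: str
    drivers: list[str] = field(default_factory=list)  # flops feeding the D input
    through_logic: bool = False  # True if combinational logic is on the D path


def find_crossings(flops: list[Flop]) -> list[tuple[str, str, str]]:
    """Return (source, destination, status) for each clock domain crossing."""
    by_name = {f.name: f for f in flops}
    fanout: dict[str, list[Flop]] = {f.name: [] for f in flops}
    for f in flops:
        for d in f.drivers:
            fanout[d].append(f)

    report = []
    for dst in flops:
        for src_name in dst.drivers:
            src = by_name[src_name]
            if src.clock == dst.clock:
                continue  # same domain, not a crossing
            loads = fanout[dst.name]
            # Two-flop synchronizer pattern: the first stage is driven
            # directly by one source, and feeds exactly one second-stage
            # flop in the same domain with no logic in between.
            synchronized = (
                not dst.through_logic
                and len(dst.drivers) == 1
                and len(loads) == 1
                and loads[0].clock == dst.clock
                and not loads[0].through_logic
                and loads[0].drivers == [dst.name]
            )
            status = "synchronized" if synchronized else "unsynchronized"
            report.append((src.name, dst.name, status))
    return report


design = [
    Flop("tx_req", "clk_a"),
    Flop("sync1", "clk_b", ["tx_req"]),
    Flop("sync2", "clk_b", ["sync1"]),
    Flop("tx_data", "clk_a"),
    Flop("rx_data", "clk_b", ["tx_data", "sync2"], through_logic=True),
]

report = find_crossings(design)
for row in report:
    print(row)
```

The sketch reports `tx_req -> sync1` as synchronized and `tx_data -> rx_data` as unsynchronized. In a real design the second crossing could well be legitimate, for instance a data path qualified by the synchronized request. Telling those cases apart is the kind of intent recognition that pure pattern matching misses and that the later formal and simulation steps are meant to settle.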

We find from our customers that the special verification requirement for clock domain crossings is now an essential part of the signoff process for all chips. Similar customized signoff is also called for in other contexts like DFT and power optimization for which failures cannot reliably be caught with functional simulation. Effectively, the old paradigm of “functional simulation + static timing analysis” is obsolete and the sign-off flow today looks more like the figure shown above.