Bridging the gap between RTL development and design implementation

Eyal Odiz, VP of engineering, RTL synthesis and test automation, Synopsys, Inc., outlines why improving the predictability of outcomes across the flow is important for achieving faster design convergence on gigascale systems-on-chip.

The advent of massively integrated, multimillion-instance IC designs is driving demand for ever-faster design convergence at a time when semiconductor companies face unrelenting time-to-market pressures that mandate ever-shorter tapeout schedules. Register transfer level (RTL) code for these “gigascale” systems-on-chip is often developed by geographically dispersed teams and combined with third-party IP and blocks from previous designs that have been modified to meet the needs of the current design. Only after the RTL and the reference macros and libraries have been fully defined and are consistent with each other can designers actually begin the implementation process, comprising RTL synthesis, design planning and place-and-route (P&R) tasks, that will ultimately determine whether their design goals can be met.

By this time, however, decisions about the micro-architecture have already been made and designers are essentially locked in to their early choices. If the design does not meet its timing, area, power, test and routability requirements, substantial resources will need to be applied to achieve design closure. In extreme cases, even significant changes to the RTL are needed, though typically most of the effort is focused on the synthesis and P&R tasks. The process of convergence, characterized by time-consuming design iterations between synthesis and P&R, can consume a large fraction of total design implementation time.

Faster convergence could be achieved by assessing whether design goals can be met earlier in the design cycle, during the RTL development phase, instead of waiting until the implementation phase to discover and correct critical issues. This early assessment or “exploration” of the RTL and constraints provides designers the opportunity to determine prior to synthesis if a design will likely meet its goals and to perform what-if analyses and make changes as needed to create a better RTL heading into implementation. Fine-tuning the RTL and constraints in this manner uncovers issues earlier in the flow and reduces design iterations later, during implementation, when iterations are more resource-intensive and pose the greatest risk to tapeout schedules.

For RTL exploration to be effective when the RTL, libraries and constraints are still under development, the technology must tolerate incomplete and mismatched design data. When it encounters missing cells or inconsistencies in the RTL, it should not only report these discrepancies but also resolve them internally and generate a netlist to enable physical exploration of the current design. With an early netlist, designers could explore a variety of floorplan options prior to synthesis to evaluate how physical constraints affect the timing and routability of their designs. Performing physical design exploration in parallel with RTL development would not only shorten the design cycle but also create an even better starting point for synthesis, since the physical constraints would be taken into consideration in the development of the RTL.

To facilitate efficient RTL exploration, the technology must be easy to deploy and compatible with designers’ existing synthesis scripts and flows. In addition, it must run faster than typical synthesis while producing results similar enough to identify specific timing issues in the RTL or constraints. For example, it should be able to identify critical timing paths even though the worst-case negative slack numbers may not be exactly the same as those produced by synthesis. To ensure a better starting point for synthesis, quality-of-results for RTL exploration should correlate to within about 10 percent of results produced by synthesis, taking into account timing, area, leakage power, dynamic power and routing congestion. By the same token, synthesis itself must correlate to within about 5 percent of P&R results. Only when there is tight correlation across the entire flow, first between RTL exploration and synthesis and then between synthesis and placement, can faster development of the RTL, constraints and floorplan lead to fewer iterations between synthesis and P&R and, in the process, faster design convergence.
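The correlation targets above can be illustrated with a small tolerance check. This is only a sketch under stated assumptions: the metric names, the sample numbers and the helper functions `relative_gap` and `out_of_tolerance` are hypothetical and do not come from any Synopsys tool or report format. It treats "correlate to within about 10 percent" as a relative difference against the downstream stage.

```python
def relative_gap(estimate, reference):
    """Relative difference between an early estimate and the downstream result."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return abs(estimate - reference) / abs(reference)


def out_of_tolerance(early, late, tolerance):
    """Return the metrics whose early-stage value misses the tolerance."""
    misses = {}
    for name, early_value in early.items():
        if name not in late:
            continue  # metric not reported by the later stage
        gap = relative_gap(early_value, late[name])
        if gap > tolerance:
            misses[name] = round(gap, 3)
    return misses


# Hypothetical quality-of-results numbers for one block (illustration only).
exploration = {"area_um2": 1_050_000, "leakage_mw": 12.0, "worst_slack_ns": -0.30}
synthesis = {"area_um2": 1_000_000, "leakage_mw": 11.5, "worst_slack_ns": -0.25}
place_route = {"area_um2": 1_030_000, "leakage_mw": 11.8, "worst_slack_ns": -0.26}

# About 10 percent between exploration and synthesis,
# about 5 percent between synthesis and P&R.
explore_misses = out_of_tolerance(exploration, synthesis, tolerance=0.10)
synth_misses = out_of_tolerance(synthesis, place_route, tolerance=0.05)

print("exploration vs synthesis:", explore_misses)
print("synthesis vs P&R:", synth_misses)
```

With these made-up numbers, only the slack estimate from exploration falls outside its tolerance. Exact slack agreement is less important than flagging the same critical paths, so a real flow would likely compare path lists as well as raw numbers.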

In conclusion, new technology with the flexibility, ease of use, speed and correlation to accommodate the needs of RTL exploration would give designers the opportunity to evaluate their RTL and to begin physical design exploration earlier, during RTL development. The added visibility into design implementation issues at this early stage would lead to better RTL, constraints and floorplans which, when passed to synthesis that is tightly correlated with placement, lower the risk of design iterations during implementation. Improving predictability of outcomes across the flow, from exploration to implementation, is essential to achieving faster design convergence in the era of gigascale systems-on-chip.

About the author: Eyal Odiz is vice president of engineering, RTL synthesis and test automation, Synopsys, Inc. Odiz holds a bachelor of science in civil engineering and a master of science in computer science, both from Technion in Haifa, Israel.

Another aspect is that it is more important to get a big complex design to work than it is to optimize every path and every Boolean function.
Forty years ago the opposite was true.
Another thing is that embedded processors are programmed in a high-level language, but the compiling process breaks the code down into such tiny steps that it takes millions of cycles to do any meaningful function. Things like pipelining, branch prediction and multilevel caches are assumed to be absolute necessities, but they are not. The typical if/else, for, while and do constructs are just structured ways of expressing compare-and-jump operations, and assignments can be streamlined by putting them into Reverse Polish Notation. Then the CPU becomes physically very small, and the use of multiport memories allows the statements to execute in very few cycles. The result is that function is coded in a high-level language, code size is small, and power is low because thousands of flip-flops do not have to be clocked simultaneously. Changes are made by altering memory content rather than redoing the entire chip compilation and timing. One concern might be that a total chip design may be a few cycles faster, but doing more function per cycle effectively offsets that.
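To make the Reverse Polish Notation point concrete, here is a minimal sketch of how an assignment in postfix form can be executed with a single stack. The comment gives no implementation details, so this is the standard stack-based evaluation technique, not the commenter's design. The function name `assign_rpn`, the dictionary standing in for memory and the operator set are all assumptions made for illustration.

```python
import operator

# Assumed operator set for the sketch; a real design would define its own.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def assign_rpn(target, tokens, memory):
    """Evaluate a postfix expression and store the result in memory[target]."""
    stack = []
    for token in tokens:
        if token in OPS:
            right = stack.pop()  # operands come off in reverse order
            left = stack.pop()
            stack.append(OPS[token](left, right))
        elif token in memory:
            stack.append(memory[token])  # variable read
        else:
            stack.append(int(token))  # integer literal
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    memory[target] = stack.pop()
    return memory[target]


# x = a + b * c, written in postfix as: a b c * +
memory = {"a": 2, "b": 3, "c": 4}
assign_rpn("x", "a b c * +".split(), memory)
print(memory["x"])
```

Each token maps to one push or one pop-pop-push step, which is the property the comment relies on: no operator precedence has to be resolved at run time.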

As a technical communicator writing while the IC architecture team worked with the RTL development and design teams, I encountered the frustrations the author, Odiz, seeks to prevent. This article is an excellent recommendation for improving IC development and design: making sure that teams all around the world can compile, aggregate, and build and test design implications more quickly. One of the solutions my company looked at was company-wide (global) use of an executable specification. Our teams wrote scripts to test the RTL code against the design recommendations. We were able to detect early on the changes that we needed to make, before they hit a critical path.

As an old-timer, one of the most frustrating things is having to wait for synthesis and/or placement to run first. Early in the design, it is the logic that matters, not whether it has been optimized or whether everything feeds an output pin (otherwise it gets synthesized away). I have proposed mapping the HDL into a set of generalized classes to be used in an OOP simulator so that the hardware function can be used in the software development environment. This way, the existing IP and the new design all come together in a program IDE, where the debug and compiler checking is much better than in the hardware design flow. Why not take an evolutionary approach such as what this article suggests, without all the uncertainties of HLS?
After all, when HLS matures and obsoletes HDL, that too can be used in a similar way.
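As a rough illustration of the idea of mapping hardware constructs onto generalized classes, the sketch below models a clocked register and a small counter as ordinary objects that can be stepped and inspected in a software debugger. The class names `Register` and `Counter` and their methods are hypothetical; the comment does not describe the actual class set the commenter proposed.

```python
class Register:
    """Clocked storage element: the value on d becomes visible on q at tick()."""

    def __init__(self, width, reset_value=0):
        self.mask = (1 << width) - 1
        self.q = reset_value & self.mask
        self.d = self.q

    def tick(self):
        self.q = self.d & self.mask  # wrap to the register width


class Counter:
    """Behavioral stand-in for a small counter block."""

    def __init__(self, width):
        self.count = Register(width)

    def evaluate(self, enable):
        # Combinational logic: compute the next-state value.
        self.count.d = self.count.q + 1 if enable else self.count.q

    def clock(self, enable=True):
        # One clock cycle: settle the logic, then update the register.
        self.evaluate(enable)
        self.count.tick()
        return self.count.q


counter = Counter(width=3)
values = [counter.clock() for _ in range(9)]
print(values)
```

Because the model is plain software, no output pin is needed to keep the logic alive: any internal value can be examined directly, which is the convenience the comment is asking for.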