Prevention, quality and other innovations in hardware debug

Debug represents a major cost to hardware development organizations and a constant source of frustration for engineers. According to a survey commissioned by Mentor Graphics in 2010, verification engineers spend an estimated 32% of their time debugging code[1]. In a typical 8-hour day, that translates to more than 2.5 hours specifically dedicated to fixing defects.

Spending 2.5 hours debugging code can make for a frustrating day for an engineer, but what does that mean for the organization? According to the EE Times Global Salary and Opinion Survey from 2010, the annual compensation for a North American engineer, including bonuses and overtime pay, averaged $107,300[2]. From that, it’s easy to make a best-case, back-of-the-napkin estimate of the cost of employing a medium-sized team of engineers to build a reasonably sized ASIC. The result is a very big number.

Now take your napkin to the CEO and tell him that 32% of the employee compensation he pays out for developing a next-generation ASIC, roughly $1 million, is going to be flushed away fixing defects that the team itself injects by writing buggy code.

That's a make-work project no CEO is going to want to pay for, yet, assuming the results of the 2010 Mentor Graphics study are accurate, most of them are already paying for it.
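The article does not give the team size or schedule behind the $1 million figure, so here is a small calculation that checks it. The two survey figures come from the text. The team of 15 engineers working for 2 years is a hypothetical assumption: it is one combination that lands near the quoted number.

```python
# Back-of-the-napkin debug cost estimate.
# AVG_COMPENSATION_USD and DEBUG_FRACTION are the 2010 survey figures cited in
# the text; the team size and project length used below are hypothetical.
AVG_COMPENSATION_USD = 107_300  # EE Times 2010, North American average
DEBUG_FRACTION = 0.32           # Mentor Graphics 2010 study
HOURS_PER_DAY = 8


def debug_hours_per_day(hours: float = HOURS_PER_DAY) -> float:
    """Hours of a working day spent debugging."""
    return hours * DEBUG_FRACTION


def debug_cost_usd(engineers: int, years: float) -> float:
    """Compensation spent on debug for a whole team over a whole project."""
    return engineers * years * AVG_COMPENSATION_USD * DEBUG_FRACTION


if __name__ == "__main__":
    print(f"Debug hours per day: {debug_hours_per_day():.2f}")
    # Hypothetical ASIC team: 15 engineers for 2 years.
    print(f"Debug cost: ${debug_cost_usd(15, 2):,.0f}")
```

Under those assumptions the script prints 2.56 debug hours per day and a debug cost of $1,030,080, which is consistent with the "roughly $1 million" figure. Other team sizes and schedules can be passed to `debug_cost_usd` directly.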

Spending 32% of an engineer's time fixing defects is unacceptable by any standard in any industry, which is why EDA companies and engineers alike have been searching for ways to expedite debug. Isolate bugs faster, fix bugs faster: two worthy goals with huge potential payoffs. But is this kind of reactive problem solving the right way to address an activity that consumes almost a third of an engineer's day? Or does the amount of time we spend debugging code suggest a fundamental flaw in the way we go about designing hardware and writing code?

Bug prevention with test-driven development

In his day 2 keynote talk at DAC 2012 in San Francisco, Mike Muller, CTO at ARM, drew on his years of experience to make several interesting comparisons between generations of ARM cores and the practices used to design them. At one point in his talk, Mike wonders aloud “have designers got lazy?” when presenting data suggesting the Cortex-M0 was developed with 1/3,000,000th of the engineering efficiency realized during the development of the ARM1[3]. Mike goes on to suggest an air of complacency in hardware development, where very little innovation has taken place since the industry moved on from the hand placement of transistors. When describing a “desperate validation challenge”, he says that to find bugs “we do all of things we always used to do”.

Software developers, on the other hand, are in a different place, he suggests: “they’ve all moved on. They use frameworks, new languages; their programming paradigms are completely different… Most of the industry has moved on”. Interestingly, Agile development is first on his list of examples of how software developers have moved on.

Rather than continue to foster an environment of complacency and reactive problem solving (that is, debugging) in hardware development, we could take a proactive step toward eliminating the costs associated with defects: adopting design techniques that produce fewer defects in the first place. Test-driven development (TDD) is one such technique.

TDD is a continuous design technique whereby developers create a design by repeating a focused test-and-code cycle. Kent Beck is largely credited with developing the technique as part of the Extreme Programming movement in the late 1990s[4]. Since then, TDD has been used successfully by software developers writing many different types of applications in a variety of languages and frameworks. It has become a cornerstone technique for many Agile development teams.

As the name suggests, in TDD the tests drive the design. Put simply, a developer starts by considering how a particular feature will ultimately be used. He/she captures those considerations in a test, then writes only the portion of the design required to pass that test. A design starts extremely small, then grows incrementally with the addition of each test and its corresponding code.
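The article itself contains no code, so here is a small Python sketch of the cycle using the standard `unittest` module rather than SystemVerilog. The unit under test is a hypothetical 4-bit saturating counter. The comments mark which test motivated each line of the design, which shows how the design grows one test at a time.

```python
import unittest


class SaturatingCounter:
    """Hypothetical unit under test: an up-counter that stops at its maximum."""

    def __init__(self, width: int = 4) -> None:
        self.max_value = (1 << width) - 1  # 15 for a 4-bit counter
        self.value = 0                     # written to pass test_starts_at_zero

    def increment(self) -> None:
        if self.value < self.max_value:    # guard added for test_saturates_at_max
            self.value += 1                # written to pass test_increment_adds_one

    def reset(self) -> None:
        self.value = 0                     # written to pass test_reset_clears_count


class SaturatingCounterTest(unittest.TestCase):
    # Each test captures one way a user will use the counter; in TDD it is
    # written before the code that makes it pass.
    def test_starts_at_zero(self):
        self.assertEqual(SaturatingCounter().value, 0)

    def test_increment_adds_one(self):
        counter = SaturatingCounter()
        counter.increment()
        self.assertEqual(counter.value, 1)

    def test_saturates_at_max(self):
        # A usage-driven corner case: what happens when the count overflows?
        counter = SaturatingCounter(width=4)
        for _ in range(20):
            counter.increment()
        self.assertEqual(counter.value, 15)

    def test_reset_clears_count(self):
        counter = SaturatingCounter()
        counter.increment()
        counter.reset()
        self.assertEqual(counter.value, 0)
```

Running `python -m unittest` on the file executes all four tests. In a real TDD session, the tests would be added and run one at a time, and the design would be extended only as far as each new test requires.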

There are several documented benefits of TDD. First is the robustness and lower relative complexity of a design. Through a change in perspective, developers are forced to think about how a design is to be employed by a user as opposed to constraining their thought process to how certain features should be implemented. A thought process that begins with understanding usage helps address corner cases and conditions not immediately obvious when thinking strictly in terms of implementation. Designing specifically from a user perspective also tends to simplify a design by limiting opportunities for over-engineering.

Another obvious advantage of TDD is that it gives developers the ability to validate correctness promptly and repeatedly. Since code is written in very short increments, bugs can be found and fixed almost as soon as they are introduced. With automated unit tests run frequently, usually several times a day, changes to the code base can be verified immediately.

There are two additional points to make with respect to mechanics and code maintenance in TDD. First, during the TDD cycle, it is important that a developer observe a failure when first running a new test, for the obvious reason that the corresponding design code does not yet exist. Observing that failure minimizes the chance of false positives. Second, because TDD is a continuous design technique, the structure of the code is effectively fluid; the ideal structure today may not be the ideal structure weeks down the road. With a fluid code base, it is essential that developers make opportunities for review and cleanup as code is written. For teams using TDD, continuous refactoring is critical.
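Here is another small Python sketch of the two mechanics above: seeing a new test fail before the design code exists, and re-running the same test after a refactoring. The parity function and its three versions are hypothetical. The sketch drives `unittest` programmatically so that the pass/fail outcome of each step can be checked.

```python
import unittest


def parity_placeholder(word: int) -> int:
    """Red: design code not written yet, so the placeholder always returns 0."""
    return 0


def parity_first_pass(word: int) -> int:
    """Green: the simplest code that passes the test (count the ones)."""
    return bin(word).count("1") % 2


def parity_refactored(word: int) -> int:
    """Refactor: same behavior, restructured as a bitwise XOR reduction."""
    parity = 0
    while word:
        parity ^= word & 1
        word >>= 1
    return parity


def passes_parity_test(parity_fn) -> bool:
    """Run the parity test against parity_fn and report whether it passed."""

    class ParityTest(unittest.TestCase):
        def test_parity_bit(self):
            self.assertEqual(parity_fn(0b0000), 0)
            self.assertEqual(parity_fn(0b1011), 1)  # three ones -> parity bit 1
            self.assertEqual(parity_fn(0b1111), 0)

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParityTest)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()


if __name__ == "__main__":
    # The new test must fail before the design exists; a pass here would mean
    # the test cannot detect a missing design (a false positive).
    print("placeholder:", passes_parity_test(parity_placeholder))  # False
    print("first pass: ", passes_parity_test(parity_first_pass))   # True
    print("refactored: ", passes_parity_test(parity_refactored))   # True
```

The placeholder passes the first assertion by accident (0 is the correct parity of 0), which is exactly why seeing the test fail as a whole matters. Because the refactored version is checked by the same unchanged test, the restructuring cannot silently change behavior.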

Bug prevention in hardware development

Many of the characteristics that make for great software developers are also found in great hardware developers. That is unsurprising, since the design challenges in hardware and software development bear close resemblance, as does the critical thinking required to overcome them. Both software and hardware engineers design products to solve real-world problems. For both, solutions must be robust and address the needs of users. At a fundamental level, both hardware and software are crudely realized as code. The final representation may be different, but the bottom line is that developers in both industries write code, and that code must be correct. Hardware is not software, but the similarities between them, particularly in front-end development, are hard to deny.

Regardless of industry, the motivations behind using a technique like TDD are clear. Just as software developers have seen for more than a decade, TDD can be applied in hardware development to create robust designs and prevent defects.

The most obvious application of TDD is in the creation of the actual product that goes out the door. Instead of passing responsibility for code quality to verification specialists, designers can share that responsibility by using TDD to produce higher-quality code. Another obvious application of TDD is in the development of functional verification IP and testbenches. Higher-quality test environments can be key to isolating defects in functional verification, particularly with practices like constrained-random verification where, by definition, randomized stimulus and unforeseen interactions make defects difficult to diagnose. Finally, TDD is perfectly suited to the development of golden reference models, where the standard of quality is particularly high.

Defects are a tremendous cost in hardware development, and debug is one way we pay the price. If we want to eliminate the cost of debug, we need to drop our complacency and adopt more proactive approaches to development. TDD is a proactive development approach that can eliminate defects in hardware.

SVUnit v0.1

A requirement of successful TDD is a framework that enables rapid test-and-code cycles. Until recently, there was no such framework for hardware developers creating designs with SystemVerilog. SVUnit is a first-generation xUnit framework specifically designed to enable TDD for hardware developers using SystemVerilog[5]. With a structure based on popular xUnit frameworks like JUnit, CppTest and others, SVUnit helps hardware developers create robust designs and write high-quality code using a technique many software developers have been using successfully for years. The framework is lightweight and available under the open source Apache 2.0 license, giving it a very low barrier to entry. SVUnit supports SystemVerilog simulators from three major vendors as well as UVM-based testbench development, making it an attractive choice for design and verification engineers alike.

SVUnit has three layers of hierarchy. At the lowest layer, a template generator is used to create unit test classes into which a developer adds tests targeting features of the unit under test (UUT). The framework generator then aggregates one or more unit test classes into a test suite; test suites are in turn aggregated and instantiated in a test runner. With a goal of having the fastest possible cycle time, SVUnit compiles all three levels of hierarchy, which could include many test suites and unit tests, into a single executable. Optionally, test suites can be compiled and run independently. Generated log files report PASS/FAIL results on a test-by-test basis, with an aggregated PASS/FAIL result for the test runner. The class hierarchy and supporting macros make it easy to create and include new tests.

A requirement of doing TDD is a reliable unit test framework. Thanks to the recent release of SVUnit, TDD is now also a technique accessible to all hardware developers creating designs with SystemVerilog.
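Here is a Python model of the three-layer structure described above: unit tests, test suites, and a test runner that aggregates PASS/FAIL results. It is only an illustration of the hierarchy. The class and method names are hypothetical and are not SVUnit's SystemVerilog API, which relies on generated classes and macros.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Illustrative model of a three-layer xUnit hierarchy.
# All names are hypothetical; this is not SVUnit's API.


@dataclass
class UnitTest:
    """Lowest layer: named checks targeting one unit under test (UUT)."""
    uut_name: str
    tests: List[Tuple[str, Callable[[], bool]]] = field(default_factory=list)

    def add(self, name: str, check: Callable[[], bool]) -> None:
        self.tests.append((name, check))

    def run(self, log: List[str]) -> bool:
        passed = True
        for name, check in self.tests:
            ok = check()  # every check runs, even after an earlier failure
            log.append(f"{self.uut_name}::{name} {'PASS' if ok else 'FAIL'}")
            passed = passed and ok
        return passed


@dataclass
class TestSuite:
    """Middle layer: aggregates one or more unit tests."""
    name: str
    unit_tests: List[UnitTest] = field(default_factory=list)

    def run(self, log: List[str]) -> bool:
        # Build the full list first so every unit test runs even after a failure.
        results = [unit.run(log) for unit in self.unit_tests]
        return all(results)


@dataclass
class TestRunner:
    """Top layer: runs every suite and reports one aggregated result."""
    suites: List[TestSuite] = field(default_factory=list)

    def run(self) -> Tuple[bool, List[str]]:
        log: List[str] = []
        results = [suite.run(log) for suite in self.suites]
        ok = all(results)
        log.append(f"RUNNER {'PASS' if ok else 'FAIL'}")
        return ok, log


if __name__ == "__main__":
    fifo = UnitTest("fifo")
    fifo.add("empty_after_reset", lambda: True)
    fifo.add("full_flag_set", lambda: False)  # deliberately failing check
    ok, log = TestRunner([TestSuite("top", [fifo])]).run()
    print("\n".join(log))
    # fifo::empty_after_reset PASS
    # fifo::full_flag_set FAIL
    # RUNNER FAIL
```

In this model, one `TestRunner.run()` call roughly corresponds to SVUnit compiling every level into a single executable. Calling `run` on one `TestSuite` alone mirrors the option of compiling and running a suite independently.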

Why all the debug?

Developers are responsible for defects; the damage is self-imposed. It doesn’t matter if you’re a designer, a verification engineer, a modeling expert, or anyone else designing and writing code: everyone injects defects. Debug accounts for 32% of our daily responsibilities only because we allow it to happen, not because high defect rates are a fundamental part of hardware development.

While it may appear that the cost of debug deserves our attention and innovative minds, just the opposite is true. Searching for new ways to inject quality into a design is where the focus should be in hardware development, not improving debug tools and techniques. High-quality designs are the real goal. Efficient debug is just a diversion.

About the author

Neil Johnson has been working in ASIC and FPGA development for more than 10 years. He currently holds the position of Principal Consultant at XtremeEDA Corp, a design services firm specializing in all aspects of ASIC and FPGA development. Neil is also co-moderator for AgileSoC.com, a site dedicated to the introduction of Agile development methods to the world of hardware development.

If you found this article to be of interest, visit EDA Designline where – in addition to my blogs on all sorts of "stuff" – you will find the latest and greatest design, technology, product, and news articles with regard to all aspects of Electronic Design Automation (EDA).

I am not a skeptic of TDD or Agile. My background is software. I think the Agile/Extreme methods are key not only for hardware designers but also for EDA software tool developers.
Great article and info on SVUnit. Thanks.

ShashiB, interesting perspective. Within the hardware community, I think it's pretty common to doubt some software techniques because the target technology is so obviously different, so I'm certain you're not the only skeptic out there. I think there's some merit to the argument that software methods won't work for hardware when talking about Agile in general (though I don't agree with it)... but the reason I think the potential for TDD is so great is that I think that argument goes out the window. When it comes to code/design quality, I think target technology is irrelevant. Regardless of target (software/hardware/simulation/synthesis/reference model/test tube/whatever), a design that is robust and functionally correct is far more valuable than untested code. Thanks to the Mentor Graphics study, it's pretty easy to make the point that paying for untested code (i.e. debug) is a very good way to kill your budget. From there, I'd hope that people can see that an aversion to risk is irresponsible when the methods you're using are so obviously inadequate. Like you point out, there are other things to consider. But strictly in terms of quality, I reckon TDD is a good alternative method.
Thanks for the comment!
-neil

Traditionally, hardware/HDL designers have been wired to write code that directs synthesis tools to generate certain types of structure. RTL code and simulation are not the end goal. That's where this differs from software, where the software product itself is the end goal. Synthesis tools are getting better but are still not on par with compilers targeting code to different computer architectures. This leaves a big gap between the intention captured in RTL and the final silicon. Treating hardware design as a software project is not going to work until the tools reduce this difference between RTL and silicon. The current mentality seems to be: if the designer has to do a whole bunch of tweaking post-synthesis, why spend time on RTL verification? Exhaustive verification seems to make sense in the final hardware system in the lab. Risky, but that seems to be the reality.
Then there are the hardware project managers who actually measure the quality of the project, or the contribution of a design engineer, by how early the chip gets into the lab (especially for FPGA). This has not helped designers break the pattern, or train and evolve design flows. I'd say debug time is much more than 32%, and the approach taken is to throw more bodies at the debug problem. This is also because of the gap between RTL and silicon mentioned earlier, and risk aversion toward unproven newer methods.