I was listening to a talk by Daniel Kroening, a software verification researcher at Oxford University, who was explaining the certification process for safety-critical software. He mentioned that one of the requirements is that all test cases must be verified on the actual hardware. In the case of avionics software, that means one test flight for every test case. Since test flights are expensive, minimizing the number of test cases required to cover everything is extremely important.
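
To make the optimization concrete: picking the fewest test cases that still cover every requirement is an instance of the set cover problem, which is hard to solve exactly. Below is a minimal sketch of the standard greedy heuristic, not anything taken from the talk. The test names, the coverage goals, and the function name `greedy_test_selection` are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set


def greedy_test_selection(coverage: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick tests until every goal that any test can cover is covered.

    `coverage` maps a (hypothetical) test name to the set of coverage goals
    that test exercises. The result is a heuristic answer, not a guaranteed
    minimum.
    """
    # Every goal that at least one test can reach.
    uncovered: Set[str] = set().union(*coverage.values())
    selected: List[str] = []
    while uncovered:
        # Take the test that covers the most still-uncovered goals.
        # Iterating over sorted names makes ties resolve alphabetically.
        best = max(sorted(coverage), key=lambda t: len(coverage[t] & uncovered))
        selected.append(best)
        uncovered -= coverage[best]
    return selected


# Hypothetical example: four candidate test flights, five coverage goals.
example = {
    "flight_a": {"g1", "g2", "g3"},
    "flight_b": {"g3", "g4"},
    "flight_c": {"g4", "g5"},
    "flight_d": {"g1", "g5"},
}
print(greedy_test_selection(example))
```

In this made-up example, two flights cover all five goals instead of four. A real certification flow would have to weigh much more than raw coverage counts, so treat this only as an illustration of why the problem is an optimization problem.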

This is one of those cases where you get a dot, connect it with other seemingly random dots, and see that you have a line. What I realized is that the difficulty in verification is not really in finding bugs (bugs are easy, right?), but in how efficiently we find them. Recently I posted on how constrained random testing is essentially a (hard) optimization problem. I also posted on the best verification methodology being to combine orthogonal methodologies in order to optimize bug-finding productivity. The criticality of optimizing safety-critical test cases was another data point that led me to this realization.

This is reflected in the fact that many of the most successful verification tools introduced over the last twenty years have succeeded by optimizing verification productivity. As we all know, faster simulation really does very little to improve the quality of a design, but it helps enormously in improving verification productivity. Hardware verification languages are probably the second most important development in verification in the last twenty years. But again, they don’t improve quality; they simply improve productivity in developing verification environments.

This is not to say that there have not been tools that improve quality. Formal proof clearly improves quality when it can be applied, although semi-formal verification, which focuses on bug hunting, is more of a productivity increase. In-circuit emulation allows you to find bugs that could not be found in simulation, because the design runs on the real hardware. However, emulation used simply as faster simulation is really just a productivity increase.

Is verification optimization related to the much-discussed verification bottleneck (you know, the old saw about verification consuming 70% of the effort)? Verification became a bottleneck when the methodology changed from being done predominantly post-silicon to being done predominantly pre-silicon. Many people saw the resulting dramatic increase in verification effort as being correlated with increased design size and complexity. If this were true, then verification would consume 98% of the effort today, since this switch occurred twenty years ago and there have been many generations of products since then. Since relative verification effort has not changed significantly over the last twenty years, I think it is safe to say that relative verification effort stays roughly constant as design size increases, and that differences in relative effort reflect differences in methodology more than anything.
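
The arithmetic behind that kind of figure is easy to check. The sketch below is only an illustration under stated assumptions: it starts from the 70% share quoted above, and it assumes (hypothetically) that verification effort grows by some factor relative to design effort. The function name `verification_share` and the growth factor are mine, not measured data.

```python
def verification_share(initial_share: float, relative_growth: float) -> float:
    """Verification's share of total effort after it grows relative to design.

    `initial_share` is the starting fraction of effort spent on verification.
    `relative_growth` is an assumed factor by which verification effort has
    grown relative to design effort since then.
    """
    verification = initial_share * relative_growth
    design = 1.0 - initial_share
    return verification / (verification + design)


# Assumed factor of 21: a 70% share would rise to roughly 98%.
print(round(verification_share(0.70, 21), 2))
```

Under these assumptions, verification effort would only need to outgrow design effort by a factor of about 21 to move from 70% to 98%. That is a modest factor compared with how much designs have grown over many product generations, so a share that has stayed near 70% suggests relative effort has not been tracking design size.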

The real question is: will verification optimization become more important in the future as designs become larger and will that result in relative verification effort rising? If there is no change to design methodology, we would expect verification optimization effort to remain constant. If high-level synthesis allows us to move up the abstraction ladder, this should improve the ability to optimize verification. In short, there does not seem to be a looming crisis in overall verification optimization.

However, if we look at the software side, we see that software content on hardware platforms is growing rapidly, which is putting enormous pressure on the ability to verify this software. Effectively, we have managed to forestall the hardware verification optimization crisis by moving it to software.

In an article in a recent issue of Computer entitled “Really Rethinking Formal Methods”, David Parnas questions the current direction of formal methods research. His basic claim is that (stop me if this sounds familiar) formal methods have too low ROI and researchers, rather than proclaiming the successes, need to recognize this and adjust their direction. As he so eloquently puts it:

if [formal methods] were ready, their use would be widespread

I haven’t spent a lot of time trying to figure out whether his prescriptions make sense, but one thing stood out to me. He talks about a gap between software development and older engineering disciplines. This is not a new insight. As far back as the 1960s, the “software crisis” was a concern, as the first large, complex software systems being built started experiencing acute schedule and quality problems. This was attributed to the fact that programming was a new profession and did not have the rigor or level of professionalism of engineering disciplines that had been around for much longer. Some of the criticisms heard were:

programmers are not required to have any degree, much less an engineering degree.

programmers are not required to be certified.

traditional engineering emphasizes using tried and true techniques, while programmers often invent new solutions for every problem.

These explanations are often used as the excuse when software (usually Microsoft software) is found to have obvious and annoying bugs. But is this really the truth? Let’s look at an example of traditional engineering to see if this holds up.

Bridge building is a technology that is thousands of years old. There are Roman bridges built two thousand years ago that are still in use today. Bridges are designed by civil engineers who are required to be degreed, certified engineers. Bridge design follows a very rigorous process and is done very conservatively using tried and true principles. Given that humanity has been designing bridges for thousands of years, you would think that we would have gotten it right by now.

Even today, bridges are built with design flaws that result in accidents and loss of life. One could argue that, even so, the incidence of design flaws is far lower in bridges than in software. But this is not really an apples-to-apples comparison. The consequences of a bug in, say, a web browser are far less severe than those of a design flaw in a bridge. In non-safety-critical software, economics is the more important factor in determining the level of quality. The fact is, most of the time, getting a product out before the competition does is economically more important than producing a quality product.

However, there are safety-critical software systems, such as those in airplanes, medical therapy machines, spacecraft, etc. It is fair to compare these systems to bridges in terms of catastrophic defect rates. Let’s look at one area in particular, commercial aircraft. All commercial aircraft designed in the last 20 years rely heavily on software and, in fact, would be impossible to fly if massive software failures were to occur. Over the past 20 years, there have been roughly 50 incidents of computer-related malfunctions, but the number of fatal accidents directly attributed to software design faults is maybe two or three. This is about the same as the rate of fatal bridge accidents attributable to design faults. This seems to indicate that the gap between software design and traditional engineering is not so real.

The basic question seems to boil down to: are bridges complex systems? I define a complex system as one that has bugs in it when shipped. It is clear that bridges still have that characteristic and, therefore, must be considered complex systems from a design standpoint. The intriguing question is, given that they are complex systems, do they obey the laws of designing complex systems? I believe they do and will illustrate this by comparing two bugs, one a bridge design fault and the other a well-known software bug.

The London Millennium Footbridge was completed in 2000 as part of the millennium celebration. It was closed two days after it opened due to excessive sway when large numbers of people crossed the bridge. It took two years and millions of pounds to fix. The bridge design used the latest design techniques, including software simulation to verify the design. Sway is a normal characteristic of bridges. However, the designers failed to anticipate that people walking on the bridge would interact with the sway in a way that magnified it. The root cause of this problem is that, while the simulation model of the bridge was probably sufficiently accurate, the model of the environment, in this case people walking on the bridge, was not.

This is a very common syndrome in designing complex hardware systems. You simulate the chip thoroughly and then when you power it up in the lab, it doesn’t work in the real environment. I describe an example of this exact scenario in this post.

In conclusion, it does seem that bridges obey the laws of designing complex systems. The bad news is that the catastrophic failure rate of safety-critical software is of roughly the same magnitude as that of bridges. This means that we cannot expect significant improvements in the quality of software over the next thousand years or so. On the plus side, we no longer need to buy the excuse that software development is not as rigorous as “traditional” disciplines such as building bridges.