Andrew Koenig

Dr. Dobb's Bloggers

The high cost of broken tools

January 27, 2010

People who build tools have a special responsibility to make the tools reliable--one that people who write end-user applications usually do not share. One reason for this responsibility is that people use those tools to build other applications, and the users of those applications do not communicate directly with the tool builders. That lack of communication makes it hard for the tool builders to know the full extent of the effects that bugs in their tools have on the end users.

This discussion is too theoretical, so let me make it more concrete. In 1994, Intel released a number of CPU chips with a hardware bug: Floating-point division could sometimes produce a slightly incorrect result. For example, if you were to divide 4,195,835 by 3,145,727, and then multiply the result by 3,145,727, you would expect to get 4,195,835 again. If you were using one of the defective processors, you would get 4,195,579.
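To give a sense of what "slightly incorrect" means here, the following sketch works out the size of the error from the two figures quoted above. It does not touch the hardware divider in any interesting way; the names are illustrative only.

```python
# Figures quoted in the paragraph above.
expected = 4_195_835   # the value you should get back
observed = 4_195_579   # the value a defective processor produced

absolute_error = expected - observed          # how far off the result is
relative_error = absolute_error / expected    # the same error as a fraction

print(f"absolute error: {absolute_error}")
print(f"relative error: {relative_error:.2e}")
```

The result is off by 256, or roughly 6 parts in 100,000. That is small enough to escape casual notice and far too large for any program that depends on its arithmetic.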

Intel's first reaction was that only one in 9 billion floating-point divisions would trigger the problem, so it was no big deal--but nevertheless they would replace processors for users who could show that the bug had caused trouble for them. Later, they liberalized their policy and offered to replace any processor with the bug.

You might say that this offer should have been enough. Indeed, it is hard to imagine what else Intel could do. Nevertheless, the mere existence of this bug causes a particular problem for people who build general-purpose tools such as compilers.

Suppose you are writing a compiler, and part of that compiler is a function to convert a string of decimal digits to a floating-point number. Your function is used whenever a user writes a floating-point literal as part of a program. Suppose further that your function does floating-point division. Is it possible that running your compiler on a processor with the division bug might cause a program to be compiled incorrectly? The answer is probably yes.
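A minimal sketch of such a conversion function might look like the following. The name `naive_parse_decimal` and the restriction to unsigned literals of the form digits[.digits] are assumptions made for illustration; a real compiler's conversion routine is considerably more careful.

```python
def naive_parse_decimal(text: str) -> float:
    """Convert a literal such as "3.25" to a float (hypothetical helper).

    Handles only unsigned literals of the form digits[.digits].
    """
    whole, _, fraction = text.partition(".")
    mantissa = int(whole + fraction)   # "3.25" -> 325
    scale = 10 ** len(fraction)        # two fractional digits -> 100
    # The floating-point division happens here. On a processor with a
    # defective divider, this is where a slightly wrong constant could
    # slip into the compiled program.
    return float(mantissa) / float(scale)
```

Every floating-point literal in the user's program passes through that one division, so a single bad quotient ends up baked into the generated code.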

Suppose that not only is the answer yes, but that one of your users discovers that your compiler is producing incorrect results because of the processor bug. If you take your responsibilities to your users seriously, it is not sufficient to say "Yes, you have this problem, but it's not our problem; it's because you're running on a broken processor." The reason is that if you do say that, the user is apt to respond: "I don't care what the reason is; I just want to be able to trust the results from your compiler."

So as a responsible compiler writer, you now have two possible courses of action. The first is to check whether you are running on a broken processor, and to refuse to run under those circumstances. A user may grumble at a compiler that says "I'm sorry, but until you fix your processor, I'm not going to compile your programs," but at least the user won't blame you for the processor's problems.
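The first course of action could be sketched as a startup check like the one below. It divides 4,195,835 by 3,145,727 and multiplies back, which is the form in which this probe is usually quoted. The function names and the tolerance are assumptions, and a single probe can only detect this one known failure; it cannot prove that every division is correct.

```python
def division_looks_trustworthy(x: float = 4195835.0,
                               y: float = 3145727.0,
                               tolerance: float = 1.0) -> bool:
    """Probe the divider with one operand pair known to expose the bug."""
    # A sound divider returns x almost exactly; the defective parts
    # were off by 256 for this pair, far beyond the tolerance.
    discrepancy = x - (x / y) * y
    return abs(discrepancy) < tolerance


def refuse_to_run_on_broken_divider() -> None:
    """Stop before compiling anything if the probe fails."""
    if not division_looks_trustworthy():
        raise RuntimeError(
            "Floating-point division on this processor is unreliable; "
            "refusing to compile."
        )
```

A tool would call `refuse_to_run_on_broken_divider()` once, before doing any real work.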

Of course, the user may not have the ability to fix the processor in question; so a more humane choice is apt to be for the compiler to work around the processor bug. One way of doing so might be to avoid floating-point division completely in the compiler itself, and simulate it with integer arithmetic instead.
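One way the workaround might look is sketched below: the quotient is computed with integer division only, and the result is then converted to floating point without any floating-point division. The function names are hypothetical, and the sketch assumes a non-negative numerator, a positive denominator, and a result within the normal range of a double (no overflow or subnormal handling).

```python
import math


def divide_via_integers(numerator: int, denominator: int) -> float:
    """Compute numerator / denominator using integer division only."""
    if numerator < 0 or denominator <= 0:
        raise ValueError("sketch handles only numerator >= 0, denominator > 0")
    if numerator == 0:
        return 0.0
    # Scale the numerator so the integer quotient carries at least 64
    # significant bits, more than the 53 a double can hold.
    shift = max(0, 64 + denominator.bit_length() - numerator.bit_length())
    quotient, remainder = divmod(numerator << shift, denominator)
    if remainder:
        quotient |= 1  # sticky bit: remember that lower-order bits were dropped
    # Integer-to-float conversion rounds to nearest; ldexp then undoes
    # the scaling by adjusting the exponent. Neither step divides.
    return math.ldexp(float(quotient), -shift)


def parse_decimal_without_fdiv(text: str) -> float:
    """Convert an unsigned digits[.digits] literal without float division."""
    whole, _, fraction = text.partition(".")
    return divide_via_integers(int(whole + fraction), 10 ** len(fraction))
```

The cost is extra code that exists only to avoid trusting the hardware, which is exactly the kind of burden a broken tool pushes onto the people who build on it.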

Let's suppose you have taken the second of these alternatives in your compiler. At what point is it safe to remove the workaround code? Only after the last broken processor has been retired from your user community. Otherwise, there will be a user out there who will complain that the previous release of your compiler worked just fine but this new one doesn't--so how can the problem be in the processor that hasn't changed for years?

This hypothetical story illustrates a surprisingly common problem: When a tool--in this case, a processor--has a bug, the tool's users--in this case, the compiler writers--have to work around that bug in order to insulate their own users from the tool's problems. In other words, a bug in a tool causes trouble not only for the people who use the tool but also for the people who use the products built with that tool.
