We use compilers on a daily basis as if their correctness were a given, but compilers are programs too and can potentially contain bugs. I have always wondered about this seemingly infallible robustness. Have you ever encountered a bug in the compiler itself? What was it, and how did you realize the problem was in the compiler itself?

They aren't infallible. There are compiler bugs - it's just that they are very rare.
– ChrisF♦, Feb 25 '11 at 16:07

Bugs become rarer as you descend the code stack: application bugs are more common than compiler bugs. Compiler bugs are more common than CPU (microcode) bugs. This is actually good news: can you imagine if it were the other way around?
– Fixee, Feb 25 '11 at 17:39

19 Answers

They get tested thoroughly via usage by thousands or even millions of developers over time.

Also, the problem to be solved is well defined (by a very detailed technical specification), and the nature of the task lends itself easily to unit and system tests: a compiler basically translates textual input in a very specific format into output in another well-defined format (some sort of bytecode or machine code), so it is easy to create and verify test cases.
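
As an illustration (not how any particular compiler's test suite is actually organized), here is a minimal sketch of such a test case, using the Roslyn API purely as an example: feed a snippet to the compiler and assert that exactly the expected diagnostic comes out. It assumes the Microsoft.CodeAnalysis.CSharp NuGet package.

using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

class CompilerTestSketch
{
    static void Main()
    {
        // The input: a snippet that should produce exactly one error.
        var tree = CSharpSyntaxTree.ParseText(
            "class C { static void M() { int x = \"oops\"; } }");

        // Compile as a library so no entry point is required.
        var compilation = CSharpCompilation.Create(
            "test",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(
                typeof(object).Assembly.Location) },
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        // The expected output: one CS0029 (cannot implicitly convert type).
        var errors = compilation.GetDiagnostics()
            .Where(d => d.Severity == DiagnosticSeverity.Error)
            .ToList();

        Console.WriteLine(errors.Count == 1 && errors[0].Id == "CS0029"
            ? "PASS" : "FAIL");
    }
}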

Moreover, the bugs are usually easy to reproduce too: apart from the exact platform and compiler version info, all you typically need is a piece of input code. Not to mention that compiler users (being developers themselves) tend to give far more precise and detailed bug reports than the average computer user :-)

Plus much of the compiler code can probably be proven correct.
– biziclop, Feb 25 '11 at 16:10

@Péter: Lexer/parser generators seem to be rather rare in the more widely used compilers - most write their lexers and parsers by hand for various reasons, including speed and the lack of sufficiently smart parser/lexer generators for the language in question (e.g. C).
– delnan, Feb 25 '11 at 17:39

You have an "observer bias". You don't observe bugs, and therefore you assume that there aren't any.

I used to think like you do. Then I started writing compilers professionally, and let me tell you, there are lots of bugs in there!

You don't see the bugs because you write code that is just like 99.999% of all the rest of the code that people write. You probably write perfectly normal, straightforward, clearly correct code that calls methods and runs loops and doesn't do anything fancy or weird, because you're a normal developer solving normal business problems.

You don't see any compiler bugs because the compiler bugs aren't in the easy-to-analyze straightforward normal code scenarios; the bugs are in the analysis of weird code that you don't write.

I, on the other hand, have the opposite observer bias. I see crazy code all day every day, and so to me compilers seem to be chock-full of bugs.

If you sat down with the language specification of any language, took any compiler implementation for that language, and really tried hard to determine whether the compiler exactly implemented the spec, concentrating on obscure corner cases, pretty soon you'd be finding compiler bugs quite frequently. Let me give you an example: here's a C# compiler bug I found literally five minutes ago.

static void N(ref int x){}
...
N(ref 123);

The compiler gives three errors.

A ref or out argument must be an assignable variable.

The best match for N(ref int x) has invalid arguments.

Missing "ref" on argument 1.

Obviously the first error message is correct and the third one is a bug. The error-generation algorithm is trying to figure out why the first argument was invalid: it looks at it, sees that it is a constant, and does not go back to the source code to check whether it was marked as "ref"; rather, it assumes that no one would be foolish enough to mark a constant as ref, and decides that the ref must be missing.

It's not clear what the correct third error message is, but this isn't it. In fact, it is not clear if the second error message is correct either. Should overload resolution fail, or should "ref 123" be treated as a ref argument of the correct type? I'll now have to give it some thought and talk it over with the triage team so that we can determine what the correct behaviour is.

You've never seen this bug because you would probably never do something so silly as to try to pass 123 by ref. And if you did, you probably wouldn't even notice that the third error message is nonsensical, since the first one is correct and sufficient to diagnose the problem. But I do try to do stuff like that, because I'm trying to break the compiler. If you tried, you'd see the bugs too.
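
Incidentally, the fix on the user's side is exactly what the first message asks for: pass something assignable. A sketch for contrast:

static void N(ref int x) { }

static void Caller()
{
    int i = 123;
    N(ref i);      // fine: i is an assignable variable
    // N(ref 123); // rejected: a constant has no storage to write back to
}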

Good error messages after the first one are very hard to do.
– user1249, Feb 28 '11 at 23:08

@MKO: Of course. Lots of bugs don't get fixed. Sometimes the fix is so expensive and the scenario is so obscure that the cost isn't justified by the benefits. And sometimes enough people have come to rely on the "buggy" behaviour that you have to keep maintaining it.
– Eric Lippert, Mar 1 '11 at 7:12

I've encountered two or three in my day. The only real way to detect one is to look at the assembly code.

Although compilers are highly reliable for the reasons other posters have pointed out, I think compiler reliability is often a self-fulfilling assessment. Programmers tend to view the compiler as the standard. When something goes wrong, you assume it's your fault (because 99.999% of the time it is), and change your code to work around the compiler problem rather than the other way around. For example, code crashing under a high optimization setting is definitely a compiler bug, but most people just set the level a little lower and move on without reporting the bug.

+1 for "viewing the compiler as the standard." I've long maintained that there are two things that truly define a language: the compiler and the standard library. A standards document is just documentation.
– Mason Wheeler, Feb 25 '11 at 17:28

@Mason: That works well for languages with one implementation. For languages with many, the standard is vital. The real-life impact is that, if you complain about something, the vendor will take you seriously if it's a standards issue, and brush you off if it's undefined behavior or something like that.
– David Thornley, Feb 25 '11 at 17:52

They don't. We do. Because everybody uses them all the time, bugs are found quickly.

It's a numbers game. Because compilers get used so pervasively, it is highly likely that any bug will be triggered by someone, but because there's such a large number of users, it is highly unlikely that that someone will be you specifically.

So, it depends on your viewpoint: across all users, compilers are buggy. But it is very likely that someone else compiled a similar piece of code before you did; if there was a bug, it would have hit them, not you. So from your individual point of view, it looks as if the bug was never there.

Of course, on top of that, you can add all the other answers here: compilers are well researched, well understood. There is this myth that they are hard to write, which means that only very smart, very good programmers actually attempt to write one, and are extra careful when they do. They are generally easy to test, and easy to stress test or fuzz test. Compiler users tend to be expert programmers themselves, leading to high quality bug reports. And the other way around: compiler writers tend to be users of their own compiler.
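
On the "easy to stress test or fuzz test" point, here's a toy sketch (using Roslyn as a stand-in front end, purely for illustration): hurl random token soup at the parser and check that it never crashes. It may, of course, report errors - that's its job.

using System;
using Microsoft.CodeAnalysis.CSharp;

class ParserFuzzSketch
{
    static void Main()
    {
        var rng = new Random(42);
        const string tokens = "+-*/()123xy ";

        for (int i = 0; i < 10000; i++)
        {
            // Build a random, almost certainly malformed expression.
            var chars = new char[rng.Next(1, 40)];
            for (int j = 0; j < chars.Length; j++)
                chars[j] = tokens[rng.Next(tokens.Length)];

            // A parser must produce diagnostics, never throw, on
            // malformed input - an exception here would be a bug.
            CSharpSyntaxTree.ParseText(
                "class C { int f = " + new string(chars) + "; }");
        }

        Console.WriteLine("parser survived 10,000 random inputs");
    }
}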

You can find them in the darker corners where there are fewer testers. For example, to find bugs in GCC you should try:

Build a cross-compiler. You will find literally dozens of bugs in GCC's configure and build scripts. Some result in build failures during the GCC compile and others will result in failure of the cross-compiler to build working executables.

Build an Itanium version of GCC using profile-bootstrap. The last couple of times I tried this on GCC 4.4 and 4.5 it failed to produce a working C++ exception handler. The non-optimized build worked fine. No one seemed interested in fixing the bug I reported and I gave up fixing it myself after trying to dig through what was breaking in the GCC asm memory specifications.

Try building your own working GCJ from the latest stuff without following a distro build script. I dare you.

When you use strongly typed models in ASP.NET MVC views, there is a limit to how many type parameters the templates can accept: anything beyond four is too much for the compiler to handle.

Not necessarily. In C and C++, there's an annoying amount of unspecified and undefined behavior, and that can legitimately vary based on optimization level or phase of the moon or movement of the Dow Jones indexes. That test does work in more tightly defined languages.
– David Thornley, Feb 25 '11 at 17:54

They are usually very good at -O0. In fact, if we suspect a compiler bug, we compare -O0 against whatever level we are trying to use. Higher optimization levels come with greater risk; some are even deliberately risky, and labeled as such in the documentation. I've encountered a great many (at least a hundred during my time), but they are becoming much rarer recently. Nevertheless, in pursuit of good SPECmark numbers (or other benchmarks important to marketing), the temptation to push the limits is great. We had problems a few years back where a vendor (who shall go unnamed) decided to make violating parentheses (re-associating floating-point expressions) the default, rather than a special, clearly labeled compile option.
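
That -O0 comparison is easy to automate. A rough sketch of the workflow (the file names are made up for illustration, and it assumes gcc is on the PATH and test.c prints its results to stdout):

using System;
using System.Diagnostics;

class OptLevelCheck
{
    // Run a command and capture its standard output.
    static string Run(string cmd, string args)
    {
        var p = Process.Start(new ProcessStartInfo(cmd, args)
        {
            RedirectStandardOutput = true,
            UseShellExecute = false
        });
        string output = p.StandardOutput.ReadToEnd();
        p.WaitForExit();
        return output;
    }

    static void Main()
    {
        Run("gcc", "-O0 test.c -o test_O0"); // baseline build
        Run("gcc", "-O2 test.c -o test_O2"); // suspect build

        string baseline = Run("./test_O0", "");
        string optimized = Run("./test_O2", "");

        Console.WriteLine(baseline == optimized
            ? "outputs match"
            : "outputs differ: suspect the optimizer (or your own undefined behaviour)");
    }
}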

It can be hard to distinguish a compiler error from, say, a stray memory reference: a recompile with different options may simply scramble the relative positions of data objects in memory, so you don't know whether it is a Heisenbug in your source code or a buggy compiler. Also, many optimizations make legitimate changes to the order of operations, or even algebraic simplifications to your expressions, and these have different properties with respect to floating-point rounding and under/overflow. It is hard to disentangle these effects from real bugs. Hard-core floating-point computing is tough for this reason, because bugs and numerical sensitivity are often not easy to tell apart.
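
To make the re-association point concrete: floating-point addition is not associative, so an optimizer that regroups an expression can legitimately change the result. A minimal example:

using System;

class FpAssoc
{
    static void Main()
    {
        // Same mathematical sum, different groupings, different doubles:
        double left = (0.1 + 0.2) + 0.3;  // 0.6000000000000001
        double right = 0.1 + (0.2 + 0.3); // 0.6
        Console.WriteLine(left == right); // False
    }
}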

Found a glaring error in Turbo Pascal 5.5, years ago; an error present in neither the previous (5.0) nor the next (6.0) version of the compiler.
And one that should have been easy to catch, as it wasn't a corner case at all (just a call that's not all that commonly used).

In general, the commercial compiler builders (as opposed to hobby projects) will certainly have very extensive QA and testing procedures in place.
They know their compilers are their flagship products, and that flaws will reflect worse on them than flaws in most other products would on their makers.
Software developers are an unforgiving bunch: if our tool suppliers let us down, we're likely to go looking for alternatives rather than wait for a fix from the supplier, and we're very likely to communicate that fact to our peers, who might well follow our example. In many other industries that's not the case, so the potential loss to a compiler maker from a serious bug is far greater than the loss to, say, a maker of video editing software.

Have you ever encountered a bug in the compiler itself? What was it and how did you realize the problem was in the compiler itself?

Yup!

The two most memorable were the first two I ever ran across. They were both in the Lightspeed C compiler for 680x0 Macs back about 1985-7.

The first one was where, in some circumstances, the integer postincrement operator did nothing - in other words, in a particular piece of code, "i++" simply didn't do anything to "i". I was pulling my hair out until I looked at a disassembly. Then I just did the increment a different way, and submitted a bug report.

The second was a bit more complicated, and was really an ill-considered "feature" that went awry. The early Macs had a complicated system for doing low-level disk operations. For some reason I never understood - probably having to do with creating smaller executables - rather than the compiler just generating the disk operation instructions in-place in the object code, the Lightspeed compiler would call an internal function, which at runtime generated the disk operation instructions on the stack and jumped there.

That worked great on 68000 CPUs, but when you ran the same code on a 68020 CPU, it would often do weird things. It turned out that a new feature of the 68020 was a primitive 256-byte instruction cache. This being the early days of CPU caches, it had no notion of the cache being "dirty" and needing to be refilled; I guess the CPU designers at Motorola didn't think about self-modifying code. So if you did two disk operations close enough together in your execution sequence, and the Lightspeed runtime built the actual instructions at the same location on the stack, the CPU would erroneously think it had an instruction cache hit and run the first disk operation twice.

Again, figuring that out took some digging around with a disassembler, and a lot of single-stepping in a low-level debugger. My workaround was to prefix every disk operation with a call to a function that did 256 "NOP" instructions, which flooded (and thus cleared) the instruction cache.

Over the 25-ish years since then, I've seen fewer and fewer compiler bugs over time. I think there are a couple of reasons for that:

There's an ever-increasing set of validation tests for compilers.

Modern compilers are typically divided into two or more parts, one of which generates platform-independent code (e.g. LLVM's intermediate representation, which targets what you might consider an imaginary CPU), and another which translates that into instructions for your actual target hardware. In multi-platform compilers, the first part gets used everywhere, so it gets tons of real-world testing (see the sketch below).
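
Here is that split as a sketch, with every name invented for illustration: a shared front end lowers source to a tiny "imaginary CPU" IR, and each back end only has to render that IR for its target (real back ends also do instruction selection and register allocation, elided here).

using System;
using System.Collections.Generic;

enum Op { Push, Add, Mul } // toy platform-independent IR

class TwoPhaseCompilerSketch
{
    // Front end: shared by every target, so it sees the most real-world
    // testing. Here it "compiles" the expression 2 + 3 * 4 by hand.
    static List<(Op Op, int Arg)> FrontEnd() => new()
    {
        (Op.Push, 2), (Op.Push, 3), (Op.Push, 4), (Op.Mul, 0), (Op.Add, 0),
    };

    // Back end: a small per-target translation of the shared IR.
    static string BackEnd(List<(Op Op, int Arg)> ir, Func<Op, int, string> emit)
    {
        var lines = new List<string>();
        foreach (var (op, arg) in ir)
            lines.Add(emit(op, arg));
        return string.Join(Environment.NewLine, lines);
    }

    static void Main()
    {
        var ir = FrontEnd();

        // Two "targets" consuming the same front-end output:
        Console.WriteLine(BackEnd(ir, (op, a) =>
            op == Op.Push ? $"pushq ${a}" : op.ToString().ToLower() + "q"));
        Console.WriteLine(BackEnd(ir, (op, a) =>
            op == Op.Push ? $"PUSH #{a}" : op.ToString().ToUpper()));
    }
}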

The older version of the PIC C compiler we (used to) inflict on work-experience students couldn't generate code that used the high-priority interrupt correctly.
You had to wait 2-3 years and upgrade.

The MSVC 6 compiler had a nifty bug in the linker: it would segfault and die from time to time for no apparent reason. A clean build generally fixed it (but, sigh, not always).

In some domains, such as avionics software, there are extremely high certification requirements on the code and hardware, as well as on the compiler.
On that last point, there is a project that aims to create a formally verified C compiler, called CompCert. In theory, this kind of compiler is as reliable as they come.

I've seen several compiler bugs and reported a few myself (specifically, in F#).

That said, I think compiler bugs are rare because the people who write compilers are generally very comfortable with the rigorous concepts of computer science, which makes them acutely conscious of the mathematical implications of code.

Most of them are presumably very familiar with things like lambda calculus, formal verification, denotational semantics etc. -- stuff that an average programmer like me can only barely comprehend.

Also, there's usually a fairly straightforward mapping from input to output in compilers, so debugging a compiler is probably a lot easier than debugging, say, a blog engine.

I found a bug in the C# compiler not too long ago; you can see how Eric Lippert (who is on the C# design team) figured out what the bug was here.

In addition to the answers already given, I'd like to add a few more things. Compiler designers are often extremely good programmers. Compilers are very important: most programming is done using compilers, so it's imperative that the compiler be of high quality. It's therefore in the best interest of companies making compilers to put their best people on them (or at least very good ones; the best ones might not like compiler design). Microsoft would very much like their C and C++ compilers to work properly, or the rest of the company couldn't do its job.

Also, if you're building a really complex compiler, you can't just hack it together. The logic behind compilers is both highly complex and easy to formalize. Hence, these programs will often be built in a very "robust" and generic way, which tends to result in fewer bugs.