Contest: Craziest Compiler Output due to Undefined Behavior

The C and C++ standards fail to assign a meaning to a program that overflows a signed integer or performs any of the 190+ other kinds of undefined behavior. In principle, a conforming compiler can turn such a program into a binary that posts lewd messages to Twitter and then formats your disk. In practice, of course, we don’t expect such baroque consequences. Rather, the compiler is just going to do its job, which is to generate efficient code for all executions that don’t have undefined behavior. Disregarding Easter eggs such as the old version of GCC that would exec nethack, the compiler isn’t going to go out of its way to make undefined executions behave badly. It just doesn’t care about them. The problem is that as optimizers have gotten more clever, the consequences of undefined behavior have become a bit more strange. For example:

Using uninitialized data as an extra source of randomness caused a compiler to delete an entire seed computation. link

A compiler can evaluate (x+1)>x to both 0 and 1 in the same program, compiled at the same optimization level, for the same value of x. link

An infinite loop such as a counterexample search (where no counterexample exists) can be terminated by the compiler, permitting, for example, Fermat’s Last Theorem to be “disproved.” link

An undefined shift operation caused a compiler to delete an important safety check in Google’s Native Client. link

A reference to a possibly-null pointer in the Linux kernel caused the compiler to remove a subsequent null check, creating an exploitable vulnerability. link

A division instruction (that potentially crashes the process) can be moved in front of code that has externally visible side effects. link

A compiler can run the code inside if (p) { ... } and also inside if (!p) { ... } when p is not initialized. link

I will send a small but nice prize to the person who posts a comment describing the most interesting, surprising, or just plain crazy object code emitted by a compiler as a consequence of undefined behavior. Preference will be given to entries that:

Are posted before or on July 18 2012.

Describe interesting object code as opposed to describing a secondary consequence of that object code. A secondary consequence would be something like an ROP attack, launching a missile, or whatever.

Describe object code legitimately generated by a production-quality C or C++ compiler. In other words, I’m not interested in compiler bugs or in your own hand-crafted C compiler that emails Santa Claus when someone divides by zero.

Can be reproduced—so please, if possible, include compiler version, compiler options, a pointer to the code, etc.

If you readers can come to a consensus about the winner, I will go along with it. Otherwise, I’ll choose the winner. An example of an entry that would probably win (assuming that it could be reproduced) is “At -O2, LLVM 2.6 turns the attached C++ code into object code THAT ACTUALLY MAKES DEMONS FLY OUT OF MY NOSE.”

Here’s a similar discussion on Stack Overflow. However, people there are mostly talking about secondary consequences rather than interesting object code.

The short version of the story is that there is not exactly total agreement among compiler implementors about whether i++ is undefined when i is SHRT_MAX. My reading of the standard is that it is not undefined behavior.

Second, compilers tend to terminate infinite loops that are free of side effects.

So it’s very hard to know exactly what is going on when we compile this…

GCC-4.8 at -O2 compiles the following implementation of prefix sum computation into endless loop, turning code that accesses one element beyond array boundary into code that is virtually guaranteed to crash (which may considered to be a good effect of surprising [ab]use of undefined behaviour rules!). Loop exit test is eliminated since presence of a[i] access before exit test implies that ‘i < N' must be true.

Short summary: programmers use undefined behavior (pointer overflow). Because of UB, the generated instruction (unsigned comparison) may not do what programmers intended. Programmers “fix” the problem by keeping the UB and adding another comparison to check if wraparound occurred. Compiler removes the additional comparison.

Just because the pointers contain the same value does not mean they’re the same pointer; realloc ended the lifetime of the memory pointed to by ‘p’ and constructed a new pointer and returned that. Thus, the compiler concludes that ‘p’ and ‘q’ can’t alias each other.

Nick, realloc documentation emphasis that “The function MAY move the memory block to a new location, in which case the new location is returned.”. But it seems clang -O2 believes realloc “always” move the memory block to a new location thus ‘q’ and ‘q’ cannot alias each other.

Alireza, your observation (that the allocated block might not be moved) is correct. Even so, Clang is operating correctly here. Basically you have to read the fine print in the C standard which says that in effect, the block is released and then reallocated, but maybe at the same spot. Evil indeed.

Actually, the example in comment 9 reminds me of a blog post I started to write pointing out that while realloc() sometimes is an efficient way to enlarge a block, if you have made many copies of the pointer, the standard gives you no way to know whether the block has moved. The realloc() call makes the old pointer indeterminate, and illegal to compare to the new pointer. You have to update all copies.

realloc()’s modelization in Frama-C’s value-analysis-cum-C-interpreter warns about the p==q comparison in that example.
I thought the comparison was idiomatic, and that that warning was an unfortunate false positive, so thanks Nick for pointing out to me that the warning is justified and not a false positive after all.

(I still think it is idiomatic. I expect many codes use it, but it may not be too harmful in practice always to do the treatment intended for the case the block moved).

Nick, I fail to see how how that kind of behavior is unexpected. Every beginning C programmer knows that realloc doesn’t necessarily return the same pointer that you gave it, so it’s unsurprising that assigning a value to a location in memory that is not allocated doesn’t assign that value to another location which is allocated.

The problem was in Qt as QList, QVector and co. Tried to emulate the C99 flexible array which does not exist in C++ with an array of size 1.
Then, gcc would assume the variables used to index the array are always 0.

Pascal, given that I’m writing in C++ the line “if (p == q)” is the one statement in that muck which actually has defined behaviour: “Two pointers of the same type compare equal if and only if they are both null, both point to the same function, or both represent the same address.” C++11 [expr.eq].

In C99 the comparison rule is different, but it’s actually governed by “The value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime.” in 6.2.4/2, so the comparison could be true or false!

Nick’s example seems to be the only one which involves libraries, as opposed to the C language itself. If malloc () and realloc () were user functions, the compiler could never know that p had reached the end of its lifetime.

What if the user implements their own malloc and realloc (this is commonly done) — how would this affect undefined behavior?

Nick, I did not mean to say that == produces an undefined behavior in your program.

I said meant that the first undefined behavior was in p==q, which I still think it is (first use of p after it has become indeterminate).

“indeterminate” in the C99 standard does *not* mean that “the comparison could be true or false”, it means that accessing p for the comparison is *undefined behavior*. The same vocabulary is used for uninitialized locals (that clearly produce, on access, undefined behavior and not unspecified results; see http://markshroyer.com/2012/06/c-both-true-and-false/ ).

I am somewhat confused by the aliasing comments for Nick’s example. Didn’t Clang just assume that p and q were aliases on account of the fact that the allocation sizes are the same, and so elide the check if (p == q)?

Nick Lewycky, can you post the assembly, OS version, and libc version?

It is not that the compiler assumes that p==q evaluates to 1. No, that condition is compiled to actual code, whereas *p and *q as arguments of printf() are constant-folded into what should be their values if p and q did not alias (which according the compiler they can’t do in a defined way).

regehr: The statement “same allocation size” implies “the pointers are aliases” is, of course, not true. I was just guessing as to what might be going on.

I see the same thing on my Mac (Apple clang version 3.1 (tags/Apple/clang-318.0.61) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin11.4.0). Trimmed to the important part, the assembly it outputs is:

is likewise invoking undefined behavior; an optimizing compiler might choose to make the ‘if’ condition always false, and never execute the body of the if statement (or even include it in object code). No, I’m not aware of any compiler that actually does this.