Undefined Behavior Consequences Contest Winners

The contest that I posted the other day received some very nice entries. I decided to pick multiple winners since the best entries illustrate consequences of several kinds of undefined behavior.

First, here’s the runner up, submitted by Octoploid:

int main (void) {
int x = 1;
while (x) x <<= 1;
return x;
}

This program is undefined by C99, where signed left shifts must not push a 1 bit into or past the sign bit. Recent GCC versions such as r189449 for x86-64 on Linux compile this program into an infinite loop at -O2 and higher. This entry is great because in general, compiler developers have been very reluctant to start enforcing the very strict rules for signed left shift in C99. On the other hand, GCC still performs this optimization when asked to compile in C89 mode, where I don’t think the behavior is undefined. Based on that observation, I filed a GCC bug report. One of the main GCC developers, Joseph Myers, replied with this:

Shifts in GCC are supposed to be defined as long as the shift amount is in range, independent of the LHS, so it should not be folding like that. (Although we document in implement-c.texi that this is subject to change for signed left shift, I don’t think changing it would be a particularly good idea.)

I think it is great to see compiler developers taking this kind of a stand against exploiting certain undefined behaviors. Anyway, although the matter has not yet been completely resolved, and the optimization is perfectly valid in C99, it seems that the developers will back out this transformation. So, while I find this example to be very interesting, it does not seem like a contest winner.

Winner #1

This function forms a bit of an argument against the very terse style of for-loop afforded by C which (in my opinion) makes the undefined behavior harder to see than it might otherwise have been. The undefined operation here is reading a[32]. Recent versions of GCC, apparently as a consequence, decide that the loop exit test can be removed, giving an infinite loop that triggers a segfault when i becomes large enough.

To see the undefined behavior, you need to read the fine print in the C99 standard, which says:

The realloc function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size.

In other words, it is an error to use a pointer after it has been passed as an argument to realloc. This particular bit of language does not appear in the ANSI C standard and the situation there is more ambiguous. However, reading between the lines in A.6.2, I believe we can infer that this code was intended to be undefined in C89 as well.

The man pages for realloc() that I looked at do not mention or imply this caveat. If a major compiler is going to exploit this behavior, the documentation must be updated.

Conclusion

I chose these winners because in each case there is an element of surprise. In other words, I would not have guessed that the compiler would exploit these particular undefined behaviors, even after I understood the program or program fragment completely. An entry submitted by Pascal Cuoq, which is otherwise very interesting, fails this test—I would have expected a modern C compiler to exploit those undefined behaviors.

If anyone is curious, the for-loop in Winner #1 uses the comma operator, which can be used to execute multiple statements sequentially in places where only one statement is accepted:
return i++, 0; // increments i and returns 0

Kevin, you’ve got the right line in the man page, however the consequence you’ve described is wrong. The p == q check is not optimized away, instead the compiler treats ‘p’ and ‘q’ as legitimately pointing to different memory and then folding the loads against the stores. Here’s the machine code on x86-64 linux, showing the comparison in tact:

@4,5,6: John, you really need to stop posting Nick’s example without at least two paragraphs of explanatory English text. 99% of the people who read these blog posts don’t understand Nick’s example, and 10% of them feel the need to comment about it, and it’s really annoying to the small number of us who read the big thread from last time you posted it.

@Kevin: You’re not alone. Even I looked at that code the first time and said, “Well, what’s surprising about that?” You have to actually type it in and run it before the interesting feature becomes apparent. (And you must type it by hand! Cutting and pasting is not allowed!)

I just changed the language. The language is by definition what my compiler does.

God says…
C:\Text\BIBLE.TXT

on the earth; and what will I, if it be
already kindled? 12:50 But I have a baptism to be baptized with; and
how am I straitened till it be accomplished! 12:51 Suppose ye that I
am come to give peace on earth? I tell you, Nay; but rather division:
12:52 For from henceforth there shall be five in one house divided,
three against two, and two against three.

12:53 The father shall be divided against the son, and the son against
the father; the mother against the daughter, and the daughter against

Hi Arthur, you’re right that I could probably have cut down on the confusion by including more explanation. On the other hand, this topic is tricky enough that a bit of mystery is OK, I think. People need to move themselves towards enlightenment, it cannot be done for them. For additional examples of this phenomenon see the Reddit thread:

@ Richard Zetterberg
It’s not the fact that there is a undef behavior which is interesting but the result.
Your code will probably work perfectly and print 1 even if the behavior is undefined.
But #2 produces a very interesting behavior, because two pointers that both point to the same valid memory location gives a different value once dereferenced. Which is valid regarding to the C standard because p point to a deallocated object even if it actually points to a valid memory location.

@14 (John): Well, you know I don’t object to puzzles. But I think in this case the “puzzle” too easily admits of a misreading that makes people *think* the solution is trivial and/or stupid. Sometimes they’ll still be meta-puzzled, as in Richard #10’s case: “This puzzle is trivial. Why are some of you acting like it’s interesting?” But that’s different from “This puzzle is interesting. What’s the solution?”

On the Internet, the null hypothesis is always — and rightly — that if something looks stupid or trivial at first glance, it probably *is* stupid or trivial. You have to actively do something to falsify the null hypothesis, or most people (myself included!) aren’t going to bother looking for alternative hypotheses.

Your “UBCC” is sort of like the Underhanded C Contest, in that the goal is to write innocent-looking code that actually misbehaves… but a reader of the UCC tries to figure out how the code leads to a *particular* misbehavior, whereas the whole point of your UBCC was that one dubious bit is enough to make the program do *anything*, so once the reader has found what looks like the dubious bit, it means they’ve won and can stop thinking.

Hi Kristina, I sent them each a $25 gift certificate from Amazon. This seems kind of impersonal. However, for an earlier blog contest I sent the winner a caramel chocolate apple from a local candy store that I love but the Fedexing ended up being like 2x the cost of the apple, so that didn’t really work out!

I am also interested in the underhanded C contest, whose results are more valuable to me professionally than that of the IOCCC.

John, what part of a UCCC revival sounds difficult to you? How about starting with a simple blog challenge just like the previous one? I can help with the judging (but if you don’t need me I might participate too).