When the program terminated without any perceptible delay, I figured there was a bug, but nope: the code is good. It turns out that both GCC and Clang optimize the program into effectively this:

#include <stdio.h>

int main (void)
{
  printf ("checked 4294967296 values\n");
  return 0;
}

The surprise (for me) was that at -O1, which traditionally does not enable interprocedural optimizations or aggressive loop transformations, both compilers looked inside the function foo() closely enough to figure out that it is a nop, and both were able to predict that a not-traditionally-structured loop executes 2^32 times. I do so many posts about compiler bugs here that I figured a bit of antidote would be nice.
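To make the two observations concrete, here is a scaled-down sketch rather than the program discussed above. It assumes foo() performs a check that always passes, and it walks all 16-bit values instead of all 32-bit ones so that it finishes instantly even without optimization. The names foo16 and count_checked16 are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for foo(): a round trip that can never fail,
   so once it is inlined the whole body is dead code. */
static void foo16(uint16_t x1)
{
    uint32_t wide = x1;            /* widen */
    uint16_t x2 = (uint16_t)wide;  /* narrow back, losing nothing */
    assert(x1 == x2);
}

/* Visits every 16-bit value. The exit test sits in the middle of the
   body because a condition like "i <= UINT16_MAX" is always true. */
long count_checked16(void)
{
    long checked = 0;
    uint16_t i = 0;
    while (1) {
        foo16(i);
        checked++;
        if (i == UINT16_MAX)
            break;
        i++;
    }
    return checked;
}
```

Calling count_checked16() returns 65536, one count per 16-bit value. A compiler that removes the dead call and works out the trip count can replace the whole function with that constant, which is the 16-bit analogue of the constant in the printf() above.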

11 thoughts on “Counting to 4 Billion Really Fast”

I’m guessing inlining+DCE is now common and “basic”, but the compile-time evaluation of ‘checked’ is cool! Is there a well-established dataflow algorithm for that?

Did the code work properly once you (I assume) changed the last line of foo to something like “return x1 == x2”, then asserted on that in the main loop?

BTW, this is a much cleaner way of doing fast loops than I had seen before: calloc() an array such that it ends at a page boundary, then set permissions on the next page such that an access will cause an exception, then finish the loop in the handler!

Rewriting standard library functions seems a dubious pastime to me beyond builtins and obvious arithmetical optimizations. The assumptions compilers can make about them are limited. I wonder if rewriting printf() to puts() would actually pay off — nominally puts() is simpler since it skips the formatting, but you’d expect the I/O to be the bottleneck anyway. And the compiler needs to check that the string contains no formatting specifiers and the return value is not used (since the return value of puts() cannot be used to emulate the return value of printf()). All that for a library that for all you know might implement puts() by calling printf()…
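The conditions listed in this comment can be written down as a small check. This is only a sketch of the reasoning, not how GCC or Clang implement the transformation. The helper can_rewrite_printf_as_puts is hypothetical, and it adds one condition the comment leaves implicit: puts() always appends a newline, so the format string must end in one (which the replacement call would then drop).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: may printf(fmt), called with no further
   arguments, be replaced by puts() on fmt minus its final newline? */
bool can_rewrite_printf_as_puts(const char *fmt, bool result_is_used)
{
    size_t len = strlen(fmt);

    if (result_is_used)
        return false; /* puts() cannot reproduce printf()'s character count */
    if (strchr(fmt, '%') != NULL)
        return false; /* conservative: treat any '%' (even "%%") as formatting */
    if (len == 0 || fmt[len - 1] != '\n')
        return false; /* puts() would add a newline that printf() did not print */
    return true;
}
```

With this check, "checked 4294967296 values\n" qualifies as long as the result is discarded, while "checked %ld values\n" and a string with no trailing newline do not. The return-value condition matters because printf() returns the number of characters written, whereas puts() only promises a non-negative value on success.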

Jeroen, I’m pretty sure that previous versions of GCC did the puts() optimization. Perhaps they backed off due to problems like the ones you mention. The real benefit of this replacement would be on embedded systems where (assuming the libraries are designed well or the linker is pretty smart) there would be some chance of not even linking printf() into the executable.

How much code have you seen that checks the return value of printf()? I’m not sure I’ve ever seen it done “for real”.