This output means that Coq---via OCaml---is executing a number of C's undefined behaviors before it even asks for any input from the user. The problem with undefined behaviors is that, according to the standard, they destroy the meaning of the program that executes them. We do not wish for Coq's meaning to be destroyed because a proof in Coq is widely considered to be a reliable indicator that the result is correct. (Also, see the update at the bottom of this post: the standalone verifier coqchk has the same problems.)

In principle all undefined behaviors are equally bad. In practice, some of them might only land us in purgatory ("Pointers that do not point into, or just beyond, the same array object are subtracted") whereas others (store to out-of-bounds array element) place us squarely into the ninth circle. To which category do the undefined behaviors above belong? To the best of my knowledge, the left shift problems are, at the moment, benign. What I mean by "benign" is that all compilers that I know of will take a technically undefined construct such as 0xffff << 16 (on a machine with 32-bit, two's complement integers) and compile it as if the arguments were unsigned, giving the intuitive result of 0xffff0000. This compilation strategy could change.

If we forget about the shift errors, we are still left with three signed integer overflows:

-9223372036854775807 + -2

-9223372036854775808 - 1

31898978766825602 * 65599

Modern C compilers are known to exploit the undefinedness of such operations in order to generate more efficient code. Here's an example where two commonly-used C compilers evaluate (INT_MAX+1) > INT_MAX to both 0 and 1 in the same program, at the same optimization level:

Realistically, what kinds of consequences can we expect from signed integer overflows that do not map to trapping instructions, using today's compilers? Basically, as the example shows, we can expect inconsistent results from the overflowing operations. In principle a compiler could do something worse than this---such as deleting the entire statement or function which contains the undefined behavior---and I would not be too surprised to see that kind of thing happen in the future. But for now, we might reasonably hope that the effects are limited to returning wrong answers.

Now we come to the important question: Is Coq's validity threatened? The short answer is that this seems unlikely. The long answer requires a bit more work.

What do I mean by "Coq's validity threatened"? Of course I am not referring to its mathematical foundations. Rather, I am asking about the possibility that the instance of Coq that is running on my machine (and similar ones running on similar machines) may produce an incorrect result because a C compiler was given a license to kill.

Let's look at the chain of events that would be required for Coq to return an invalid result such as claiming that a theorem was proved when in fact it was not. First, it is not necessarily the case that the compiler is bright enough to exploit the undefined integer overflow and return an unexpected result. Second, an incorrect result, once produced, might never escape from the OCaml runtime. For example, maybe the overflow is in a computation that decides whether it's time to run the garbage collector, and the worst that can happen is that the error causes us to spend too much time in the GC. On the other hand, a wrong value may in fact propagate into Coq.

Let's look at the overflows in a bit more detail. The first, at line 1016 of interp.c, is in the implementation of the OFFSETINT instruction which adds a value to the accumulator. The second overflow, at line 674 of ints.c, is in a short function called caml_nativeint_sub(), which I assume performs subtraction of two machine-native integers. The third overflow, at line 949 of interp.c, is in the implementation of the MULINT instruction which, as far as I can tell, pops a value from the stack and multiplies it by the accumulator. All three of these overflows fit into a pattern I've seen many times where a higher-level language implementation uses C's signed math operators with insufficient precondition checks. In general, the intent is not to expose C's undefined semantics to programs in the higher-level language, but of course that is what happens sometimes. If any of these overflows is exploited by the compiler and returns strange results, OCaml will indeed misbehave. Thus, at present the correctness of OCaml programs such as Coq relies on a couple of things. First, we're hoping the compiler is not smart enough to exploit these overflows. Second, if a compiler is smart enough, we're counting on the fact that the resulting errors will be egregious enough that they will be noticed. That is, they won't just break Coq in subtle ways.

If these bugs are in the OCaml implementation, why am I picking on Coq? Because it makes a good example. The Coq developers have gone to significant trouble to create a system with a small trusted computing base, in order to approximate as nearly as possible the ideal of an unassailable system for producing and checking mathematical proofs. This example shows how even such a careful design might go awry if its chain of assumptions contains a weak link.

These results were obtained using OCaml 3.12.1 on a 64-bit Linux machine. The behavior of the latest OCaml from SVN is basically the same. In fact, running OCaml's test suite reveals signed overflows at 17 distinct locations, not just the three shown above; additionally, there is a division by zero and a shift by -1 bit positions. My opinion is that a solid audit of OCaml's integer operations should be performed, followed by some hopefully minor tweaking to avoid these undefined behaviors, followed by aggressive stress testing of the implementation when compiled with Clang's integer overflow checks.

UPDATE: psnively on twitter suggested something that I should have done originally, which is to look at coqchk, the standalone proof verifier. Here's the output:

11 Comments

This particular example came up because Will Dietz built an entire Debian system with integer overflow checking turned on. I was looking at the results and saw some overflows in Coq, and decided this was worth spending some time looking at in more detail.

Your observation about unintended semantic leakage between the interpretation target and host language (C) is very accurate; I have seen it many times in various forms.
Here is a particularly fun example from the implementation of a multiplication instruction in a processor simulator. Two unsigned 16 bit numbers are multiplied, yielding a 32 bit product which is then stored in a 64-bit register:

This code looks “safe” – only unsigned variables – but the C compiler emitted an instruction to sign-extend the 32-bit product to 64 bits, which is completely acceptable when you think about it but did puzzle us for a little while.

Coq relying on the full OCaml runtime I don’t mind. However, I am surprised (I’m naive about theorem proving stuff, I guess) that the proof _checkers_ aren’t small programs in pure C, compiled using no optimization. There are some such checkers, right? I recall Appel and some folks did one for proof-carrying code…

Alex, I don’t pretend to have a serious understanding of the issues here, but I think it’s clear that a smaller TCB for proofs is better, other things being equal. The Coq folks provide a bit of discussion of that issue here:

Right. I’m willing to grant that the OS is unavoidable, though in practice you could make minimal system calls, etc. But OCaml is a pretty big pile of code. Of course, there would be a certain irony in compiling the minimal C proof checker in CompCert, but I’m willing to live with it. Circular reasoning makes the world go around.

More seriously: thanks for this interesting piece of feedback. Here is my analysis of the alarms you mention:

– The “division by zero” is really a floating-point division 1.0 / 0.0 that is perfectly defined in IEEE-754 as returning +infinity. (And that’s exactly why OCaml evaluates this expression: to get a +infinity value.) Since the C standard says *nothing* about floating-point except how to map it to IEEE-754 if the C implementor so wishes, it is very very reasonable to assume IEEE-754 compliant float arithmetic in all your C programs. I believe your tool should *not* raise alarms for floating-point arithmetic unless specifically requested. Otherwise you’ll get laughed out of the room by floating-point experts. (I have a few in my surroundings and I speak from experience here.)

– The shift by a negative amount is bad indeed, but it’s a case of garbage in, garbage out. Just like C, OCaml states that shifts by amounts outside the range [0, word-size) are undefined, and therefore maps its shifts to C shifts. What you observed is that one of the programs from the OCaml test suite contains a bad shift, and we will definitely investigate and fix it, but not that the OCaml runtime system itself contains a bad shift.

– The other alarms are all of the “signed integer arithmetic is expected to work as in hardware” kind. You have a point that it could be more prudent to use unsigned integer types more frequently in the OCaml runtime, esp. the bytecode interpreter, but so far we haven’t run into any issue related to overzealous optimizations over signed integer arithmetic of the kind you describe. Indeed, the reason why you see so many signed overflows when running the OCaml test suite is precisely that it is testing proper (in OCaml’s sense, not C sense) operation of integer arithmetic…

Two more general comments to finish. The first is that, to my relief, the overflow issues you found are mostly within the bytecode interpreter, and to a lesser extent some primitive OCaml functions, but not, say, in the GC and memory manager, where they would correspond to real algorithmic errors indeed. Most OCaml programs these days are compiled to native code, bypassing the bytecode interpreter and mapping OCaml’s arithmetic straight to the hardware. The bytecode interpreter is a strange beast that has the difficult task of implementing Java-like integer arithmetic in terms of the rather uncooperative C language…

The second general comment is that there is no way the OCaml runtime system can be written in portable, standard-compliant C. It’s not alone in this category: I would go so far as to say that no software that is appropriate to write in C (e.g. operating systems, low-level libraries like the C standard library itself, memory managers, virtual machines, etc.) can be written in standard-compliant C. In other words, interesting C software is rarely if ever coded against the C standard, but rather against some kind of folklore “system C” dialect that is much more defined. In this respect, C compilers that aggressively optimize assuming strictly-conformant C programs do more harm than good. Basically, they deliberately miscompile valuable “system C” software, and I (and many others, e.g. the Linux kernel developers) don’t find this particularly productive.

Re “system C”, I agree with your assessment of modern C; I wish the Committee would work more on standardizing existing practice and less on coming up with new loopholes for compiler vendors to silently insert backwards-incompatibilities. Let “int” be a 32-bit two’s-complement type, already! (You 16-bit micro guys can use “short”. It’s what it’s there for.) BUT, given the Committee we have, I think it’s great that John is out there catching the non-standard stuff before it causes problems down the road.
If you were implying that John is creating a moral hazard that will just encourage the Committee to make even *more* silly rules in the next version… well, I weakly disagree.

Regarding the FP divide by zero alarm, I tend to agree with you. We will revisit the decision to include that check. Also, I probably advised you poorly when I suggested the -fsanitize=undefined compiler option. A better choice (and one that would not produce the FP error) is:

-fsanitize=integer -fno-sanitize=unsigned-integer-overflow

Regarding the signed overflows, I think the main danger would come from a case where (via outlining and inlining) the compiler manages to create a specialized, highly-optimized version of the code containing the overflow. When some constants are available to play with (as in my example C program in this post) the optimizer may end up doing something legal but unexpected.

I agree with you and with the Linux people about the undesirability of deviating from “system C” as it has commonly been understood. The implicit prioritization of performance over correctness is frightening.

Finally, I would be interested to contribute towards any ongoing efforts to stress-test the OCaml implementation. The last few percent of optimizer and GC bugs can be very hard to flush out. It seems that due to the two different implementations, OCaml is well suited to differential testing where the compiler serves as an oracle for the interpreter and vice versa.

Hi Arthur, these checks (and even more so, the command-line options that control them) are somewhat in a state of flux. We hope to get it all nailed down before the 3.3 release. To make sanitized code link, you must build Compiler-RT, a set of optional runtime libraries–the main Clang build instructions say how to do this.

Re. two’s complement integers, I strongly agree, though I have heard stories of codes that speed up by 30% or more when the compiler can exploit undefined overflow. Apparently the win occurs when the compiler does not need to consider the case where a loop variable suddenly becomes negative.

I have been tinkering with building some rather low-level tools and, in the manner of Wilkes above, was feeling wiped out and disheartened tracking data and control flow through piles of LLVM bitcode, wondering why I don’t just return to the warm and cozy confines of System F.

And then I saw your last two super posts and they’ve left me feeling very inspired, so thanks!