How to Write a C/C++ Compiler That Respects Volatile

The volatile type qualifier in C/C++ means roughly that accesses to the qualified object happen on the actual machine as they do in the abstract machine. I’ve written about volatile pretty extensively, so won’t repeat myself.

An interesting problem with volatile is that in practice, compilers fail to respect it: they add, remove, and reorder accesses in ways that break the rules. This happens because the rules for translating accesses to volatile-qualified objects are very different from the rules for accessing regular C variables: the compiler has nearly complete freedom to add, remove, and reorder non-volatile variable accesses so long as this doesn’t change the program’s externally observable behavior. Thus, many optimization passes require special cases like this:

if (!var->is_volatile()) transform_code();

The problem is that every optimization developer must add these extra safety checks every time — any omission is likely to break the properties that volatile is intended to preserve.

A few years ago Eric Eide and I observed that the rules for accessing volatile objects are very similar to the rules for manipulating function calls. In other words, when the compiler lacks any special knowledge about a called function, it must not add, remove, or reorder function calls. This lead to the idea that if compiler writers would simply model volatile accesses as function calls, all of those special cases in the optimization passes would go away. We tested this idea by writing a source-to-source transformer for C code that turned accesses to volatiles into calls to automatically generated helper functions. In other words, if x is defined as “volatile int x;” then this code:

y=x;

becomes:

y=__volatile_read_int(&x);

Our idea mostly worked. What I mean is that many, but not all, problems in miscompilation of accesses to volatiles went away after transforming programs. Our hypothesis was that when wrapper functions didn’t work, it was always because the compiler was performing a regular miscompilation (i.e., generating the wrong code in a way that has nothing to do with volatile). But we couldn’t back this up since we lacked a correct compiler and we also didn’t have time to manually inspect thousands of failed test cases.

So far this is all old news, but there has been a very nice new development in volatile-land. As of recently, CompCert implements a proved volatile semantics like this:

Accesses to volatile-qualified objects are turned into function calls in the frontend.

In the backend, the function calls are turned into inline code that performs the accesses.

This is basically the strategy that Eric and I came up with, but with a nice improvement: much of the overhead associated with actual function calls is avoided due to the late inline substitution. The overhead of function calls — particularly in terms of code size — would be significant for small embedded applications that consist mainly of accesses to device registers. Some overhead will likely remain due to the calling convention and because CompCert must pessimistically assume that a helper function has updated any global variables that happen to be cached in registers. Hopefully the CompCert folks will quantify the overheads of various alternatives in a paper sometime.

We looked for volatile bugs in CompCert and found a few: they were in, for example, the unproved frontend code that expands C-level struct assignments into Clight. After Xavier fixed these bugs (I think there were two of this ilk) we can no longer find volatile bugs in CompCert. Now we finally come to the point of this post:

Out of the dozen-odd compilers we have tested, there is only one — CompCert — that reliably respects C’s volatile qualifier.

This is an impressive result.

Update from March 1 2011:

My description of CompCert is a bit out of date. Xavier Leroy explains:

One little correction, though: the handling of volatiles you describe is that of CompCert 1.7.

In the latest release CompCert 1.8, I improved the generated code by, in essence, inlining the _volatile_read and _volatile_write functions after optimizations are done, but before register allocation. (In reality, it’s done a bit differently, through a notion of “inlinable builtin functions” of which the volatile operations is an instance.)

This way, the generated code isn’t constrained by the calling conventions: the compiler knows that the caller-save registers are not destroyed by the _volatile_* functions, and that these “functions” can take their arguments in any register, not just those dictated by the calling conventions.

This sounds like exactly the right solution: not only does it give us correct code and optimization passes that are free of volatile-related clutter, but the performance and size of the generated code should be very good.

3 thoughts on “How to Write a C/C++ Compiler That Respects Volatile”

Every time you write about volatiles, I think about the following (which is related to memory model issues in multithreaded C/C++/Java and in multicore processors).

In practice, embedded C programs are subject to additional memory ordering constraint (which volatile can’t easily give you). They want to write data to a (non-volatile) array, execute a ‘barrier’ of some form and then perform a volatile write to a DMA controller and that all writes to the array will complete before the volatile write.

I think most compilers provide a barrier that is claimed to prevent the compiler from moving memory accesses past it.
Recommended practice for gcc is to insert a volatile ‘nop’ instruction which should prevent movement of memory accesses. Embedded compilers like armcc have an explicit barrier function documented in the manual.

After all your work on volatiles, the question in my mind is whether the compilers actually implement these barriers correctly?

The answer is critical for both low level embedded C programs and for multithreaded C programs.

[An almost identical question is whether inline assembly code marked volatile is ever reordered wrt memory accesses. In the scenario above, it’s not enough for the compiler to do the right thing, the processor also has to do the right thing so the programmer has to insert instructions to drain the write buffer or flush part of the cache.]

We can write a nice tester for properties that imply a nice invariant. Volatile has a very strong invariant so that job was pretty easy. The problem with the new C++ memory model is that it’s so complicated that there are many possible invariants and it’s not exactly clear what to test. Certainly that thing should be tested, though. Do you guys have an implementation yet? To do differential testing I’ll need two or more implementations.

I think that things like compiler and hardware barriers provide fairly simple and strong invariants, so probably this testing isn’t too difficult.

I actually have a new round of volatile testing to do where we check if the program is ever “surprised” that a volatile location returns a value other than the one last stored. So far we have never tested for this, only for cases where the compiler adds/removes a volatile load/store. The tester for this — Yang Chen already implemented it — is a fairly simple hack using PIN that changes the value loaded in a pseudorandom way. Of course doing this in a simulator would be even easier.