Parallel

volatile vs. volatile

By Herb Sutter, January 08, 2009

A tale of two similar but different tools

Case 2: Semantic-Free Variables for "Unusual" Memory Semantics

The second requirement is to deal with "unusual" memory that goes beyond a given language's memory model, where the compiler must assume that the variable can change value at any time and/or that reads and writes may have unknowable semantics and consequences. Classic examples include:

Hardware registers, part 1: Asynchronous changes. For example, consider a memory location M on a custom board that is connected to an instrument that directly writes to M. Unlike ordinary memory that is only modified by the program itself, the value stored in M can change at any time even if no program thread writes to it; therefore, the compiler cannot make any assumptions that the value will be stable.

Hardware registers, part 2: Semantics. For example, consider a memory location M on a custom board where writing to that location always automatically increments by one. Unlike an ordinary RAM memory location, the compiler can't even assume that performing a write to M and then immediately following it with a read from M will necessarily read the same value that was written.

Memory having more than one address. If a given memory location is accessible using two different addresses A1 and A2, a compiler or processor may not be able to know that writing to location A1 can change the value at location A2. Any optimization that assumes a write to A1 does not change the value of A2 will break the program, and must be prevented.

Variables in such memory locations are unoptimizable variables, because the compiler can't safely make any assumptions about them at all. Put another way, the compiler needs to be told that such a variable doesn't participate in the normal type system, even if it appears to have a given type. For example, if memory location M or A1/A2 in the aforementioned examples are typed as an "int" in the program, what does that really mean? It means at most that it has the size and layout of an int, but it cannot mean that it behaves like an int -- after all, ints don't autoincrement themselves when you write to them, or mysteriously change their values when you write to what looks like a different variable at some other address.

We need a way to turn off all optimizations on their reads and writes. ISO C and C++ have a portable, standard way to tell the compiler that this is such a special variable that it must not optimize: volatile.

Java and .NET have no comparable concept. Managed environments are supposed to know the full semantics of the program they execute, after all, so it's not surprising that they don't support memory with "unknowable" semantics. But both Java and .NET do provide escape hatches to leave the managed environment and invoke native code: Java provides the Java Native Interface (JNI) and .NET provides Platform Invoke (P/Invoke). The JNI specification [5] is silent about volatile, however, and doesn't mention either Java volatile or C/C++ volatile at all; similarly, the P/Invoke documentation doesn't mention interactions with .NET volatile or C/C++ volatile. So to correctly access an unoptimizable memory location in either Java or .NET, you must write C/C++ functions that use C/C++ volatile to do the needed work on behalf of their managed calling code, make sure they entirely encapsulate and hide the volatile memory (i.e., neither take nor return anything volatile), and call those functions through JNI and P/Invoke.

Unoptimizable Variables and (Not) Optimization

All reads and writes of unoptimizable variables on a given thread must execute exactly as written; no optimizations are allowed at all, because the compiler can't know the full semantics of the variable and when and how its value can change. This is a stronger statement than for ordered atomics, which only need to execute in source code order.

Consider again the two transformations we considered before, but this time replacing the ordered atomic variable a with the unoptimizable (C/C++ volatile) variable v:

v = 1; // A
v = 2; // B

Is it legal to transform this as follows to remove the apparently redundant write in line A?

The answer is no, because the compiler cannot possibly know that eliminating line A's write to v won't change the meaning of the program. For example, v could be a location accessed by custom hardware that expects to see the value 1 before the value 2 and won't work correctly otherwise.

Similarly, if v is an unoptimizable variable and local is an unshared local variable, it is not legal to transform

to eliminate the read from v. For example, v could be a hardware address that automatically increments itself every time it's written, so that writing 1 will yield the value 2 on the next read.

Second, what about nearby ordinary reads and writes -- can those still be reordered around unoptimizable reads and writes? Today, there is no practical portable answer because C/C++ compiler implementations vary widely and aren't likely to converge anytime soon. For example, one interpretation of the C++ Standard holds that ordinary reads can move freely in either direction across a C/C++ volatile read or write, but that an ordinary write cannot move at all across a C/C++ volatile read or write -- which would make C/C++ volatile both less restrictive and more restrictive, respectively, than an ordered atomic. Some compiler vendors support that interpretation; others don't optimize across volatile reads or writes at all; and still others have their own preferred semantics.

Summary

To safely write lock-free code that communicates between threads without using locks, prefer to use ordered atomic variables: Java/.NET volatile, C++0x atomic<T>, and C-compatible atomic_T.

To safely communicate with special hardware or other memory that has unusual semantics, use unoptimizable variables: ISO C/C++ volatile. Remember that reads and writes of these variables are not necessarily atomic, however.

Finally, to express a variable that both has unusual semantics and has any or all of the atomicity and/or ordering guarantees needed for lock-free coding, only the ISO C++0x draft Standard provides a direct way to spell it: volatile atomic<T>.

[4] A common objection at this point is: "In the original code, it was possible for another thread to see the intermediate value, but that is impossible in the transformed code. Isn't that a change in observable behavior? " The answer is, "No, " because the program was never guaranteed to ever actually interleave just in time to see that value; it was already a legal outcome for this thread to just always run so fast that the interleaving happened to never manifest. Again, what this optimization does is reduce the set of possible executions, which is always legal.

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!