The behaviour of the volatile keyword is highly dependent on the language and the platform it is implemented on. While Java provides consistent behaviour for volatile across all architectures, this is not the case for languages that are compiled directly to native machine code, such as C/C++. Let's try to understand why this is the case.

Let a be a member of the set of program actions A, let v_i(a) be a function that applies the volatility requirement to an action, where the subscript i denotes the i-th iteration in which a volatile action is applied, and let ≤ be the precedes operator, which was explained previously. For all program actions, the following rules hold:

Rule 1: v_i(a) ≤ v_{i+1}(a)
Rule 2: if b ≤ v_i(a), then b ≤ v_j(a) for all j > i

Rule 1 says that volatile functions enforce a total order, where v_i(a) always precedes v_{i+1}(a); rule 2 says that if an action precedes a volatile function applied to a on the i-th iteration, then that action must necessarily precede all subsequent volatile functions applied to a.

This is a very strong memory requirement in Java; in fact it is much stronger than in C/C++. The C/C++ language specifications place no such restriction on memory ordering, and leave it to the compiler implementation to decide how non-volatile actions are ordered around volatile actions.

Let's consider how these rules affect program execution with a simple code sample:

int a = 0;
int b = 0;
volatile int count = 0;

a = 1;
count = 1;
b = 2;
count = 2;

In C/C++, the volatile keyword only guarantees that the writes to count cannot be reordered against each other, i.e. if count == 2, then count = 1 must necessarily have preceded it. However, there is neither a guarantee that a == 1, nor that b == 2.

In Java, given the stronger guarantee defined above, if count == 1, then the assertion a == 1 must be true. Similarly, if count == 2, then the assertion a == 1 && b == 2 must be true. This is what is meant by the strict memory guarantee that Java offers and C/C++ does not.
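To make the guarantee concrete, here is a minimal, self-contained sketch (the class and thread names are mine) in which a reader thread spins on the volatile count; once it observes count == 2, the Java Memory Model guarantees it also observes a == 1 and b == 2:

```java
public class VolatileOrdering {
    static int a = 0;
    static int b = 0;
    static volatile int count = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            a = 1;
            count = 1; // volatile write: the write to a cannot move past this
            b = 2;
            count = 2; // volatile write: the write to b cannot move past this
        });
        Thread reader = new Thread(() -> {
            while (count != 2) { } // spin until the final volatile write is visible
            // Guaranteed to print true twice in Java; in C/C++ either could be false
            System.out.println("a == 1: " + (a == 1) + ", b == 2: " + (b == 2));
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
    }
}
```

The equivalent C/C++ source with volatile int count would be allowed to print false for either variable, depending on compiler reordering and the target architecture.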

However, this does not mean that C/C++ will not behave the same way Java does. Whether it does depends on (1) whether the compiler performs any code reordering that is surprising but legitimate, and (2) whether the underlying machine architecture upholds the same strict memory order, provided the compiler has not reordered the code.

For instance, code compiled with gcc at -O0 on any x86 platform will conform to (and is in fact stricter than) Java's memory model, but other architectures such as PowerPC, Alpha, and Itanium all have weaker memory models which can exhibit surprising program behaviours that a programmer may not expect. Caveat lector!

Anyhow, if you are interested in more memory model consistency rules, you might want to watch Intel's explanation of the x86 memory model, where the nuances of memory ordering are explained in good detail. Enjoy!

When it comes to criticising programming languages, Java seems to take the top spot as the baddest. This is widely seen on the InterTubes, like here, and here.

But does a 'bad' language mean that it'll die a relatively quick death?

To find out, let's take a look at the history of an older language, C. C is a systems programming language that has been around for the last 40 years. The last revision to the C standard was 10 years ago, and even without moving with the times, the language is still going strong - last I heard, it is still the language of choice for 40% of open source developers.

Does that mean that people have stopped complaining about pointers, easy-to-write buffer overflows, memory leaks, having to declare all variables up front before code, and other quirks of the language?

I suspect not. So why still C?

Simple - it works. And I suspect the same can be said with Java.

Furthermore, it's silly to argue about Java's merits and drawbacks, because that's really missing the forest for the trees. While the most visible part of Java is undoubtedly the language, the true technology of Java is not the language but the virtual machine itself. The JVM, as it stands today, is a fast abstract machine that you can plug almost any language into, and it operates at speeds comparable to natively compiled binaries.

Like most programmers, we enjoy bitching about the peculiarities of a language once in a while, but if you hate Java with a passion, maybe you need to get your head checked. A language is merely a medium of expression, and a computer language is one specifically used to express program behaviour. The sensible choices are either to learn it well and avoid the pitfalls, or to find a better medium of expression.

So seriously, if you don't like Java, there is a cure. Stop. Using. It.

I'll let you in on a secret about programming languages: there are only two kinds of languages in this world - languages that people complain about, and languages that nobody uses (Stroustrup said so). So in an obtuse manner, the vast majority of people who criticise Java are only reaffirming its popularity.

Which is precisely why I can't see Java going the way of the dinosaurs. Raving incessantly against it ironically only boosts its reputation, albeit in a weird, backhanded kind of way.

... Or why you are not guaranteed a coherent value when you are reading longs and doubles in Java.

How is that possible?

This is mainly due to a language compromise made to accommodate 'legacy' 32-bit CPUs, where a 64-bit memory action is divided into two 32-bit memory actions, mainly for historical efficiency reasons (refer to 17.4 of the Java Language Specification for an explicit answer as to why). Note that this is not a problem on native 64-bit architectures.

So how does this translate to a possible problem?

Let's try to simplify the problem by illustrating the same scenario with a 2-bit memory action being split into two 1-bit memory actions. Assume that we have a memory location m which looks something like this:

From the diagram, m is logically separated into two 1-bit locations m[0] and m[1]. Let's assume that m starts off with 00b as its initial value. Then m looks like this:

Now suppose that we have 2 CPUs concurrently executing the following code:

CPU1: R(m)
CPU2: W(m = 11b)

R(m) is a function that reads the value of m, while W(m) is a function that sets the value of m. In our example, CPU1 is reading the value of m while CPU2 is writing the binary value 11b into m.

Logically, if atomicity were enforced, the only possible values that could be observed at any given time would be 00b or 11b. But in the case of longs and doubles, the specification explicitly states that no guarantee of atomicity is enforced.

Hence it is possible for the memory updates to be timesliced in the following order (one possible interleaving, assuming m[1] is the low-order bit):

CPU2: W(m[1] = 1)   - m is now 01b
CPU1: R(m)          - returns 01b
CPU2: W(m[0] = 1)   - m is now 11b

As CPU2's execution is timesliced around CPU1's, R(m) returns the value 01b - a transient value that should never be properly observed.

One could say that CPU1 happens to be reading at the right place but at the wrong time. The volatile keyword in Java resolves this problem: it does not allow reordering of read and write actions, and thereby implicitly disallows a read in between the two halves of a write, and vice versa.

This has the effect of making read and write actions appear atomic, not that the read and write actions are implicitly atomic themselves. For a better understanding of what I mean by atomic, see my explanation on the difference between Atomic and Volatile.
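As a minimal sketch of the fix (class and constant names are mine): a writer flips a volatile long between two 64-bit patterns while a reader checks that it never observes a half-written value. Declaring the field volatile is what makes the 64-bit accesses atomic; dropping the keyword would permit torn reads on a 32-bit JVM:

```java
public class LongTearing {
    // volatile makes reads/writes of this 64-bit field atomic, even on 32-bit JVMs
    static volatile long value = 0L;

    public static void main(String[] args) throws InterruptedException {
        final long A = 0L;  // all bits clear
        final long B = -1L; // all bits set; a torn read would mix halves of A and B
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                value = (i % 2 == 0) ? A : B;
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                long v = value;
                if (v != A && v != B) { // a transient, half-written value
                    System.out.println("torn read: " + Long.toHexString(v));
                    return;
                }
            }
            System.out.println("no torn reads observed");
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
    }
}
```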

I initially thought I could get away with it by linking Latex images for my equations directly from Google Docs to Blogger, but it seems that the image URLs are not permalinks, causing the images in my blog posts to be missing after a while.

Wordpress has a plugin that supports this directly, but there's nary a solution for Blogger.

Still trying to look for possible workarounds to this problem. Hopefully it's not one that involves manually copying and linking the images directly. :(

There seems to be a tendency to confuse an atomic action with a volatile memory access. This is quite natural, since they exhibit similar properties, which makes it hard to distinguish between the two.

So what is the common property between an atomic action and a volatile memory access? They both exhibit the property of total ordering. Assume that a, b, c ∈ A, where A is the set of possible actions a program can execute, and a ≤ b denotes that action a precedes action b. Then the property of total order means:

1. If a ≤ b and b ≤ a, then a = b
2. If a ≤ b and b ≤ c, then a ≤ c
3. Either a ≤ b or b ≤ a

Literally, these rules mean (1) that for any action a that precedes b, where b also precedes a, a must necessarily be the same action as b; (2) if a precedes b, and b precedes c, then a must precede c; (3) either a must precede b or b must precede a. These are just mathematical (or convoluted) formalisms for saying that all actions are guaranteed to occur in a sequential order.

But that's where the similarity ends.

An atomic action is literally what it means: an action that is indivisible. Volatile memory only provides a guarantee of sequential order; it offers no guarantee that an interleaving of actions cannot occur. For an illustration, let's take a look at the following code:

a = 0;
a++;

Assume that 'a' is a volatile variable. From the code, we naturally think that the increment operator (++) will atomically increment the value of 'a' to 1, but this assumption is slightly faulty:

[a] = 0;
r0 = [a];
r0 = r0 + 1;
[a] = r0;

The square brackets around a denote a value fetched from memory, while r0 represents the register in which the addition actually occurs. An increment operation is in fact a composite action, composed of a read, a register increment, and then a write. (By the way, register actions are implied to be atomic even though they are part of a composite action, because an operation within a register cannot be directly observed.)

So increment is a composite action rather than an atomic one - what's the harm? Let's go through the same example, but with 2 concurrent processors executing simultaneously:

CPU1: a++;
CPU2: a++;

Let's assume that a is initially 0. If the actions from CPU1 and CPU2 were both volatile and atomic, then the only possible value that can be seen after execution is 2.

However, if the actions from CPU1 and CPU2 are only volatile, then the actions can translate to this:

CPU1            CPU2
r0 = a;         r1 = a;
r0 = r0 + 1;    r1 = r1 + 1;
a = r0;         a = r1;

For all the possible interleavings of these instruction streams on a uniprocessor, one can see that it is possible for registers r0 and r1 to both read a as 0 simultaneously, which makes the possible final values either 1 or 2. Clearly 1 is wrong: if the operation were atomic, 1 could not be a legal value at the end of the execution.

Conversely, a volatile variable can sometimes be deemed atomic. Consider the example below:

CPU1: a = 1;
CPU2: a = 2;

In this case, the only possible set of terminal values is 1 or 2, which is exactly what you would get for an atomic action. Therefore, a volatile memory access is atomic iff you perform only a single read or write action, which is atomic in nature. But a volatile variable offers no guarantee of atomicity when a composite action is executed.

Hence the label 'atomic actions' is a slight misnomer: it doesn't mean the action is inherently atomic, but rather that it guarantees a composite action is executed as a single, indivisible action. A well-known example of such a composite action is the Compare-and-Swap (CAS) instruction.
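To tie the two concepts together, here is a minimal sketch (the class and counter names are mine) contrasting a volatile counter with java.util.concurrent.atomic.AtomicInteger, whose incrementAndGet() executes the read-modify-write as one atomic action (implemented with Compare-and-Swap on most platforms):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicVsVolatile {
    static volatile int volatileCounter = 0;                        // ordered, but ++ is not atomic
    static final AtomicInteger atomicCounter = new AtomicInteger(); // CAS-backed atomic increment

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                volatileCounter++;               // composite read/modify/write: increments can be lost
                atomicCounter.incrementAndGet(); // composite action executed atomically
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // atomicCounter is always exactly 200000; volatileCounter may come up short
        System.out.println("atomic=" + atomicCounter.get());
        System.out.println("volatile lost updates: " + (200_000 - volatileCounter));
    }
}
```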

For simplicity, just remember that atomicity is not a property of memory, but a property of its actions - one which a volatile memory access may or may not uphold.

A lot of people tend to treat concurrency and multi-threading as the same thing. In many ways they are correct, given the similarities between the two. But when trying to reason about program behaviour, teasing out the distinction is important. Concurrency is by definition multi-threaded; but multi-threading is not necessarily concurrent.

To make it clear why concurrency is not strictly multi-threading, we start off by defining what concurrency is. Let A be the set of all program actions, and let P ⊆ A and Q ⊆ A. Also, a ≤ b denotes that action a precedes action b.

The explanation is actually straightforward when expressed in plain English.

The set of rules says (1) that if all the actions in set P precede all the actions in set Q, and all the actions in set Q precede all the actions in set P, then the two sets of actions are equivalent; (2) for any action a that precedes action b, and any action b that precedes action c, action a precedes action c - this is known as a transitive relation, and while it is stated for actions, it broadly holds for sets of actions as well; (3) all the actions in one set precede all the actions in the next, which loosely means that if an action has happened before, it cannot happen again - this is used to describe how a program flows.

To put it simply, concurrency means that within a given quantum of time, one or more program actions can be happening simultaneously. A set of actions on a given thread T1 will be causally ordered only on that thread itself; however, this set of actions may or may not have an ordering relationship with the set of actions on another thread T2.

Having understood what concurrency is, how is multi-threading different?

Consider multi-threading on a single-CPU machine. On a uni-core CPU, all actions are totally ordered, i.e. no two actions can happen within a single quantum of time, but this is not the case for a multi-core (concurrent) CPU. Obviously, we can easily make a multi-core CPU function like a single-core CPU, but certainly not the other way around, which is why multi-threading does not imply concurrency.

Even though concurrent programs are usually what we are interested in, we often assign an arbitrary program order to them to make them functionally behave like single-core programs, which are much easier to reason about. You see these totally ordered conversions from time to time, such as in my discussion of the Implementation of Volatile in the JVM.

To put it simply, threading is a way of describing program actions for concurrent programs, and sequential programs can also be reasoned about in the context of threads, but not the other way round. For a simple illustration, consider the following program:
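The original listing is not shown here, so the following is a hedged reconstruction in Java of the relevant fragment of Peterson's algorithm (the field and method names are mine, and thread IDs 0 and 1 stand in for the post's T1 and T2). The volatile fields supply the total ordering the algorithm requires under the Java Memory Model:

```java
public class Peterson {
    static volatile boolean flag0 = false, flag1 = false; // "I want to enter"
    static volatile int turn = 0;                          // whose turn it is to wait
    static int counter = 0;                                // protected by the lock

    static void lock(int self) {
        int other = 1 - self;
        if (self == 0) flag0 = true; else flag1 = true; // announce intent
        turn = other;                                   // politely yield priority
        // spin while the other thread wants in and holds priority
        while ((self == 0 ? flag1 : flag0) && turn == other) { }
    }

    static void unlock(int self) {
        if (self == 0) flag0 = false; else flag1 = false;
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable worker0 = () -> { for (int i = 0; i < 50_000; i++) { lock(0); counter++; unlock(0); } };
        Runnable worker1 = () -> { for (int i = 0; i < 50_000; i++) { lock(1); counter++; unlock(1); } };
        Thread t0 = new Thread(worker0);
        Thread t1 = new Thread(worker1);
        t0.start(); t1.start();
        t0.join(); t1.join();
        System.out.println("counter=" + counter); // mutual exclusion => exactly 100000
    }
}
```

On a concurrent machine both threads make progress and the program terminates; scheduled strictly sequentially with no preemption, a thread spinning inside lock() would wait forever.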

The code is multi-threaded; assume that there are 2 threads executing at the moment, with threads T1 and T2 returning IDs of 1 and 2 respectively.

There is an interesting property of this piece of code: if the threads are concurrent, then eventually the thread_id value will change, and the code never halts (with the threads being fairly scheduled as an interesting side effect). But if the threads are sequential, then one thread will enter the lock's wait loop and halt the processor by spinning forever.

For those who may be interested, the code in this example is part of an implementation of Peterson's algorithm - a simple and poignant example of how concurrent programs can differ from multi-threaded ones.