If there’s one thing that characterizes a weakly-ordered CPU, it’s that one CPU core can read values from shared memory in a different order than another core wrote them. That’s what I’d like to demonstrate in this post using pure C++11.
For normal applications, the x86/64 processor families from Intel and AMD do not have this characteristic. So we can forget about demonstrating this phenomenon on pretty much every modern desktop or notebook computer in the world. What we really need is a weakly-ordered multicore device. Fortunately, I happen to have one right here in my pocket: The iPhone 4S fits the bill. It runs on a dual-core ARM-based processor, and the ARM architecture is, in fact, weakly-ordered.

As commenter Ross Smith posted, "a rash of bug reports in multithreaded libraries and applications (occurred) around April 2011--Just after the iPad2 was released. That was the first mass market hardware with a multicore ARM CPU, and it gave a lot of supposedly threadsafe code a workout it had never had before."

The blog comes complete with some psudo-code, C++11 snippets, and the resulting assembly generated by the compiler.