memory_order_consume

std::memory_order_consume is the most legendary of the six memory orderings, for two reasons. On the one hand, std::memory_order_consume is extremely hard to understand. On the other hand (this may change in the future), no compiler supports it.

How can a compiler support the C++11 standard without supporting the memory ordering std::memory_order_consume? The answer is that the compiler maps std::memory_order_consume to std::memory_order_acquire. That is fine because both apply to load operations, and std::memory_order_consume requires weaker synchronisation and ordering constraints than std::memory_order_acquire. So the release-acquire ordering is potentially slower than the release-consume ordering but, and that is the key point, well defined.

To understand the release-consume ordering, it's a good idea to compare it with the release-acquire ordering. In this post I speak explicitly of the release-acquire ordering, and not of the acquire-release semantic, to emphasise the strong relationship between std::memory_order_consume and std::memory_order_acquire.

Release-acquire ordering

As a starting point, I use a program with two threads, t1 and t2. t1 plays the role of the producer, t2 the role of the consumer. The atomic variable ptr synchronizes the producer and the consumer.
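The following is a minimal sketch of such a program, reconstructed from the description in this post rather than copied from the original listing. The names producer, consumer, ptr, data and atoData and the values follow the text; the namespace, the Observed struct and the run function are assumptions added so that the consumer's reads can be checked instead of printed.

```cpp
#include <atomic>
#include <functional>
#include <string>
#include <thread>

namespace release_acquire_sketch {

std::atomic<std::string*> ptr{nullptr};
int data = 0;                  // non-atomic
std::atomic<int> atoData{0};

// Hypothetical helper: what the consumer saw.
struct Observed {
    std::string text;
    int data;
    int atoData;
};

void producer() {
    std::string* p = new std::string("C++11");
    data = 2011;                                     // plain write
    atoData.store(2014, std::memory_order_relaxed);  // relaxed write
    ptr.store(p, std::memory_order_release);         // publishes everything above
}

void consumer(Observed& out) {
    std::string* p2;
    // Spin until the producer has published the pointer.
    while (!(p2 = ptr.load(std::memory_order_acquire))) {}
    out.text = *p2;
    out.data = data;                                 // ordered after the acquire load
    out.atoData = atoData.load(std::memory_order_relaxed);
}

// Hypothetical driver: t1 is the producer, t2 the consumer.
Observed run() {
    Observed out{};
    std::thread t1(producer);
    std::thread t2(consumer, std::ref(out));
    t1.join();
    t2.join();
    delete ptr.exchange(nullptr);                    // free the published string
    return out;
}

}  // namespace release_acquire_sketch
```

Calling release_acquire_sketch::run() from main and printing the three fields corresponds to the output statements of the consumer described in the text.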

That was easy. But once std::memory_order_acquire is replaced by std::memory_order_consume, the program has undefined behaviour. That statement is very hypothetical, because my compiler implements std::memory_order_consume with std::memory_order_acquire. So, under the hood, both programs actually do the same thing.

Release-acquire versus Release-consume ordering

The output of the programs is identical.

Although I repeat myself, I want to sketch in a few words why the first program, acquireRelease.cpp, is well defined.

The store operation on ptr synchronizes-with the load operation on ptr. The reason is that the store operation uses std::memory_order_release and the load operation uses std::memory_order_acquire. That was the synchronization. What about the ordering constraints of the release-acquire ordering? The release-acquire ordering guarantees that the effects of all operations before the store operation are visible after the load operation. So the release-acquire ordering additionally orders the accesses to the non-atomic variable data and the atomic variable atoData. That holds even though atoData uses std::memory_order_relaxed.

The key question is: what happens if I replace std::memory_order_acquire with std::memory_order_consume in the program?

Data dependencies with std::memory_order_consume

std::memory_order_consume is about data dependencies on atomics. Data dependencies exist in two forms: carries-a-dependency-to within a thread, and dependency-ordered-before between two threads. Both establish a happens-before relation, which is exactly the kind of relation a well-defined program needs. But what do carries-a-dependency-to and dependency-ordered-before mean?

carries-a-dependency-to: If the result of an operation A is used as an operand of an operation B, then: A carries-a-dependency-to B.

dependency-ordered-before: A store operation A (with std::memory_order_release, std::memory_order_acq_rel or std::memory_order_seq_cst) is dependency-ordered-before a load operation B (with std::memory_order_consume) if the result of the load operation B is used in a further operation C in the same thread. The operations B and C have to be in the same thread.

Of course, I know from personal experience that neither definition is easy to digest. So I will use a graphic to explain them visually.

The expression ptr.store(p, std::memory_order_release) is dependency-ordered-before while (!(p2 = ptr.load(std::memory_order_consume))), because the following line, std::cout << "*p2: " << *p2 << std::endl, reads the result of the load operation. Furthermore, while (!(p2 = ptr.load(std::memory_order_consume))) carries-a-dependency-to std::cout << "*p2: " << *p2 << std::endl, because the output of *p2 uses the result of the ptr.load operation.
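To make the two relations concrete, here is a small sketch that keeps only the part of the program covered by the dependency. The expressions on ptr are the ones quoted above; the namespace and the run function are assumptions for illustration, and the string is copied out instead of printed so it can be checked. Keep in mind that a compiler may simply treat std::memory_order_consume as std::memory_order_acquire.

```cpp
#include <atomic>
#include <functional>
#include <string>
#include <thread>

namespace release_consume_sketch {

std::atomic<std::string*> ptr{nullptr};

void producer() {
    std::string* p = new std::string("C++11");
    // This store is dependency-ordered-before the dependent read in consumer().
    ptr.store(p, std::memory_order_release);
}

void consumer(std::string& out) {
    std::string* p2;
    // Spin until a non-null pointer has been published.
    while (!(p2 = ptr.load(std::memory_order_consume))) {}
    // *p2 uses the loaded value as an operand, so the load
    // carries-a-dependency-to this read.
    out = *p2;
}

// Hypothetical driver returning what the consumer read through p2.
std::string run() {
    std::string out;
    std::thread t1(producer);
    std::thread t2(consumer, std::ref(out));
    t1.join();
    t2.join();
    delete ptr.exchange(nullptr);
    return out;
}

}  // namespace release_consume_sketch
```

Only the read through p2 is covered here; nothing in this sketch touches variables that are not reached via the loaded pointer.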

But we have no guarantee for the subsequent outputs of data and atoData, because neither has a carries-a-dependency-to relation with the ptr.load operation. It gets even worse: because data is a non-atomic variable, there is a data race on data. Both threads can access data at the same time, and thread t1 modifies it. Therefore, the program has undefined behaviour.
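One possible way around this, which is my assumption and not part of the original programs, is to make every value the consumer needs reachable through the published pointer. Then each read uses the result of the consume load and is covered by the dependency. The Payload struct, its field names, the namespace and the run function are hypothetical.

```cpp
#include <atomic>
#include <functional>
#include <string>
#include <thread>

namespace consume_payload_sketch {

// Hypothetical bundle of the three values used in this post.
struct Payload {
    std::string text;
    int data;
    int atoData;
};

std::atomic<Payload*> ptr{nullptr};

void producer() {
    // All writes happen inside the object before it is published.
    Payload* p = new Payload{"C++11", 2011, 2014};
    ptr.store(p, std::memory_order_release);
}

void consumer(Payload& out) {
    Payload* p2;
    while (!(p2 = ptr.load(std::memory_order_consume))) {}
    // Every read goes through p2, so each one depends on the consume load.
    out.text = p2->text;
    out.data = p2->data;
    out.atoData = p2->atoData;
}

// Hypothetical driver returning a copy of what the consumer read.
Payload run() {
    Payload out{};
    std::thread t1(producer);
    std::thread t2(consumer, std::ref(out));
    t1.join();
    t2.join();
    delete ptr.exchange(nullptr);
    return out;
}

}  // namespace consume_payload_sketch
```

The trade-off is that the consumer can only rely on data it reaches through the pointer; any unrelated global would again fall outside the dependency.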

What's next?

I admit, that was a challenging post. In the next post, I deal with the typical misunderstanding of the acquire-release semantic. It happens if the acquire operation is performed before the release operation.


Hi Ranier, Intel Inspector 2018 beta shows a race for the above example with memory_order_acquire. The race is shown for 'data', which is written in the producer and read inside the consumer. I can email you the screenshot if you want. The program was compiled with -pthread -g -std=c++14 . Compiler version