Out-of-Thin-Air Execution is Vacuous

Introduction

This paper is an updated version of N4216, revised based on discussions
at the 2014 UIUC meeting (with much good feedback especially from
Hans Boehm and Victor Luchangco) and on email reflector discussion.

Out-of-thin-air (OOTA) values have proven to be a thorny issue for memory
models, including the Java memory model (JMM) and the C11 and C++11
memory models.
The current C and C++ draft standards simply advise implementers to
avoid OOTA values, without precisely defining what OOTA values might be.
A number of publications have looked at this, including that of
Vafeiadis et al.,
Batty et al.,
Boehm and Demsky,
Sevcik,
Jeffrey,
and
the JMM Causality Test Cases.
These publications establish that OOTA is harmful, and look at a number of
interesting consequences.
Unfortunately, these publications focus only on a relatively small,
sensible-seeming subset of possible OOTA scenarios.
This paper will explore some of the less sane scenarios, which will have
the side-effect of demonstrating that out-of-thin-air execution is,
in the C and C++ worlds, vacuous.

To that end, this paper will look at an
interesting open problem, which is the fact that harmful OOTA programs can be
very closely related to benign operation-reordering programs.
This paper will discuss a general method, called perturbation analysis,
that may be used to distinguish harmful OOTA from benign reordering.

Examples: Harmful OOTA vs. Benign Reordering

The canonical harmful-OOTA example is as follows, where x
and y are both initially zero, and where all accesses
to shared variables are memory_order_relaxed (though
loads may instead be memory_order_consume loads):

Thread 1        Thread 2
--------        --------
r1 = x;         r2 = y;
y = r1;         x = r2;

The current C and C++ standards do not rule out the outcome of
r1 and r2 both equalling 42—or any
other value that can be represented by x and y.
This outcome would of course be quite surprising, and would have
a number of
fatal consequences.

In contrast, the following closely related program is an example of
benign reordering:

Thread 1        Thread 2
--------        --------
r1 = x;         r2 = y;
y = 42;         x = r2;

Here, the outcome of r1 and r2 both equalling 42 is perfectly
legitimate, and in fact occurs on actual implementations.

However, the presence of the constant 42 in and of itself cannot distinguish
between the benign and harmful cases.
For example, the following program is an example of harmful OOTA:

This example can be extended to produce a variety of OOTA values by
expanding Thread 1's “if” statement to provide
additional values.
A very large number and variety of examples can be generated,
a few of which appear in the
JMM Causality Test Cases.

Properties of Harmful OOTA

In the canonical harmful OOTA example, the value of 42 comes from nowhere,
and circulates between x and y.
This situation suggests the following perverse modification,
which assumes that x and y are unsigned and thus
not subject to undefined behavior upon overflow:

This cannot result in r1 and r2 both having the
value 42.
To see this, note that the only way that r1 can have a
non-zero value is if it loads the value stored by Thread 2.
Similarly, the only way that r2 can have a non-zero value
is if it loads from Thread 1's store.
So suppose that r2 has the value 42.
This means that Thread 2 stores 43, which means that the value of
r1 will also be 43.
But this means that Thread 1 will store 43 to y,
which means that r2 also cannot be 42, contradicting the
initial assumption.
This example demonstrates that OOTA execution is similar
to the classic spreadsheet “solve” functionality: OOTA
conceptually requires iterating until a fixed point is reached.
This functionality has its place in spreadsheets, but has no place
in the confines of the C and C++ memory models, most especially for
non-converging test cases such as the one above.
Hence, this example demonstrates that OOTA is not just confusing
and harmful (for example, by inflicting undefined behavior on
unsuspecting developers and code), but is also vacuous in the
context of the C and C++ memory models.

Please note that this problem does not occur if either or both of
the loads get the initial value of zero.
The three convergent cases are as follows:

If Thread 2 reads from Thread 1's store, it might see the store
of the constant 42.
In that case, it will store 43 to x.
But if Thread 1 also reads Thread 2's store, it will load
the value 43, and thus won't execute the store of 42,
which means that Thread 2's load gives 43, not 42, contradicting the
initial assumption.

If Thread 1 stores the value 42 to y, Thread 2
will increment it and thus store 43 to x.
But Thread 1's call to f() will strip off the bottom
bit, restoring both the value 42 and consistent execution.
In this case, a perturbation function p that is an
increment fails to force an inconsistency (although it does succeed
in changing the overall behavior).
The choice of the perturbation function p() depends on
the algorithm, and is in the general case undecidable.

However, all is not lost.
First, it can easily be seen that a function p() that
increments by two rather than one suffices to produce the needed
inconsistency.
This is still a total function and results in the following, where
the functions f(),
g(), and p() have all been inlined for ease
of exposition:

Here, if Thread 2's load returns 42, it will store the value 44.
Thread 1's load will thus return 44, which is unaffected
by the bitwise AND, so that Thread 1 stores 44.
This contradicts Thread 2's initial load of 42, thus providing
the needed inconsistency.

Although the choice of p() is in theory undecidable,
the examples in this paper can be solved for a suitable p()
using (at most) simple algebra.
We further conjecture that a randomly chosen function would have a high
probability of forcing an inconsistency.
In fact, it is possible for the identity function to
result in an inconsistency, for example, in the following case:

Thread 1        Thread 2
--------        --------
r1 = x;         r2 = y;
y = r1;         x = ~r2;

Because the example is itself inconsistent, the choice of the identity
function for p() suffices to preserve this inconsistency.

We further conjecture that the choice of p() is
not only decidable but trivial in the case where all variables
are boolean.
In this case, p() can be simple boolean NOT,
as in the C and C++ prefix ! operator.
In fact, the only reasonable choices for p() are
NOT on the one hand and the identity function on the other.
It might be necessary to apply p() to Thread 1's
load instead of Thread 2's.
Of course, just as with integers, it is necessary to check the original
example for inconsistencies before applying a perturbation function.

We will see that the choice of perturbation function p()
is constrained as follows:

p() must be total over the set of possible argument
values.

p() must not violate constraints deduced from
global analysis.

JMM Causality Test Cases

This section applies perturbation analysis to each of the JMM causality
test cases, comparing the results to the JMM decisions.

Causality Test Case 1

The decision is based on the assumption that the compiler determines that
the variables are all non-negative.
We can define p() to be the increment function, and see
that although this choice of perturbation function does change the
behavior, it does not introduce an inconsistency.

On the other hand, if we violate the non-negativity assumption by choosing
p()
to be the function that decrements by two, we have
r2 == 1 && x == -2 && r1 == -2 && y == 0,
which is an inconsistent execution.

This example therefore illustrates another constraint on the perturbation
function, namely that it not violate constraints deduced from global
analysis.

Causality Test Case 2

Assume an arbitrary perturbation function p(), and that
Thread 1's loads happen after Thread 2's store.
Then r1 will always be equal to r2, so
that Thread 1 will always store to y.
Therefore, we have a consistent execution regardless of the perturbation
function.

Causality Test Case 3

Assume an arbitrary perturbation function p(), and that
Thread 1's loads happen after both Thread 2's and
Thread 3's stores.
Then r1 will always be equal to r2, so
that Thread 1 will always store to y.
Therefore, we again have a consistent execution regardless of the perturbation
function.

Causality Test Case 4

This test case was analyzed earlier, and that analysis agrees with the
decision of “forbidden.”
Interestingly enough, a compiler examining this test case could deduce
that only the value 0 is assigned to x and y
(at initialization time).
The JMM applied this sort of compiler-based variable-value deduction to
other test cases, so it is curious that they chose not to apply it to
this case.
(Or, alternatively, given that they did not apply it to this case, it is
curious that they felt comfortable applying it to other cases.)
Of course, in general, the range of values of variables is also
undecidable.

Causality Test Case 5

Because r3 is zero, we know that Thread 4 stored zero
to x.
Therefore, the only way for r1 and r2 to
equal 1 is for an OOTA cycle involving only Threads 1 and 2.
However, this part of the test case is the same as test case 4,
and perturbation analysis gives the same outcome of “forbidden.”

Causality Test Case 6

B is always either zero or one, so Thread 2
will load either zero or one into r2.
This means that one or the other of the two if statements
will always be taken, so Thread 2 will always store the value 1 to
A.
This means that a sufficiently aggressive compiler could eliminate
Thread 2's if statements and simply unconditionally
assign to A.
Because all memory references are relaxed, the order of Thread 2's
load and store can be reversed, after which the result is allowed even in an
SC execution.

Perturbation can change the values of r1 and r2,
but cannot introduce inconsistencies.
If the compiler cannot determine that B is always either zero
or one, perturbation still cannot introduce inconsistencies.
Either way, perturbation analysis agrees with the decision of
“allowed.”

Causality Test Case 7

Simple reordering can produce the behavior, and adding perturbation can
change the behavior, but cannot result in inconsistencies.
For example, applying an arbitrary perturbation function p()
to the value stored to y results in the following:

In this case, we can see r1 == r2 == 1 && r3 == p(1).
So because perturbation does not introduce inconsistencies (instead merely
changing the behavior), perturbation analysis agrees with the decision
of “allowed.”

Note that applying the perturbation function to Thread 1 has
no effect: The r1 variable is dead code.
Applying the perturbation to Thread 2 causes the value 2 to be
stored to x, which again has no effect in Thread 1
other than changing the value of r1.
Therefore, perturbations do not result in inconsistency, which agrees
with the decision of “allowed.”

Causality Test Case 8

This analysis applies given the range determination for x
and y even without the optimization.
In this case, the only reasonable perturbation function is the
! operator, resulting in the following:

In this case, Thread 1's load from x returns either
zero or one, but it will always store the value of one to y.
This means that Thread 2's load from y will always
return the value 1, so that there is no inconsistency.
This again means that this test case is an example of benign reordering
rather than harmful OOTA, again agreeing with the decision of
“allowed.”

Causality Test Case 9

If the compiler can determine that Thread 3 executes only after
both Threads 1 and 2, then analysis proceeds as with
test case 8 above.
On the other hand, if Thread 3 can execute before Threads 1
and 2, then the compiler cannot limit the values of x
and y to zero and one, and so the perturbation might proceed
as follows:

If each of the Thread 1's and Thread 2's loads returns the
value stored by the other thread, inconsistency results.
For example, if we assume Thread 1 stores the value 1, then Thread 2
will store the value 2.
But that would mean that Thread 1 would calculate and store the
value 4, which is inconsistent with the assumption that Thread 2
loaded the value 1.
Therefore, if the compiler is unable to determine that the values of
x and y are limited to zero and one,
then a load-store cycle is illegal.

This situation might seem a bit disturbing, but it will in fact help lead
to a key insight, namely that optimizations that replace computations with
the equivalent constants are legal and cannot result in OOTA values.

Causality Test Case 10

Given that r3 is equal to zero, we know that Thread 4's
load could not have read from Thread 2's store (possibly due to
Thread 2's store not having executed in the first place).
We also know that Thread 4 did not store to x.
This test case therefore can be analyzed by looking only at Threads 1
and 2.
Perturbation then proceeds as follows:

Suppose that Thread 1's load returns the value that Thread 2
stored.
Then Thread 1's if statement will execute the store
in its then clause.
If Thread 2's load in turn returns the value that Thread 1
stored, r2 will be zero, which will mean that Thread 2's
if statement will not execute the store in its
then clause.
But that means that Thread 1's load cannot possibly return the
value that Thread 2 stored because nothing was stored.
This inconsistency means that this test case is an example of harmful
OOTA, which agrees with the JMM decision of “forbidden”.

Causality Test Case 11

We again assume that each load returns the value of the corresponding store
from the other thread.
This results in an update order of x, y,
z, w.
Because this is acyclic, perturbation cannot introduce an inconsistency,
so this is an example of simple reordering, and not OOTA at all.
Thus, perturbation analysis agrees with the JMM decision of
“allowed.”

Causality Test Case 12

This test case has initial values of zero for x and y,
1 for a[0], and 2 for a[1].

Given this perturbation, if Thread 2 loads the value 1 from
y, then it will store the value 2 to x.
Thread 1 will then load 2, and run off the end of array a,
resulting in undefined behavior (or, if the array has three elements,
uninitialized values).
This is clearly inconsistent, so this is an example of harmful OOTA,
which agrees with the JMM decision of “disallowed.”

Causality Test Case 13

As before, suppose that each thread's load returns the value from the
other thread's corresponding store.
Then r2 will be one, so that r2 + 1 will not
be equal to one, in turn meaning that Thread 2's store will not be
executed.
In this case, r1 must be zero, so that Thread 1's
store also is not executed.
This means that r2 cannot possibly have the value one, resulting
in an inconsistency.
This agrees with the JMM decision of “disallowed.”

Causality Test Case 14

If Thread 2 leaves its loop due to Thread 1's store to y,
the resulting synchronized-with relationship will force the load from
a to happen before the store, so that r1 == 0.
We therefore consider executions where Thread 2 leaves its loop due
to Thread 1's store to b.

Suppose that Thread 1's load from a returns the value
stored by Thread 2.
Then Thread 1 will store to y, which, as noted above,
ensures that either Thread 2 never exits its loop or that there is
a synchronized-with relationship between the store to and the load from
y.
Either outcome makes it impossible for Thread 1 to load the value
from Thread 2's store to a, resulting in an inconsistency.
This agrees with the JMM's decision of “disallowed.”

Causality Test Case 15

Suppose that Thread 1's load from x returns the value
stored by Thread 3 and that Thread 1's load from a
returns the value stored by Thread 2.
But this means that Thread 1 will store to y,
which forces Thread 2's store to a to happen after
Thread 1's load from a, thus forcing an inconsistency.
Hence perturbation analysis agrees with the JMM's decision of
“disallowed.”

Causality Test Case 16

An arbitrary perturbation function applied to either load from
x has no effect on subsequent execution (for
some definition of “subsequent”).
Therefore, perturbation analysis cannot induce an inconsistency,
which agrees with the JMM decision of “allowed.”

Causality Test Case 17

At the point where Thread 1 loads x into r1,
it has either just loaded the value 42 from x or just stored
the value 42 to x.
Therefore, the compiler could simply set r1 to the constant 42.
Once it has done that, because relaxed accesses do not provide any ordering
guarantees, the assignment to r1 (as well as the subsequent
store to y) may be reordered.
Note that this transformation might be a bit controversial, because as soon
as the assignment of 42 to r1 is moved to precede the store to
x, the rationale for replacing the load from x
with 42 disappears.
For the purpose of this analysis, we will assume that relaxed loads and
stores permit even this somewhat extreme reordering.

Given that transformation, no perturbation can change the value
that Thread 1 stores to y, which eliminates any
possibility of inconsistency.
Perturbation analysis thus agrees with the JMM decision of
“allowed.”

Causality Test Case 18

Given a compiler that could figure out that the only possible values that
could be loaded from x are 0 and 42, the perturbation analysis
is restricted to perturbing within these two values, which gives the same
result as test case 17.

Causality Test Case 20

If the compiler is allowed to optimize across the join,
this is the same as test case 18.

Causality Test Case Discussion

In all cases, perturbation analysis gives the same decision as did the JMM's
deliberations.
We therefore hypothesize that the analysis distinguishes benign reordering
from harmful OOTA.

It is important to note that the perturbation-analysis approach sidesteps
the issue of which compiler optimizations may be used in a given situation:
Optimizations are applied first, and only then is perturbation analysis
undertaken.
However, this sidestepping has the benefit that perturbation analysis
applies equally well to
C, C++, and Java, despite the very different restrictions on optimizations
across these three languages.

It would be nice to have a succinct description of the set of test cases
in which perturbation functions introduced inconsistencies.
Ali Sezgin pointed out that this set is described by rf ∪ sdep,
where rf is the reads-from relationship and sdep
is “semantic dependence”, roughly defined as those dependency
relationships in which at least some changes in the value at the head of the
dependency relationship propagate through, resulting in a change at the
tail of that relationship.

Prohibiting executions that have cycles in rf ∪ sdep
can therefore be expected to prohibit OOTA behaviors.

One beneficial consequence of this relationship to semantic dependency is
that rf ∪ nsdep cycles are allowed, where
nsdep ∩ sdep is the empty set and where
nsdep ∪ sdep = dep.
This means that the compiler is free to replace expressions that are
known to always result in a single value with the corresponding constant,
without danger of introducing OOTA behavior.
We hypothesize that non-speculative code-reordering optimizations are
similarly unable to introduce OOTA behavior.

Defining “semantic dependency” sufficiently for formal
modeling remains an open issue.
In the general case, the question of whether or not a given
dependency is a semantic dependency is of course undecidable.
However, this question can be decided straightforwardly
in many common cases.
One approach would be to flag dependencies that the tool was unable to
classify.
Another approach would be to consider cases that a given compiler might
optimize, and to classify other cases as semantic dependencies.

Asides

The following sections present asides on undecidability,
inferred ordering, and code generated by old compilers.

Aside on Undecidability

The fact that the choice of the perturbation function is undecidable
is no greater obstacle for OOTA than it is for anything else.
After all, almost all interesting questions about Turing-complete
languages are undecidable.
(As Doug Lea pointed out, others are “merely” NP.)
The following simple example is a case in point:

Is the outcome r1 == 1 && r2 == 0 permitted?
In general, this is undecidable.

So what do we do about this?

The same things that we have always done.
The ppcmem and herd tools permit only a small
finite number of variables, thus avoiding undecidability.
The cbmc model checker limits the number of passes through
each loop, thus considering only finite executions, again avoiding
undecidability.
These two strategies should also work well for perturbation analysis.

And selection of a perturbation function is usually straightforward:

Select the presumed cycle. (Yes, in a large program, there
might be a lot of them. Just like there might be a lot
of synchronized-with relationships.)

Pick a load on the presumed cycle.

Select a return value for the load (usually given by the
assertion).

Check whether the selected return value is consistent,
in other words, whether this value results in that same
value being stored to that variable.
In theory, this step can be undecidable because an overly
clever programmer might do something like make the value
stored depend on some undecidable proposition such as the
halting problem.
In practice, making one's program depend on an undecidable
proposition seems like a clear case of a deeply flawed design.

If the value is consistent, solve for a perturbation function
that makes it inconsistent. For most litmus tests, this is
at worst simple algebra. Of course, it might be undecidable,
in which case it is time to spend some quality time with the
litmus test's author. ;-)

So the undecidability should not normally be a problem in practice.

Aside on Inferred Ordering

Suppose that a highly optimizing compiler and a less-aggressive
analysis tool are applied to the following litmus test (put forward
by Hans Boehm):

Suppose the compiler determined that function f() did
not represent a semantic dependency, but that the analysis tool was
unable to make this determination.
Might the analysis tool therefore incorrectly report to the developer
that Thread 2's load from y is ordered before
its store to x?

When considering this question, keep in mind that all accesses
are relaxed.
This means that there are no ordering properties, unless they are
supplied by other non-relaxed accesses.
Therefore the answer to the question is that the analysis tool
should not report that Thread 2's
accesses are ordered in any case.
It should instead confine itself to disregarding any candidate executions
involving OOTA results.

What can happen is that the compiler might be able to determine
that f(y) always returns 42, thus allowing it to transform
the code as follows:

The compiler might then generate code that resulted in
x = y = 42.
If the analysis tool had less sophisticated analysis than did the compiler,
the tool might well exclude this result.
But this would constitute a bug in the tool rather than a problem with OOTA:
The compiler correctly determined that the dependency of r3
on r2 was not a semantic dependency, while the tool failed
to make this distinction.
However, a high-quality tool would report that it was unable to prove
whether or not f() represented a semantic dependency.

Aside on Code From Old Compilers

Suppose that the code for function f() in the prior example
was generated by an old compiler, perhaps even one that is unaware of
C11 and C++11 atomics.
Mightn't such a compiler carry out optimizations that could result in
OOTA executions?
(This possibility was raised in a small-group discussion by Hans Boehm
at the 2014 UIUC meeting.)

The answer is "no" for all known compilers used in production.
The only possible exception would be research compilers used to
investigate value speculation and similar extreme optimizations.
However, these research compilers could potentially generate OOTA executions
even in sequential code, so it makes sense to exclude them from
consideration.

Therefore, this possibility does not permit data-race-free legacy
libraries to inflict OOTA executions on multithreaded programs.

Summary

This document has shown that all of the harmful OOTA examples in
the JMM Causality Test Cases
are special cases that have a fixed point, and that slight perturbations
result in inconsistent results.
This supports the hypothesis that any harmful-OOTA test case can be
perturbed into an inconsistent state and that benign-reordering test
cases cannot be.

This perturbation analysis appears to be equivalent to requiring that
rf ∪ sdep be acyclic.
This is an extremely important result: It means that any compiler
optimization that substitutes a constant value for a read known to return
that value cannot induce OOTA behavior.
This constraint should also ensure that non-speculative
code-movement optimizations are similarly unable to induce
OOTA behavior.

Effective and efficient modeling of semantic dependencies
(sdep) remains an important open problem.