Blog Archives

Tuesday, 27 July 2010

Dekker's
algorithm is one of the most basic algorithms for mutual
exclusion,
alongside Peterson's
algorithm
and Lamport's
bakery algorithm. It has the nice property that it only requires
load and store operations rather than exchange or test-and-set, but
it still requires some level of ordering between the operations. On
a weakly-ordered architecture such as PowerPC or SPARC, a correct
implementation of Dekker's algorithm thus requires the use of fences
or memory barriers in order to ensure correct operation.

The code

For those of you who just want the code: here it is —
Dekker's algorithm in C++, with explicit fences.
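
The original code listing is no longer inline, so here is a sketch reconstructed from the analysis below: relaxed loads and stores on the `flag0`, `flag1` and `turn` atomics, with explicit fences supplying the orderings. The `counter` variable and `run_threads` harness are illustrative additions, standing in for the critical-section data.

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> flag0(false), flag1(false);
std::atomic<int> turn(0);
unsigned long counter = 0;  // data protected by the algorithm

void p0() {
    flag0.store(true, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag1.load(std::memory_order_relaxed)) {
        if (turn.load(std::memory_order_relaxed) != 0) {
            flag0.store(false, std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 0) {
            }
            flag0.store(true, std::memory_order_relaxed);
            // second seq_cst fence: needed when re-raising the flag
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);

    ++counter;  // critical section

    std::atomic_thread_fence(std::memory_order_release);
    turn.store(1, std::memory_order_relaxed);
    flag0.store(false, std::memory_order_relaxed);
}

void p1() {  // symmetric to p0
    flag1.store(true, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag0.load(std::memory_order_relaxed)) {
        if (turn.load(std::memory_order_relaxed) != 1) {
            flag1.store(false, std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 1) {
            }
            flag1.store(true, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);

    ++counter;  // critical section

    std::atomic_thread_fence(std::memory_order_release);
    turn.store(0, std::memory_order_relaxed);
    flag1.store(false, std::memory_order_relaxed);
}

// Run both threads through the critical section `iterations` times each.
unsigned long run_threads(unsigned iterations) {
    counter = 0;
    std::thread t0([iterations] { for (unsigned i = 0; i < iterations; ++i) p0(); });
    std::thread t1([iterations] { for (unsigned i = 0; i < iterations; ++i) p1(); });
    t0.join();
    t1.join();
    return counter;
}
```

If the mutual exclusion holds, the unsynchronized increments never race, and the final count is exactly twice the iteration count.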

The analysis

If you're like me then you'll be interested in why stuff
works, rather than just taking the code. Here is my analysis of the
required orderings, and how the fences guarantee those orderings.

Suppose thread 0 and thread 1 enter p0
and p1 respectively at the same time. They both set their
respective flags to true, execute the fence and then read
the other flag at the start of the while loop.

If both threads read false then both will enter the
critical section, and the algorithm doesn't work. It is the job of the
fences to ensure that this doesn't happen.

The fences are marked with memory_order_seq_cst, so either the
fence in p0 is before the fence in p1 in the global ordering of
memory_order_seq_cst operations, or vice-versa. Without
loss of generality, we can assume that the fence in p0
comes before the fence in p1, since the code is
symmetric. The store to flag0 is sequenced before the
fence in p0, and the fence in p1 is
sequenced before the read from flag0. Therefore the
read from flag0 must see the value stored
(true), so p1 will enter
the while loop.

On the other side, there is no such guarantee for the read
from flag1 in p0, so p0 may or
may not enter the while loop. If p0 reads
the value of false for flag1, it will not
enter the while loop, and will instead enter the critical
section, but that is OK since p1 has entered
the while loop.

Though flag0 is not set to false
until p0 has finished the critical section, we need to
ensure that p1 does not see this until the values
modified in the critical section are also visible to p1,
and this is the purpose of the release fence prior to the store
to flag0 and the acquire fence after
the while loop. When p1 reads the
value false from
flag0 in order to exit the while loop, it
must be reading the value stored by p0 after the release
fence at the end of the critical section. The acquire fence after the
load guarantees that all the values written before the release fence
prior to the store are visible, which is exactly what we need
here.

If p0 reads true for flag1,
it will enter the while loop rather than the critical
section. Both threads are now looping, so we need a way to ensure
that exactly one of them breaks out. This is the purpose of
the turn variable. Initially, turn is 0,
so p1 will enter the if and
set flag1 to false, whilst p0
will not enter the if. Because p1 set
flag1 to false, eventually p0
will read flag1 as false and exit the
outer while loop to enter the critical section. On the
other hand, p1 is now stuck in the
inner while loop because turn is
0. When p0 exits the critical section it
sets turn to 1. This will eventually be seen
by p1, allowing it to exit the inner while
loop. When the store to
flag0 becomes visible p1 can then exit the
outer while loop too.

If turn had been 1 initially (because p0
was the last thread to enter the critical section) then the inverse
logic would apply, and p0 would enter the inner loop,
allowing p1 to enter the critical section first.

Second time around

If p0 is called a second time whilst p1
is still in the inner loop then we have a similar situation to the
start of the function — p1 may exit the inner
loop and store true in flag1
whilst p0 stores true
in flag0. We therefore need the
second memory_order_seq_cst fence after the store to
the flag in the inner loop. This guarantees that
at least one of the threads will see the flag from the other thread
set when it executes the check in the outer loop. Without this fence
then both threads can read false, and both can enter
the critical section.

Alternatives

You could put the ordering constraints on the loads and stores
themselves rather than using fences. Indeed, the default memory
ordering for atomic operations in C++
is memory_order_seq_cst, so the algorithm would "just
work" with plain loads and stores to atomic variables. However, by
using memory_order_relaxed on the loads and stores we
can add fences to ensure we have exactly the ordering constraints
required.
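
A sketch of this fence-free variant, with every load and store left at the default `memory_order_seq_cst` ordering (the `counter` and `run_threads` harness are illustrative, not part of the original):

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> flag0(false), flag1(false);
std::atomic<int> turn(0);
unsigned long counter = 0;  // data protected by the algorithm

void p0() {
    flag0.store(true);            // memory_order_seq_cst by default
    while (flag1.load()) {
        if (turn.load() != 0) {
            flag0.store(false);
            while (turn.load() != 0) {
            }
            flag0.store(true);
        }
    }
    ++counter;                    // critical section
    turn.store(1);
    flag0.store(false);
}

void p1() {  // symmetric to p0
    flag1.store(true);
    while (flag0.load()) {
        if (turn.load() != 1) {
            flag1.store(false);
            while (turn.load() != 1) {
            }
            flag1.store(true);
        }
    }
    ++counter;
    turn.store(0);
    flag1.store(false);
}

unsigned long run_threads(unsigned iterations) {
    counter = 0;
    std::thread t0([iterations] { for (unsigned i = 0; i < iterations; ++i) p0(); });
    std::thread t1([iterations] { for (unsigned i = 0; i < iterations; ++i) p1(); });
    t0.join();
    t1.join();
    return counter;
}
```

Since every operation participates in the single total order of memory_order_seq_cst operations, no fences are needed, at the cost of making every access fully ordered.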

Wednesday, 14 July 2010

The
upcoming C++0x
standard includes reference wrappers in the form of
the std::reference_wrapper<T> class template, and
the helper function templates std::ref()
and std::cref(). As I mentioned in my blog post
on Starting
Threads with Member Functions and Reference Arguments, these
wrappers can be used to pass references to objects across interfaces
that normally require copyable (or at least movable) objects
— in that blog post, std::ref was used for passing
references to objects over to the new thread, rather than copying the
objects. I was recently asked what the difference was
between std::ref and std::cref, and how they
worked, so I thought I'd elaborate.

Deducing the Referenced Type

std::ref is a function template, so it automatically
deduces the type of the wrapped reference from the type of the
supplied argument. This type deduction includes
the const-ness of the supplied object:
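
The original example is missing here; the deduction can be demonstrated like this (variable names are illustrative):

```cpp
#include <functional>
#include <type_traits>

int plain = 42;
const int constant = 42;

// The deduced wrapper type tracks the const-ness of the argument.
using R1 = decltype(std::ref(plain));     // std::reference_wrapper<int>
using R2 = decltype(std::ref(constant));  // std::reference_wrapper<const int>
using R3 = decltype(std::cref(plain));    // std::reference_wrapper<const int>

static_assert(std::is_same<R1, std::reference_wrapper<int>>::value,
              "std::ref on a non-const object");
static_assert(std::is_same<R2, std::reference_wrapper<const int>>::value,
              "std::ref on a const object");
static_assert(std::is_same<R3, std::reference_wrapper<const int>>::value,
              "std::cref always yields a const wrapper");
```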

Since a non-const reference can always be bound to
a const reference, you can thus
use std::ref in pretty much every case where you would
use std::cref, and your code would work the same. Which
raises the question: why would you ever choose to
use std::cref?

Using std::cref to prevent modification

The primary reason for choosing std::cref is because
you want to guarantee that the source object is not modified through
that reference. This can be important when writing multithreaded
code — if a thread should not be modifying some data then it
can be worth enforcing this by passing a const
reference rather than a mutable reference.

This can be important where there are overloads of a function such
that one takes a const reference, and the other a
non-const reference: if we don't want the object
modified then it is important that the overload taking
a const reference is chosen.
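
This can be sketched as follows (process is a hypothetical function, not from the original post): a reference_wrapper<T> converts to T&, so it selects the non-const overload, whilst a reference_wrapper<const T> can only convert to const T&.

```cpp
#include <functional>
#include <string>

int value = 0;

// Two overloads; we want to be certain the read-only one is picked.
std::string process(int&)       { return "non-const overload"; }
std::string process(const int&) { return "const overload"; }

std::string via_ref()  { return process(std::ref(value));  }  // may modify value
std::string via_cref() { return process(std::cref(value)); }  // cannot modify value
```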

References to temporaries

std::cref has another property missing
from std::ref — it can bind to temporaries, since
temporary objects can bind to const references. I'm
not sure this is a good thing, especially when dealing with multiple
threads, as the referenced temporary is likely to have been
destroyed before the thread has even started. This is therefore
something to watch out for:
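
The original example is missing here; the lifetime hazard can be sketched like this (function names are hypothetical) with a wrapper that outlives the object it refers to:

```cpp
#include <functional>
#include <string>

// Safe: the named object outlives the wrapper.
std::string read_through(const std::string& named) {
    std::reference_wrapper<const std::string> w = std::cref(named);
    return w.get();
}

// Dangerous: the wrapper outlives what it refers to. This compiles
// cleanly, but any use of the returned wrapper is undefined behaviour.
std::reference_wrapper<const std::string> dangle() {
    std::string local("short-lived");
    return std::cref(local);  // local is destroyed on return
}
```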

Documentation

Finally, std::cref serves a documentation purpose,
even where std::ref would suffice — it declares
in big bold letters that this reference cannot be used to modify the
referenced object, which thus makes it easier to reason about the
code.

Recommendation

I would recommend that you use std::cref in preference
to std::ref whenever you can — the benefits as
documentation of intent, and avoiding accidental modification
through the reference make it a clear winner in my opinion. Of
course, if you do want to modify the referenced
object, then you need to use std::ref, but such usage
now stands out, and makes it clear that this is the intent.

You do still need to be careful to ensure that you don't try and
wrap references to temporaries, particularly when
applying std::cref to the result of a function call,
but such uses should stand out — I expect most uses to be
wrapping a reference to a named variable rather than wrapping a
function result.