Dynamic data structures

If we want to design data structures whose maximum size is not fixed in advance, such as a linked list, we must also be able to handle dynamic allocation of memory. Since we now share pointers to these memory blocks with other threads, a new problem arises: when can we safely reuse the memory of blocks that are no longer in use? If, for example, we have removed an element from our linked list and therefore want to return its memory to the system, we must be sure that no other thread still holds a reference to that memory and is about to read from or write to it. In some sense we therefore need a kind of "garbage collection" facility, i.e., a memory reclamation system. If our non-blocking functions are to remain purely non-blocking, every function they call must be non-blocking as well, and this includes calls to the garbage handler. Moreover, since we want to allocate and return memory dynamically, that too must be done in a non-blocking way.

A lock-free memory handler

The code below shows a very simple "lock-free" memory handler that can allocate memory blocks of a fixed size and return the memory once a block is no longer referenced in any way. It should be pointed out that this is definitely not the most efficient way to do it, but it is a simple, illustrative example that clearly highlights the fundamental problems.

The memory blocks available for allocation are kept in a linked list called "freeList", and reference counting is used to keep track of references. For this purpose, each memory block has a variable that holds the number of references that currently exist to the block. Concretely, this means that every time we follow a shared pointer (which is normally done with the * operator), we must also increment the reference counter of the memory block it points to. The problem is that these steps cannot all be done at once with the basic atomic operations. By the time we increment the counter (with FAA, for example), the pointer may no longer refer to that memory block, and the block may even have been reused and allocated by another thread. The solution in this example is to always keep the reference counter at the same memory address, even when the rest of the block has since been used by other threads for other purposes. An increment that lands on a reused block is then harmless: the thread can afterwards check whether the pointer still refers to the block and, if not, simply undo its increment.

Similar problems appear when memory blocks are allocated. What guarantees that "p->next" still has the value we read earlier when the CAS operation finally succeeds? If the memory block "p" was allocated and then returned again after we read "p->next" but before the CAS, the CAS could still succeed even though "p->next" has changed in the meantime. By increasing the reference counter of "p" beforehand, we can make sure that no such reclamation can have occurred.

The reason we increase the reference counter by 2 every time is that the least significant bit (with value 1) of the counter is used for something else. When we decrement the counter because a reference is no longer used, we must be able to decide when the memory should be returned (that is, when the counter reaches 0) and, more importantly, which thread should do it. Several threads may in fact have read the counter and seen it to be 0, but only the one that manages to atomically change it to 1 gets the task of returning the memory block.
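The original code is not included in this text, so what follows is only a minimal sketch in C11 of a handler along the lines described above, not the author's implementation. It assumes a fixed pool of blocks in a static array (the hypothetical `pool` and `POOL_SIZE`) that are never handed back to the operating system, so each block's counter (the hypothetical field `refct`) always stays at the same address. The counter counts 2 per reference and uses bit 0 as the "claimed" flag, and the free list reuses each block's `next` field, which is otherwise assumed to hold a counted reference. `AllocNode`, `DeRefLink` and `ReleaseNode` are the names used later in this text; `Node`, `refct`, `pool`, `POOL_SIZE` and `PoolInit` are illustrative. The scheme is essentially Valois's reference counting with the correction published by Michael and Scott.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define POOL_SIZE 8  /* hypothetical: number of fixed-size blocks */

typedef struct Node {
    atomic_uint refct;            /* 2 per reference; bit 0 set = claimed, i.e. free */
    _Atomic(struct Node *) next;  /* freeList link while free, counted data link in use */
    int value;                    /* payload, not used by the handler itself */
} Node;

/* Blocks never leave the pool, so each refct always lives at the same address. */
static Node pool[POOL_SIZE];
static _Atomic(Node *) freeList;

static void ReleaseNode(Node *n);

/* Follow a shared pointer and take a counted reference to the block it names. */
static Node *DeRefLink(_Atomic(Node *) *link) {
    for (Node *p;;) {
        if ((p = atomic_load(link)) == NULL) return NULL;
        atomic_fetch_add(&p->refct, 2);        /* harmless even if p was just reused */
        if (atomic_load(link) == p) return p;  /* link unchanged: the reference is valid */
        ReleaseNode(p);                        /* link moved on: undo and retry */
    }
}

/* Drop one reference; only the thread that turns 0 into 1 reclaims the block. */
static void ReleaseNode(Node *n) {
    if (n == NULL) return;
    unsigned old = atomic_fetch_sub(&n->refct, 2), zero = 0;
    assert(old >= 2);
    if (old != 2) return;                                              /* still referenced */
    if (!atomic_compare_exchange_strong(&n->refct, &zero, 1)) return;  /* another thread won */
    ReleaseNode(atomic_exchange(&n->next, NULL));  /* release the pointer stored in the block */
    Node *top = atomic_load(&freeList);
    do atomic_store(&n->next, top);                /* push n back onto freeList */
    while (!atomic_compare_exchange_weak(&freeList, &top, n));
}

/* Pop a block; its count is raised before the CAS so it cannot be recycled meanwhile. */
static Node *AllocNode(void) {
    for (Node *p, *expected;;) {
        if ((p = expected = DeRefLink(&freeList)) == NULL) return NULL;  /* pool empty */
        if (atomic_compare_exchange_strong(&freeList, &expected, atomic_load(&p->next))) {
            atomic_fetch_sub(&p->refct, 1);  /* clear claim bit: caller now holds one ref */
            atomic_store(&p->next, NULL);
            return p;
        }
        ReleaseNode(p);  /* CAS lost: drop our temporary reference and retry */
    }
}

static void PoolInit(void) {
    for (int i = 0; i < POOL_SIZE; i++) {
        atomic_store(&pool[i].refct, 1u);  /* free: zero references, claimed */
        atomic_store(&pool[i].next, i + 1 < POOL_SIZE ? &pool[i + 1] : NULL);
    }
    atomic_store(&freeList, &pool[0]);
}
```

`DeRefLink` shows the pattern from the text: increment first, then check that the link still names the same block, and undo the increment if it does not. In `AllocNode` the extra reference keeps `p` from being reclaimed and pushed back between reading `p->next` and the CAS. A known weakness of this kind of reference counting is that a thread stalled while holding a reference can keep a whole chain of blocks from being reused.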

A lock-free dynamic queue

We now use our non-blocking memory handler to design a non-blocking dynamic data structure. The code below shows a simple "lock-free" queue data structure.

Observe how the memory handler is called in the code: "AllocNode" allocates new memory blocks, "DeRefLink" follows (dereferences) pointers, and "ReleaseNode" releases references that are no longer used. The code is based on a linked list where the first link is referenced by the "head" pointer and the last link is always a "dummy" element that contains no data. To remove an element, "head" is changed with CAS to point to the next element; to add an item, a new element is linked in after the last element with CAS, as illustrated in the figure below.

Since the "tail" pointer cannot be updated at the same moment as the new element is added, it may lag behind, which means that a thread adding an element must step along the linked list until it reaches the very last element. This also means that we must be sure it is safe to follow the "->next" pointer of every element, which we ensure by letting the reference count of the referenced element include this pointer. Consequently, we must also decrease that reference count when the memory block containing the pointer is returned; this is done in the function "ReleaseNode" by recursively calling "ReleaseNode" for all pointers contained in the memory block.
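A self-contained sketch of such a queue in C11 follows, written under stated assumptions rather than taken from the original code. So that it compiles on its own, it includes a compact reference-counting memory handler (fixed pool, counts in steps of 2, bit 0 as claim flag). It uses the layout of the well-known Michael and Scott queue, in which the node that `head` points to acts as the dummy; the original code may arrange this differently. `CASRef` is a hypothetical helper that performs a CAS on a counted link and adjusts the counts of the old and new targets, and `Node`, `pool`, `POOL_SIZE`, `QueueInit`, `Enqueue` and `Dequeue` are illustrative names. Every link (`head`, `tail` and each `next`) owns one reference, which is what makes it safe for `Enqueue` to walk from a lagging `tail` even across elements that have already been dequeued.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 8  /* hypothetical: one dummy plus up to seven queued items */

typedef struct Node {
    atomic_uint refct;            /* 2 per reference; bit 0 set = claimed (free) */
    _Atomic(struct Node *) next;  /* counted link; free-list link while free */
    int value;
} Node;
static Node pool[POOL_SIZE];
static _Atomic(Node *) freeList, head, tail;

/* Compact reference-counting memory handler (fixed pool, steps of 2, claim bit). */
static void ReleaseNode(Node *n) {
    if (n == NULL) return;
    unsigned old = atomic_fetch_sub(&n->refct, 2), zero = 0;
    assert(old >= 2);
    if (old != 2 || !atomic_compare_exchange_strong(&n->refct, &zero, 1)) return;
    ReleaseNode(atomic_exchange(&n->next, NULL));  /* recursively drop contained pointers */
    Node *top = atomic_load(&freeList);
    do atomic_store(&n->next, top);
    while (!atomic_compare_exchange_weak(&freeList, &top, n));
}

static Node *DeRefLink(_Atomic(Node *) *link) {
    for (Node *p;;) {
        if ((p = atomic_load(link)) == NULL) return NULL;
        atomic_fetch_add(&p->refct, 2);
        if (atomic_load(link) == p) return p;  /* link unchanged: reference is valid */
        ReleaseNode(p);
    }
}

static Node *AllocNode(void) {
    for (Node *p, *expected;;) {
        if ((p = expected = DeRefLink(&freeList)) == NULL) return NULL;
        if (atomic_compare_exchange_strong(&freeList, &expected, atomic_load(&p->next))) {
            atomic_fetch_sub(&p->refct, 1);  /* clear claim bit; caller holds one reference */
            atomic_store(&p->next, NULL);
            return p;
        }
        ReleaseNode(p);
    }
}

/* CAS on a counted link; the caller must hold references to old and desired. */
static bool CASRef(_Atomic(Node *) *link, Node *old, Node *desired) {
    if (desired) atomic_fetch_add(&desired->refct, 2);  /* reference owned by the link */
    bool ok = atomic_compare_exchange_strong(link, &old, desired);
    ReleaseNode(ok ? old : desired);  /* drop the link's old reference, or undo */
    return ok;
}

void QueueInit(void) {
    for (int i = 0; i < POOL_SIZE; i++) {
        atomic_store(&pool[i].refct, 1u);
        atomic_store(&pool[i].next, i + 1 < POOL_SIZE ? &pool[i + 1] : NULL);
    }
    atomic_store(&freeList, &pool[0]);
    Node *dummy = AllocNode();
    atomic_fetch_add(&dummy->refct, 2);  /* one reference for head, one for tail */
    atomic_store(&head, dummy);
    atomic_store(&tail, dummy);
}

bool Enqueue(int value) {
    Node *node = AllocNode(), *last;
    if (node == NULL) return false;  /* pool exhausted; this sketch does not block */
    node->value = value;
    last = DeRefLink(&tail);
    for (;;) {
        Node *next = DeRefLink(&last->next);
        if (next == NULL) {
            if (CASRef(&last->next, NULL, node)) break;  /* linked after the last element */
            continue;                                    /* lost the race: re-read next */
        }
        CASRef(&tail, last, next);  /* tail lags behind: try to move it forward */
        ReleaseNode(last);
        last = next;                /* step along the list */
    }
    CASRef(&tail, last, node);
    ReleaseNode(last);
    ReleaseNode(node);
    return true;
}

bool Dequeue(int *value) {
    for (;;) {
        Node *first = DeRefLink(&head);
        Node *next = DeRefLink(&first->next);
        bool ok = next != NULL && CASRef(&head, first, next);
        if (ok) *value = next->value;  /* next becomes the new dummy */
        ReleaseNode(next);
        ReleaseNode(first);
        if (ok || next == NULL) return ok;
    }
}
```

For instance, after `QueueInit()`, `Enqueue(1)` and `Enqueue(2)`, two calls to `Dequeue` yield 1 and 2, and a third returns false. `Enqueue` returns false when the pool is empty instead of falling back to a blocking allocator.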

More advanced

In theory, it has been proven that parallel, "wait-free" operations of any size and complexity can be designed using CAS. In practice, however, it is considerably harder. Research in the area has been conducted internationally for more than 30 years. As an example, a research group at Chalmers University of Technology has worked on finding practical, specific applications of the technique, with results that have come relatively far towards that goal. There are also general methods that can easily be applied to almost anything, but they have high overhead and low performance compared with specific constructions. Another point in favor of specific solutions is that a large number of constructions, primarily "lock-free" ones, are now known and published. Parallel Scalable Solutions provides commercial solutions for developers that span most of what is known in the academic world about non-blocking techniques.

Large companies such as Intel, Microsoft and Sun have also started their own projects, although so far on a smaller scale. Intel markets its "Threading Building Blocks", Microsoft provides the "Interlocked" class in .NET and the corresponding Interlocked functions in the Win32 API, and Sun has implemented several non-blocking classes in Java 1.6. In the Linux community, promising work has been done on combining simple non-blocking and lock-based mechanisms, especially in the kernel. That work focuses mainly on performance, since the construction as a whole generally has the same properties as a lock-based solution. The future will tell, but it is already clear that current non-blocking techniques can match or even significantly surpass conventional lock-based techniques, which will become increasingly important as the number of logical processors continues to grow.