Friday, December 23, 2011

Concurrent programming is a very difficult problem to tackle. The fundamental issue is that manual locking is not composable, which is to say that if you have two concurrent programs P0 and P1 free of deadlocks, livelocks and other concurrency hazards, and you try to compose P0 and P1 to create a program P2, P2 may not be free of concurrency hazards. For instance, if P0 and P1 take two locks in different orders, then P2 will deadlock. Needless to say, this is a serious problem because composition is the cornerstone of all programming.

I've been toying with some ideas for software transactional memory (STM) in C# ever since I started playing with FRP and reactive programming in general. The problem in all of these domains is fundamentally about how to handle concurrent updates to shared state, and how to reconcile multiple, possibly conflicting updates to said state.

Rx.NET handles concurrency essentially by removing the identity inherent to shared state. An IObservable<T> is actually a collection of all values pushed to that observable in some undefined order. If you were to create an IObservable that retains only the "last" pushed value, and thus now retains an identity, you then have the same problems as above, namely that this update must always be consistent with other updates at any given instant in time. For instance:

var plusOne = intObservable.Select(i => i+1);

At every instant in the program's execution, plusOne should always observably equal intObservable + 1, and the ability to observe a violation of this constraint is known in reactive literature as a 'glitch'.

Similarly, in database programming where transactions rule, this is known as a 'dirty read'. Essentially, an update to intObservable is executing in a transaction, but other transactions are able to view those changes before that transaction has committed.

Generally speaking, glitches and dirty reads are undesirable, because they require the developer to manually synchronize state, which defeats the whole purpose of going with FRP or transactions to begin with. From what I've seen so far, Rx.NET gets around this by not providing abstractions that expose identity in this way. The programs you write must work with collections of values, and the program must specify the ordering via Observable.OrderBy.

When I added the Property<T> IObservable to Sasa, I added a limited form of transactions to prevent glitches, because a property has identity. This implementation uses a global 'clock', which is really just a global uint64 counter to properly sequence updates and prevent glitches.

Overview

I'm going to focus here on implementing STM directly, but to keep it simple, I've gone with the simplest STM that is expressible using .NET primitives. In fact, the resulting STM is probably not good if you're after scaling, it does a good job of ensuring concurrency safety for arbitrary composition.

The STM I committed to Sasa is a very simple, perhaps even simplistic, STM employing encounter-time locking with deadlock detection on transactional variables. Any read or write acquires the lock on a transactional variable. Whenever two transactions would block to wait for each other, the transaction that is not already blocked is aborted and retried.

This design has advantages and disadvantages. The disadvantages are the limited concurrency even when reads and writes would not conflict. Two transactions that only read a transactional variable Y, would still block each other despite the fact that concurrent reads can't cause problems. Furthermore, the use of encounter-time locking means that locks can be held for a long time. Finally, the naive deadlock detection combined with encounter time locking means that some programs will have higher abort rates than they would in other STMs.

There are significant advantages to this approach though. For one, a transaction doesn't require elaborate read/write/undo logs. In fact, this STM requires only a single allocation for the transaction object itself at transaction start. By contrast, most other STM designs require at least one allocation for every object that is read or written. These allocation costs are generally amortized, but they still add up.

The STM is also conceptually simple at 450 lines of code, including elaborate comments (127 lines counting only semicolons). This STM consists of only 3 classes, and 1 exception, and uses only System.Monitor for locking. This means that the STM isn't really fair, but it's rather simple to replace standard locks with a fair locking once the core STM algorithm is understood.

There is also preliminary support for integration with System.Transactions.

This class is closely modeled on the design of TransactionScope from System.Transactions. Programs will generally concern themselves mostly with transactional variables, which in Sasa.TM is called Transacted<T>:

Any reads and writes to Transacted<T> occur within the lifetime of a MemoryTransaction, and the set of all such reads and writes are committed atomically. A simple program demonstrating the use of these abstractions:

MemoryTransaction.Run will handle all the commits, rollbacks and retries for you. You can do this manually as well if you catch RetryException, and call Complete and Dispose methods on the transaction manually, but for most purposes the Run method suffices. You can nest calls to Run as many times as you like, but only one top-level transaction will ever be created.

No matter how many concurrent threads are executing the above code, it will always be updated atomically, and you can compose the above program with any other transactional program, and the result will also be free of concurrency hazards. The one caveat is that you should not cause non-transactional side-effects from within a transaction.

Each Transacted<T> is a member of a linked list rooted in the "participants" field of the MemoryTransaction. The list consists of Transacted<T> which have been read or written during the current transaction.

Each Transacted<T> points to the current transaction that owns its lock.

Each MemoryTransaction that attempts to acquire a lock on a Transacted<T>, stores the Transacted<T> in a local field called "waitingFor".

Transacted<T> stores the original value before any changes are made, so we can rollback if the transaction aborts.

From the above graph, we can see that there are two running transactions, Tx0 and Tx1, and that Tx0 has read or written Transacted0 and Transacted1, and it has tried to read/write Transacted2. However, Tx1 currently owns the lock on Transacted2, so Tx0 is effectively blocked waiting for Tx1 to complete.

This dependency graph is acyclic so there is no deadlock. If Tx1 were to then try to acquire the lock on Transacted0 or Transacted1, we would create a cycle in the waits-for graph, and we would have to abort one of the transactions.

On commit, a transaction's participant list is walked, unlinking elements as it goes, and all the undo fields are cleared and the locks are released. The next transaction blocked on any of the participants acquires the lock it's been waiting for, sets the owner field, and proceeds.

Rollback is much the same, except the Transacted<T>'s value field is first overwritten with the value from the undo field.

Future Work

Fair STM

To those that have read my previous posts, note that the structure of the MemoryTransaction is exactly the structure of MetaThread from a previous post. By simply adding a WaitHandle to MemoryTransaction with a FIFO locking protocol, we have a fair STM.

Lock Stealing

STM research so far has shown that most transactions are short enough that they can execute in a single timeslice, and throughput suffers if a thread is descheduled while it's holding locks. This would only be exacerbated in an encounter-time locking design like I've described here, since locks are held for longer.

Instead of blocking on a variable that is already owned, we can instead steal the lock under certain conditions. For instance, if Tx0 and Tx1 are merely reading from a variable, they can repeatedly steal the lock from each other without concern.

A transaction that writes a variable that has only been locked for reading, can steal that lock too, but if the original owner tries to read the variable again, it must abort.

If Tx0 and Tx1 both try to write the same variable, blocking is unavoidable.

Obviously, all of these performance improvements impact the simplicity of the original design, so I'm leaving them for future work if the need arises.