Software transactional memory

Software transactional memory (STM) is a scheme for concurrent programming with multiple threads that uses transactions similar to those used in databases. Today I’ll discuss what STM is, how it works, some implementations, and why you should care.

It’s well-known that if two or more threads blindly try to access the same data at the same time, a variety of undesirable race conditions can result, situations where the outcome of the program will depend on the order in which the threads access the data. Usually, only one of these orders is desirable. In multithreading, the predominant means of synchronization are locks, which in their simplest form enforce that only one thread can access a particular resource at one time. There are several problems with this approach:

If the locks are too coarse-grained (that is, the code sections they guard are too large), the locked code sections can become a contended bottleneck, leading to decreased concurrency and, on multiprocessor machines, performance closer to that of a single processor machine.

If the locks are too fine-grained, the code can end up spending more time locking, unlocking, and context-switching than it does doing real work.

Locks cause several unpredictable concurrency issues of their own, such as deadlock, livelock, priority inversion, and so on.

In the event of an unexpected error, a lock can be left permanently locked, although C#’s “lock block” helps to prevent this.

Last but not least, locks disturb the flow of the program and force the programmer to think about what concurrent threads could be doing – which programmers happen to be quite terrible at doing. This error-prone process leads to many of the most difficult to reproduce and difficult to fix bugs that we face in our daily lives.

Some of these problems are addressed to some extent by more advanced types of locks such as read-write locks, semaphores, monitors, write barriers, and events, but when you add complexity you add overhead and decrease performance, so we continue to use normal locks more often than anything else.

On the other hand, if you’ve ever written a SELECT or UPDATE command in SQL, you probably didn’t stop to think: what if someone updates the rows I’m querying or updating concurrently? Shouldn’t I be locking this table before I mess with it? Should I lock just the rows I’m working with or the whole table? In most applications, query concurrency is simply transparent; the database engine takes care of everything for you. Instead of providing locks, it exposes to the programmer the simpler concept of a transaction. At its heart, a transaction is simply a way of performing a group of operations in such a way that they appear to happen atomically, all at a single instant. The intermediate states cannot be read or updated by any other transaction. This enables you to keep tables consistent with their constraints at all times – one operation can violate a constraint, but both before and after each transaction, it will hold.

In typical implementations of transactions, you have the option of either commiting or aborting a transaction at any time. When you commit a transaction, you agree that the transaction went as planned and make its changes permanently in the database. If you abort a transaction on the other hand, this means that something went wrong and you want to roll back, or reverse, any partial changes it’s already made. Good installers use a similar rollback scheme: if some installation step fails and recovery is not possible, it will erase all the cruft it’s already put on your machine so that it’s just like it was before. If it fails to do this, the machine may become unusable or future installation attempts may fail.

Let’s think about how transactions might be useful in concurrent programming. Say you’re writing a threadsafe doubly-linked list class. You want to insert a node into the middle of the list. In a single-threaded application, you would just write something like this:

Unfortunately, the linked list becomes inconsistent between the last two lines (newNode is in the list, but newNode.next.prev is not newNode), and it’s impossible to reorder these statements in such a way that the linked list will remain consistent between every pair of statements. You could put a lock around every method accessing the list, but then two threads can’t access different parts of the list at the same time, which can be a serious performance problem if it’s a large “global” list accessed by many threads at once. If multiple locks became involved, you would also have to be careful to avoid deadlock.

Let’s take a different approach based on transactions. We introduce the concept of a version for each object (in our example, each node would be an object). The version is just a counter that starts at zero. Each time an object is modified, its version is incremented. Each time the object is read, the version is (atomically) read along with it. Now we’re ready to implement the software transactional memory scheme of Robert Ennals:

Optimistic reads: Every time we read an object, we remember its version at that instant. When we commit the transaction, we verify that the version for every object we read is still the same as it was when we read it. If any of them have changed, we abort and roll back the transaction, undoing any writes it did, and then try the operation over again. If none of them have changed, we know that the concurrent writes would not have altered the transaction’s behavior in any case, so we can commit. The assumption here is that the concurrent writes aren’t going to tamper with the objects we’re reading very often, just other objects we don’t care about.

Private revocable writes: Whenever we want to write to a node, we first lock it. The lock prevents others from reading or writing it simultaneously, and the transaction will continue to hold this lock until it commits or aborts. Next, we make a private copy of the node. Any changes we want to make we make to this private copy. By ensuring that each object is accessed through only one reference (which may itself be referenced by multiple references), we can atomically swap out the old copy for the new, updated copy when we commit the transaction. To abort the transaction, we simply release all write locks and throw away all the private copies. Notice that if a transaction locks a node for writing and later aborts, transactions which read that node can still go on to succeed.

Deadlock detection: When we’re automatically locking things all over the place, deadlock is sure to happen sooner or later. To deal with this, we periodically interrupt the program (much as a garbage collector does) and scan for a cycle of objects that are deadlocked, each waiting on a lock held by the next. We then force one of them to abort, releasing its locks and breaking the cycle. With this system in place, deadlocks can never occur. Databases with locking use a similar system to prevent deadlock.

Edge cases: If a transaction which has read inconsistent state tries to commit, it will fail, but it’s also possible that such a transaction could go into an infinite loop or have a fatal exception because it wasn’t written to deal with this situation (nor should it be). Again, the runtime is responsible for detecting and aborting such transactions.

Now, all this copying, swapping, and retrying of aborted operations looks expensive. However, the truth is that so much concurrency is gained from a transaction-based system that they are often much more efficient than lock-based systems even for a small number of processors, as you can see in the Performance section of Ennals’ paper. The reason for this is that simple locking is often too pessimistic – it locks an entire data structure when really different threads could be safely operating on different parts of it concurrently, as long as you have a roll back mechanism ready in case a conflict arises.

So it’s really a win-win – you can write your program almost as though it were single-threaded, and it runs faster! Even on a single processor, the penalty is not so severe as to outweigh the benefits to the programmer (who is, after all, more expensive than both software and hardware). If we return to our linked list example, we can simply take the single-threaded solution, put each method inside a transaction, and we’re done.

This all sounds nice in theory, but few of us are qualified to write and use a production-quality STM library after a quick read of a paper. Luckily, there are already some good implementations floating around that you can try:

Robert Ennal’s Lightweight Transaction Library, a C library written for UNIX and recently re-released under the modified BSD license (which allows commercial use as well as commercial use of derivative works such as ports). Someone should really port this to a .NET library.

Isn’t this how SQL Server hosts the CLR? That is, I believe that all state is stored in the database and rolled back if a deadlock is detected. As a matter of fact, I believe you’re not even allowed to use locks in CLR code running in SQL Server.

I’m currently working on a C library implementing the ideas on "Composable Memory Transactions" paper.

The library is designed to be easy to use (very small API), non intrusive (it doesn’t require garbage collector nor special memory management for you shared variables) and very portable (it depends on Glib only, and not any particular thread implementation).