Sunday, September 11, 2011

Alternatives to Multi Threading for concurrency in Java. STM and Actors

In the Java world we are used to work with concurrency dealing directly with Threads, Synchronization and Shared Data.

Java introduced a great deal of improvement in version 5 with the new concurrency facilities it provided, and had built on it in subsequent versions of the language.

These facilities are great and powerful, but they still can sometimes be hard to use correctly. I will cover concurrency in the traditional (and concurrency utilities) way in another post. This post is just about introducing a couple of alternatives to these.

In this post I’ll introduce the Software Transactional Memory (STM) model for concurrency. In a later post I’ll introduce the Actors model in Java.

STM: Very much like the transactions we are use to in DBs but working with objects in memory, keeping the ACI properties of transactions.

In this model we separate identity from state. This basically means that we have some entities that hold an inmutable state that can change over time. We’ll understand this better with the example that will follow.

The main advantage of this separation is that there is no need to block or synchronize as the state doesn’t change.

I’ll write a piece of code now and explain how it works. For the example I will use Akka and its support for STM with Multiverse.

Let’s assume we want to create a very simplistic kind of Stack (No validations, no anything) that only supports push, peek and pop and it should be thread safe. The simplistic code would be something like the following (No thread safe).

It is very simple to see that there exist a thread safety problem with the state variable ‘top’. The problem is simply that this variable is shared mutable state. A variable that can be accessed, read and changed concurrently by mutliple threads. And of course the elements array itself has the same problem.

The common solution to this problem, would be to synchronize the access to the three methods. This approach is not hard to follow here in this very simple example, but even here it has a couple of drawbacks. One of them is that synchronizing and lock acquiring affects performance. Other can be that for example the peek method is not changing state, and in an application that will use peek more than any other method (more reading and moderate writing) no more than one thread will be able to peek. Also we need to remember to synchronize every method that we put in this class that will either read or write. When things start to get a little complicated (more locks, more granularity, more mutable shared state) understanding synchronization and locking logic is not very easy, and debugging synchronization and concurrency problems is hard to say the least.

The first thing to notice is that we changed the elements array of objects to a Ref object to an object of the new class Elements. We can also see that the Elements objects created are immutable. Noone has direct access to mutate its state. It never changes after created. This is one of the ideas that need to be encouraged. The use of immutable objects.

As we talked before the use of Ref helps to separate the ID of the reference to its state so in this case we have a Ref that will point to a different Elements object in different moments of time, but will not modify the one it has currently directly, it will just replace it. We do the same with the ‘top’ variable, making it a Ref instead of an Integer.

The next thing is the implementation of the different methods. As we can see we are wrapping each method content in an Atomic object, and putting the content inside the atomically method. This will ensure that the operation runs inside in a transaction.

The transaction ensures that all the values in the Ref variables are consistently changed or rolled back in case some other transactions modify them in the meantime, retrying automatically if that is the case. We can also observe the ‘retry’ method call. This method help us retry the transaction in the case a pre-condition is not met to run the transaction in the first place.

Event though we called the push on the stack only 7 times, we can see that the push method (actually just the transaction part) was executed more than 20 times. This is because of all the retrying when multiple threads collide in the transaction execution. We can see that maybe for heavily writing applications STM transactions might not be the best solution because of all the retrying.

We can see that no retry is needed for the peeking. We can see that STM is better suited for multiple reads, low-to-moderate writes.

So with STM we have a model that avoids the use of explicit locking, offers great concurrency performance, helps encourage the use of immutable types and just mutable References, makes explicit the parts that we expect to be executed atomically.
It also dynamically and automatically handles threads and contention, and can help mitigate or entirely remove deadlocks as we don’t deal with locking or lock orders anymore.

STM is an alternative worth considering if we have requirements for concurrency that adjust to this model.