I’ve created another benchmark. This time, I’ve benchmarked the different ways of synchronizing a small piece of code using mutual exclusion.

The code to protect is very simple, just a counter:

// Init
int counter = 0;

// Critical section
counter++;

The critical section, if not protected by a synchronization mechanism, will not function properly because of possible interleavings (read the article on synchronization if you don’t know what an interleaving is).
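To see why the unprotected increment breaks, here is a small demonstration of my own (not part of the benchmark): counter++ is really a read, an increment and a write, and two threads can interleave those steps and lose updates.

```java
public class UnsafeCounter {
    // Not protected: counter++ is a read-modify-write, not an atomic operation
    static int counter = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) {
                    // Two threads can read the same value and both write value + 1,
                    // so one of the two increments is lost
                    counter++;
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // Usually prints less than the expected 400000 because of lost updates
        System.out.println(counter);
    }
}
```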

I’ve used three different synchronizers to protect this increment:

synchronized block

Semaphores (fair and unfair)

Explicit locks (fair and unfair)
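The three mutual-exclusion variants can be sketched as follows (the class and method names here are mine, not the benchmark’s; pass true to the Semaphore or ReentrantLock constructor to get the fair version):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

public class Counters {
    private int counter = 0;

    // 1. synchronized block (here on the whole method)
    public synchronized void incrementSynchronized() {
        counter++;
    }

    // 2. Semaphore with a single permit, acting as a mutex
    private final Semaphore semaphore = new Semaphore(1, false);

    public void incrementSemaphore() throws InterruptedException {
        semaphore.acquire();
        try {
            counter++;
        } finally {
            semaphore.release();
        }
    }

    // 3. Explicit lock
    private final ReentrantLock lock = new ReentrantLock(false);

    public void incrementLock() {
        lock.lock();
        try {
            counter++;
        } finally {
            lock.unlock();
        }
    }

    public int get() {
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        Counters c = new Counters();
        c.incrementSynchronized();
        c.incrementSemaphore();
        c.incrementLock();
        System.out.println(c.get()); // 3
    }
}
```

Releasing in a finally block is important for the Semaphore and ReentrantLock versions: unlike synchronized, they are not released automatically if the critical section throws.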

I’ve also used another way to solve the problem: AtomicInteger. This is not the same as the other approaches because it does not provide mutual exclusion. It’s a good way to synchronize simple values, like integers or booleans, and also references. The atomicity of AtomicInteger’s operations is achieved using the compare-and-swap instruction of the processor, so there are no blocking operations. This means fewer context switches, which should normally result in better-performing code.
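The AtomicInteger version of the counter looks like this (again, a minimal sketch with names of my choosing):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounter {
    private final AtomicInteger counter = new AtomicInteger(0);

    public void increment() {
        // Internally retries a hardware compare-and-swap until it succeeds;
        // no lock is taken and no thread is ever blocked
        counter.incrementAndGet();
    }

    public int get() {
        return counter.get();
    }

    public static void main(String[] args) {
        AtomicCounter c = new AtomicCounter();
        c.increment();
        c.increment();
        System.out.println(c.get()); // 2
    }
}
```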

I used Runnable instances to facilitate the testing and timing of the different mechanisms.

The test is made in two phases:

Test with only one thread, using a sophisticated benchmark framework. This also acts as a warmup for the different code paths.

Test with several threads (several tests with an increasing number of threads). This test is made using a little harness I wrote for the occasion. Each method is executed 2²³ times (8388608 times exactly).
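The actual harness source is linked at the end of the post; a minimal sketch of how such a multi-threaded timing harness could work (the names and structure here are mine, not the original code’s) is:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class TimingSketch {
    static final int TOTAL = 1 << 23; // 8388608 calls in total

    // Runs `task` TOTAL times, split evenly across nThreads threads,
    // and returns the elapsed wall-clock time in nanoseconds
    static long time(int nThreads, Runnable task) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(nThreads);
        long start = System.nanoTime();
        for (int i = 0; i < nThreads; i++) {
            new Thread(() -> {
                for (int j = 0; j < TOTAL / nThreads; j++) {
                    task.run();
                }
                done.countDown();
            }).start();
        }
        done.await(); // wait for every thread to finish
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        long nanos = time(4, counter::incrementAndGet);
        System.out.println(counter.get()); // 8388608
        System.out.println(nanos > 0);     // true
    }
}
```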

The source code is available at the end of the post.

The test has been launched on Ubuntu 10.04 with a Java 6 virtual machine. The computer has a 64-bit Core 2 Duo 3.16 GHz processor and 6 GB of DDR2 memory.

So let’s see the results. First with one thread:

Synchronization Benchmark - One Thread

The first thing we see is that AtomicInteger is the fastest version. This is because AtomicInteger does not use a blocking operation, which results in fewer context switches and better performance. But since it is not a mutual-exclusion mechanism like the others, let’s concentrate on the five other methods. We see that the synchronized method is the fastest, and that the fair methods are a little slower than the unfair ones, but not by much.

Now, we’ll test the scalability of all these methods using several threads.

Synchronization - 2 threads

On this graph we can see that the fair methods are awfully slow compared to the unfair versions. Indeed, adding fairness to a synchronizer is really expensive. When fair, threads acquire the lock in the order they requested it. With unfair locks, barging is allowed: when a thread tries to acquire the lock and it’s available, it can take it even if other threads are already waiting for it. Providing fairness is heavier because it causes a lot more context switches. The problem did not appear with only one thread because a single thread is always fair.
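Fairness is selected at construction time in both classes; a quick illustration:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

public class Fairness {
    public static void main(String[] args) {
        // Both classes take a fairness flag in their constructor
        ReentrantLock fairLock = new ReentrantLock(true);    // FIFO ordering
        ReentrantLock unfairLock = new ReentrantLock(false); // barging allowed (the default)
        Semaphore fairSem = new Semaphore(1, true);

        System.out.println(fairLock.isFair());   // true
        System.out.println(unfairLock.isFair()); // false
        System.out.println(fairSem.isFair());    // true
    }
}
```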

The results for the other versions are the same as with one thread.

Let’s add two more threads:

Synchronization - 4 threads

The fair versions get slower and slower as we add threads. The scalability of these methods is really bad. Let’s see the graph without the fair versions:

Synchronization - 4 threads

This time we can see some differences. The synchronized method is the slowest this time, and the semaphore has a little advantage. Let’s see with 8 threads:

Synchronization - 8 threads

Here the synchronized method is much slower than the other methods. It appears that the algorithm of the synchronized block is less scalable than the explicit lock and semaphore versions. Let’s watch what happens with other numbers of threads:

Synchronization - 32 threads

Synchronization - 128 threads

I’ve also made the test with other numbers of threads (16, 64 and 256), but the results are the same as the others.

We can draw several conclusions from the results:

Fair versions are slow. If you don’t absolutely need fairness, don’t use fair locks or semaphores

Semaphores and explicit locks have the same performance. This is because the two classes (Semaphore and ReentrantLock) are based on the same class, AbstractQueuedSynchronizer, which underlies almost all the synchronization mechanisms of Java

Explicit locks and semaphores are more scalable than synchronized blocks. But that depends on the virtual machine; I’ve seen other results indicating that the difference can be a lot smaller

AtomicInteger is the best-performing method. This class doesn’t provide mutual exclusion, but it provides thread-safe operations on simple values (there are versions for longs, booleans and even references: AtomicLong, AtomicBoolean, AtomicReference)