Friday, March 11, 2011

A quick update on performance of JGroups 2.12.0.Final

I forgot to add performance data to the release announcement of 2.12.0.Final, so here it is.

Caveat: this is a quick check to see if we have a performance regression, which I run routinely before a release, and by no means a comprehensive performance test!

I ran this both on my home cluster and our internal lab.

org.jgroups.tests.perf.Test

This test is described in detail in [1]. It forms a cluster of 4 nodes, and every node sends 1 million messages of varying size (1K, 5K, 20K). We measure how long it takes every node to receive all 4 million messages, and from that compute the per-node message rate (messages/sec) and throughput (MBytes/sec).
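To make the arithmetic concrete, here is a small sketch of how those two numbers fall out of the measured time. This is not the actual test code; the function name and the example timing are made up, and 1K is taken as 1000 bytes for simplicity.

```python
def perf_stats(num_nodes, msgs_per_node, msg_size_bytes, time_secs):
    """Each node receives num_nodes * msgs_per_node messages in total."""
    total_msgs = num_nodes * msgs_per_node
    msg_rate = total_msgs / time_secs                 # messages / sec / node
    throughput_mb = msg_rate * msg_size_bytes / 1e6   # MBytes / sec / node
    return msg_rate, throughput_mb

# Illustration with a made-up elapsed time: if the 4 million 1K messages
# took ~29 seconds to arrive, throughput would be ~138 MBytes/sec/node.
rate, mb = perf_stats(4, 1_000_000, 1_000, 29.0)
```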

My home cluster consists of 4 HP ProLiant DL380G5 quad core servers (ca 3700 bogomips), connected to a GB switch and running Linux 2.6. The JDK is 1.6 and the heap size is 600M. I ran 1 process on every box. The configuration used was udp.xml (using IP multicasting), as shipped with JGroups.

Results

1K message size: 140 MBytes / sec / node

5K message size: 153 MBytes / sec / node

20K message size: 154 MBytes / sec / node

This shows that GB ethernet is saturated. The reason that every node receives more than the limit of GB ethernet (~ 125 MBytes/sec) is that every node loops back its own traffic, and therefore doesn't have to share it with other incoming packets. In theory, the max throughput should therefore be 4/3 * 125 ~= 166 MBytes/sec. We see that the numbers above are not too far away from this.
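The ceiling above can be sketched as a two-line calculation, assuming GB ethernet tops out at ~125 MBytes/sec on the wire:

```python
wire_limit = 125.0            # MBytes/sec into each NIC (GB ethernet)
senders = 4                   # cluster size
remote_senders = senders - 1  # streams that actually cross the wire

# A node's own traffic is looped back locally, so only 3 of the 4
# incoming streams compete for the 125 MB/sec link; with each sender
# throttled to the same rate, the node receives 4 such streams in total.
per_sender = wire_limit / remote_senders  # ~41.7 MBytes/sec per sender
theoretical_max = senders * per_sender    # ~166.7 MBytes/sec per node
```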

org.jgroups.tests.UnicastTestRpcDist

This test mimics the way Infinispan's DIST mode works.

We form a cluster of between 1 and 9 nodes, with every node on a separate machine. The test then has every node invoke 2 unicast RPCs on randomly selected nodes. With a chance of 80% the RPCs are reads, and with a chance of 20% they're writes. The writes carry a payload of 1K, and the reads return a payload of 1K. Every node makes 20'000 RPCs.
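The workload shape can be sketched as follows. This is a toy model, not the actual test code: it simply generates the 80/20 read/write mix against random targets, simplifying the "2 unicast RPCs" detail to one target per call.

```python
import random

def make_workload(num_nodes, rpcs_per_node, seed=42):
    """Generate (caller, target, kind, payload_bytes) tuples for the mix
    described in the text: 80% reads, 20% writes, 1K payloads."""
    rng = random.Random(seed)
    calls = []
    for node in range(num_nodes):
        for _ in range(rpcs_per_node):
            target = rng.randrange(num_nodes)            # random node
            kind = "read" if rng.random() < 0.8 else "write"
            calls.append((node, target, kind, 1024))
    return calls

calls = make_workload(num_nodes=4, rpcs_per_node=20_000)
reads = sum(1 for c in calls if c[2] == "read")
# reads / len(calls) should come out close to 0.8
```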

The hardware is a bit more powerful than my home cluster; every machine has 5300 bogomips, and all machines are connected with GB ethernet.

Results

1 node: 50'000 requests / sec / node

2 nodes: 23'000 requests / sec / node

3 nodes: 20'000 requests / sec / node

4 nodes: 20'000 requests / sec / node

5 nodes: 20'000 requests / sec / node

6 nodes: 20'000 requests / sec / node

7 nodes: 20'000 requests / sec / node

8 nodes: 20'000 requests / sec / node

9 nodes: 20'000 requests / sec / node

As can be seen, the per-node request rate levels off at around 20'000 from 3 nodes on. The 1 node scenario is somewhat contrived, as there is no network communication involved.

This is actually good news: since the per-node rate stays constant, aggregate throughput grows linearly with cluster size. As a matter of fact, with increasing cluster size, the chance of 2 or more nodes picking the same target decreases, so performance degradation due to (write) access conflicts is likely to decrease as well.

Caveat: I haven't tested this on a larger cluster yet, but the current performance is already very promising.