Java Performance On Various Platforms

(This page used to be titled "Speed of Java", but
then I realized that this is incorrect English, so I changed it to "Java Performance" to help Google find it.)

I was trying to work out which machine would be the best
to buy for a Java application. When I looked for
benchmark results on the Internet, I found
none that could be used
to directly compare similar machines with different operating systems and CPUs.

The famous Volano benchmark
has published results for JVMs running on Intel IA-32 hardware only.
The recent SPEC JBB2000
results compare 32-CPU IBM servers against 112-CPU SUN servers and the like,
which is not very helpful either. To run it yourself, you have to pay $400
for the test.

The SciMark benchmark tests mostly floating point operations, so it cannot be
used to estimate performance for grid services, where XML stacks and web containers
are in play. So I have put together another test.

In this test, a SOAP web service was deployed into a
Tomcat 5.0.18/Axis 1.1 container,
and a client called that web service, sending four JavaBeans and a String array there and back
to simulate some useful content, which resulted in 2.2 KB XML messages.

A side note: all these tests were run on this many OSes and CPUs using exactly the same binary, compiled only once. This
is a real-life demonstration of the "Write Once, Run Anywhere" feature of Java.

webservice SOAP test

| calls per second | stable cycles | Operating system | Java Virtual Machine implementation | CPU |
|---|---|---|---|---|
| 387 | 120000 | Linux 2.6.5 64-bit | BEA JRockit 1.4.2_04 32-bit | 1x AMD 64 FX-53 2.4GHz |
| 322 | 4800 | Linux 2.6.5 64-bit | IBM 1.4.2SR1 32-bit | 1x AMD 64 FX-53 2.4GHz |
| 314 | 6200 | Linux 2.6.5 64-bit | IBM 1.4.1SR2 32-bit | 1x AMD 64 FX-53 2.4GHz |
| 220 | 2200 | Linux 2.6.5 64-bit | IBM 1.4.2 64-bit | 1x AMD 64 FX-53 2.4GHz |
| 334 | 80000 | Linux 2.6.5 64-bit | BEA JRockit 1.4.2_04 32-bit | 2x AMD Opteron 244 1.8GHz |
| 243 | 3600 | Linux 2.4.21 64-bit | IBM 1.4.1SR1 32-bit | 2x AMD Opteron 244 1.8GHz |
| 195 | 6800 | Linux 2.4.21 64-bit | SUN 1.5.0-b 32-bit -server -XX:CompileThreshold=1500 | 2x AMD Opteron 244 1.8GHz |
| 190 | 7600 | Linux 2.4.21 64-bit | Blackdown 1.4.2 32-bit -server -XX:CompileThreshold=1500 | 2x AMD Opteron 244 1.8GHz |
| 183 | 7200 | Linux 2.4.21 64-bit | Blackdown 1.4.2 64-bit -server -XX:CompileThreshold=1500 | 2x AMD Opteron 244 1.8GHz |
| 180 | 10000 | Linux 2.4.21 64-bit | SUN 1.5.0-b 32-bit -client | 2x AMD Opteron 244 1.8GHz |
| 165 | 3600 | Linux 2.4.21 64-bit | SUN 1.4.2_03 32-bit -server -XX:CompileThreshold=1500 | 2x AMD Opteron 244 1.8GHz |
| 160 | 4500 | Linux 2.4.21 64-bit | Blackdown 1.4.2 32-bit -client | 2x AMD Opteron 244 1.8GHz |
| 150 | 13000 | Linux 2.4.21 64-bit | SUN 1.5.0-b 32-bit -server | 2x AMD Opteron 244 1.8GHz |
| 150 | 9000 | Linux 2.4.21 64-bit | Blackdown 1.4.2 32-bit -server | 2x AMD Opteron 244 1.8GHz |
| 140 | 9000 | Linux 2.4.21 64-bit | Blackdown 1.4.2 64-bit -server | 2x AMD Opteron 244 1.8GHz |
| 320 | 95000 | Linux 2.6.5 64-bit | BEA JRockit 1.4.2_04 32-bit | 1x AMD Opteron 148 2.2GHz |
| 266 | 35000 | Linux 2.6.5 64-bit | IBM 1.4.1SR2 32-bit | 1x AMD Opteron 148 2.2GHz |
| 258 | 2600 | Linux 2.6.5 64-bit | IBM 1.4.2 32-bit | 1x AMD Opteron 148 2.2GHz |
| 250 | 4800 | Linux 2.6.5 64-bit | Blackdown 1.4.2 64-bit -server -XX:CompileThreshold=1500 | 1x AMD Opteron 148 2.2GHz |
| 238 | 7200 | Linux 2.6.5 64-bit | SUN 1.4.2_03 32-bit -server -XX:CompileThreshold=1500 | 1x AMD Opteron 148 2.2GHz |
| NullPointerException in SAXParserImpl | N/A | Linux 2.6.5 64-bit | SUN 1.5.0-rc-b63 for AMD64 | 1x AMD Opteron 148 2.2GHz |
| NullPointerException in SAXParserImpl | N/A | Linux 2.6.5 64-bit | SUN 1.5.0-rc-b63 for IA32 | 1x AMD Opteron 148 2.2GHz |
| 311 | 99000 | Linux 2.6.9 ia64 | BEA JRockit 1.4.2_04 64-bit | 2x Itanium2 1.4GHz |
| 139 | 3600 | Linux 2.6.9 ia64 | SUN 1.4.2_06 64-bit -XX:CompileThreshold=1500 | 2x Itanium2 1.4GHz |
| 115 | 10400 | Linux 2.6.9 ia64 | SUN 1.4.2_06 64-bit | 2x Itanium2 1.4GHz |
| 96 | 15000 | Linux 2.6.9 ia64 | SUN 1.4.2_06 64-bit -Xcomp | 2x Itanium2 1.4GHz |
| 279 | 75000 | Linux 2.6.5 | BEA JRockit 1.4.2_04 | 1x Pentium4 3GHz (w/o HT, 800FSB, dualch.mem) |
| 220 | 3600 | Linux 2.6.8 | IBM 1.4.2 | 1x Pentium4 3GHz (w/o HT) |
| 213 | 2600 | WinXP | IBM 1.4.2 | 1x Pentium4 3GHz (w/o HT) |
| 213 | 9800 | WinXP | SUN 1.4.2_03 -server -XX:CompileThreshold=1500 | 1x Pentium4 3GHz (w/o HT) |
| 192 | 4000 | Linux 2.6.8 | IBM 1.4.2 | 1x Pentium4 3GHz HT |
| 244 | 83000 | Linux 2.4.25 ia64 | BEA JRockit 1.4.2_04 64-bit | 2x Itanium2 1GHz |
| 140 | 400 | Linux 2.4.19 ia64 | BEA JRockit 1.4.2_03 64-bit | 2x Itanium2 1GHz |
| 80 | 5000 | Linux 2.4.19 ia64 | SUN 1.4.2_03 64-bit -server | 2x Itanium2 1GHz |
| 239 | 95000 | Linux 2.4.27 | BEA JRockit 1.4.2_04 | 1x Pentium4 2.5GHz |
| 198 | 4000 | Linux 2.4.23 | SUN 1.5.0-b -server -XX:CompileThreshold=1500 | 1x Pentium4 2.5GHz |
| 187 | 2400 | Linux 2.4.23 | IBM 1.4.1SR1 | 1x Pentium4 2.5GHz |
| 182 | 5000 | Linux 2.4.23 | SUN 1.4.2_03 -server -XX:CompileThreshold=1500 | 1x Pentium4 2.5GHz |
| 166 | 3000 | Linux 2.4.23 | SUN 1.5.0-b -client | 1x Pentium4 2.5GHz |
| 163 | 400 | Linux 2.4.23 | BEA JRockit 8.1sp2-1.4.1_05 | 1x Pentium4 2.5GHz |
| 156 | 9000 | Linux 2.4.23 | SUN 1.5.0-b -server | 1x Pentium4 2.5GHz |
| 152 | 400 | Linux 2.4.23 | BEA JRockit80 1.4.1_01 | 1x Pentium4 2.5GHz |
| 150 | 9000 | Linux 2.4.23 | SUN 1.4.2_03 -server | 1x Pentium4 2.5GHz |
| 150 | 2500 | Linux 2.4.23 | SUN 1.4.2_03 -client | 1x Pentium4 2.5GHz |
| 150 | 2500 | Linux 2.4.23 | Blackdown 1.4.2-rc1 -client | 1x Pentium4 2.5GHz |
| 145 | 9000 | Linux 2.4.23 | Blackdown 1.4.2-rc1 -server | 1x Pentium4 2.5GHz |
| 116 | 400 | Linux 2.4.23 | BEA JRockit 1.4.2_03 | 1x Pentium4 2.5GHz |
| 232 | 100000 | Linux 2.4.27 | BEA JRockit 1.4.2_04 | 2x Xeon P4 3.06GHz HT |
| 215 | 3000 | Linux 2.4.23 | IBM 1.4.1SR1 | 2x Xeon P4 3.06GHz HT |
| 210 | 4600 | Linux 2.4.23 | SUN 1.5.0-b -server -XX:CompileThreshold=1500 | 2x Xeon P4 3.06GHz HT |
| 180 | 4400 | Linux 2.4.23 | SUN 1.4.2_03 -server -XX:CompileThreshold=1500 | 2x Xeon P4 3.06GHz HT |
| 175 | 3000 | Linux 2.4.23 | SUN 1.5.0-b -client | 2x Xeon P4 3.06GHz HT |
| 160 | 7500 | Linux 2.4.23 | SUN 1.5.0-b -server | 2x Xeon P4 3.06GHz HT |
| 145 | 9000 | Linux 2.4.23 | SUN 1.4.2_03 -server | 2x Xeon P4 3.06GHz HT |
| 140 | 2500 | Linux 2.4.23 | SUN 1.4.2_03 -client | 2x Xeon P4 3.06GHz HT |
| 152 | 2800 | MacOSX 0.3.2 | Apple 1.4.2-34 -server | 2x PowerPC 970 2GHz |
| 141 | 1800 | MacOSX 0.3.2 | Apple 1.4.2-34 -client | 2x PowerPC 970 2GHz |
| 150 | 1500 | AIX 5.2 | IBM 1.4.1 32-bit | 2x Power4+ 1.2GHz |
| 115 | 1500 | AIX 5.2 | IBM 1.4.0 64-bit | 2x Power4+ 1.2GHz |
| 95 | 1200 | OSF1 alpha V5.1 | Compaq 1.4.0-1 Fast VM | 4x Alpha 667MHz |
| 79 | 11000 | SunOS 5.8 | SUN 1.4.2_03-b02 -server -XX:CompileThreshold=1500 | 2x UltraSPARC-III 750MHz |
| 64 | 6000 | SunOS 5.8 | SUN 1.4.2_03-b02 -client | 2x UltraSPARC-III 750MHz |
| 13 | 400 | IRIX64 6.5 | SGI 1.4.1 -classic | 20x MIPS R14000 500MHz |

A Java virtual machine has a large start-up overhead, because the HotSpot or Just-In-Time compilers
need to analyze and compile the bytecode. The number of calls performed before the speed stabilized
is given in the "stable cycles" column of the results table.
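A "stable cycles" figure can be estimated mechanically. The sketch below is not the original test harness. It assumes
the rate is sampled in fixed-size batches (200 calls, the batch size mentioned further down) and treats the run as
stable from the first batch after which every later batch stays within a chosen tolerance of the final rate. The class
name, the method names, the 5% tolerance and the sample rates are all illustrative assumptions.

```java
// Illustrative sketch, not the original test harness.
class WarmupEstimator {

    // Runs `batches` batches of `batchSize` calls and returns the calls per second of each batch.
    static double[] measure(Runnable call, int batches, int batchSize) {
        double[] rates = new double[batches];
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            for (int i = 0; i < batchSize; i++) {
                call.run();
            }
            long elapsed = Math.max(1L, System.nanoTime() - start); // avoid division by zero
            rates[b] = batchSize * 1e9 / elapsed;
        }
        return rates;
    }

    // Returns the number of calls made before the rate stabilized, i.e. before the
    // first batch from which all later batches stay within `tolerance` (a fraction,
    // e.g. 0.05) of the final batch's rate.
    static int stableCycles(double[] rates, int batchSize, double tolerance) {
        if (rates.length == 0) {
            return 0;
        }
        double finalRate = rates[rates.length - 1];
        int firstStable = rates.length - 1;
        // Walk backwards while the batches stay close to the final rate.
        for (int b = rates.length - 1; b >= 0; b--) {
            if (Math.abs(rates[b] - finalRate) > tolerance * finalRate) {
                break;
            }
            firstStable = b;
        }
        return firstStable * batchSize;
    }

    public static void main(String[] args) {
        // Made-up per-batch rates that climb and then level off around 150 calls per second.
        double[] rates = {40, 90, 130, 148, 151, 150, 149, 150};
        System.out.println(stableCycles(rates, 200, 0.05)); // prints 600
    }
}
```

In a real run, `measure` would be given a `Runnable` that performs one web service call, and its result passed to
`stableCycles`.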

This test has one deceptive feature: at first glance, one would expect multiprocessors to have an advantage,
because the client and the server are separate processes. However, the client and the server pass
a set of data back and forth: the client sends a request and waits, the server receives the request, generates
a response and waits, and so on. Thus only one of the two processes is running at any time,
so effectively it is a single-threaded test. That explains why multiprocessors are not twice as fast as
uniprocessors.
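The arithmetic behind this can be written down as a tiny model. It is only an illustration: the per-call times are
made-up numbers, and the class and method names are hypothetical. When the two processes strictly alternate, their
times add up no matter how many CPUs are available; only if both sides could work at the same time would the slower
side alone set the limit.

```java
// Illustrative model, not a measurement: the per-call times are made-up numbers.
class RoundTripModel {

    // Client and server strictly alternate, so their per-call times add up.
    static double serializedCallsPerSecond(double clientMillis, double serverMillis) {
        return 1000.0 / (clientMillis + serverMillis);
    }

    // Hypothetical upper bound if both sides could work at the same time on
    // separate CPUs (for example with many calls in flight): the slower side is the limit.
    static double overlappedCallsPerSecond(double clientMillis, double serverMillis) {
        return 1000.0 / Math.max(clientMillis, serverMillis);
    }

    public static void main(String[] args) {
        double client = 2.0; // assumed milliseconds of client work per call
        double server = 3.0; // assumed milliseconds of server work per call
        System.out.println(serializedCallsPerSecond(client, server)); // 200.0
        System.out.println(overlappedCallsPerSecond(client, server)); // about 333.3
    }
}
```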

A very interesting thing is the influence of the option -XX:CompileThreshold=1500
on the SUN HotSpot JVM. The default is 10000 for the Server VM and 1500 for the Client VM. However,
setting it to 1500 for the Server VM makes it faster than the Client VM. Setting it to 100 actually
lowers the performance. And using the option -Xcomp (which means that
all code is compiled before use) gives even lower performance, which is surprising.
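When comparing runs with different options, it helps to have each run log the flags it was actually started with.
The following is a minimal sketch of such a report and was not part of the original test. The class and method names
are hypothetical. It relies on the java.lang.management API, which exists only from Java 5 on, so it would not run on
the 1.4.x JVMs in the table.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative helper: reports which CompileThreshold, if any, was passed on the command line.
class JvmFlagReport {

    // Returns N from "-XX:CompileThreshold=N" in the given JVM arguments,
    // or -1 if the option is absent or malformed.
    static int compileThreshold(List<String> jvmArgs) {
        String prefix = "-XX:CompileThreshold=";
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                try {
                    return Integer.parseInt(arg.substring(prefix.length()));
                } catch (NumberFormatException e) {
                    return -1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the program.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("VM: " + System.getProperty("java.vm.name"));
        System.out.println("JVM arguments: " + jvmArgs);
        int threshold = compileThreshold(jvmArgs);
        System.out.println(threshold < 0
                ? "CompileThreshold not set explicitly"
                : "CompileThreshold=" + threshold);
    }
}
```

Started as `java -server -XX:CompileThreshold=1500 JvmFlagReport`, it should report the threshold of 1500; started
without the option, it reports that the VM default is in effect.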

The BEA JRockit JVM behaves unusually in several ways. First, its
results fluctuate widely over time. Other JVMs reach their maximum
speed after a relatively small number of calls (fewer than ten thousand), and from
then on the times measured after each 200 calls give stable results; with JRockit,
the results vary by as much as 50% from one batch of 200 calls to the next. So I had to use
batches of 5000 calls to get a stable average of calls per second, and even then
it varies by about 10%. Second, it starts slowly and
takes a very high number of calls, around 100000, to reach its maximum speed.
Third, it is the only JVM that runs this
single-threaded test faster on a two-CPU Opteron 1.8GHz machine than on a one-CPU
Opteron 2.2GHz machine. My guess is that this is caused by JRockit's parallel garbage
collector, which can take advantage of the second CPU.
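The effect of the batch size on the measured fluctuation can be illustrated with a short sketch. It uses synthetic
timings, not JRockit data, and the class and method names are hypothetical. It regroups the elapsed times of 200-call
batches into larger batches and compares the relative spread, (max - min) / mean, of the resulting rates.

```java
// Illustrative sketch with synthetic numbers; not the original measurement code.
class BatchSpread {

    // Calls per second for consecutive groups of `group` small batches, given the
    // elapsed milliseconds of each small batch. A trailing partial group is dropped.
    static double[] groupedRates(double[] batchMillis, int batchSize, int group) {
        double[] out = new double[batchMillis.length / group];
        for (int g = 0; g < out.length; g++) {
            double millis = 0;
            for (int i = 0; i < group; i++) {
                millis += batchMillis[g * group + i];
            }
            out[g] = group * batchSize * 1000.0 / millis;
        }
        return out;
    }

    // Relative spread of the rates: (max - min) / mean. Expects at least one rate.
    static double relativeSpread(double[] rates) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0;
        for (double r : rates) {
            min = Math.min(min, r);
            max = Math.max(max, r);
            sum += r;
        }
        return (max - min) / (sum / rates.length);
    }

    public static void main(String[] args) {
        // Synthetic elapsed times (ms) for fifty 200-call batches, alternating fast and slow.
        double[] millis = new double[50];
        for (int i = 0; i < millis.length; i++) {
            millis[i] = (i % 2 == 0) ? 500 : 800;
        }
        // Spread with 200-call batches, then with 25 x 200 = 5000-call batches.
        System.out.println(relativeSpread(groupedRates(millis, 200, 1)));
        System.out.println(relativeSpread(groupedRates(millis, 200, 25)));
    }
}
```

With these made-up timings the second figure is much smaller than the first, which is the reason for averaging over
larger batches.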

There is another implementation of SOAP, written in the C language, called
gSOAP. We can compare equivalent
clients and servers against each other. However, the comparison is not fair to Java,
because Axis can create arbitrary SOAP calls dynamically, while gSOAP is a preprocessor
that generates single-purpose code, which can therefore be much faster.
The graph is now scaled down 8 times.

If you want to run your own tests, download the source code,
compile it for your platform and run it. Please send me your results if you
have a platform not listed here.