Oracle Blog

Blog for dagastine

Update: Java 6 Leads Out of Box Server Performance

Based on feedback from my esteemed colleagues and readers I've updated this entry with a table of results. The tables are at the bottom after the charts.

Java 6 is finally here. It's our fastest, most reliable release, and it specifically targets out-of-box performance. What does this mean? Simply put, it means no tuning options are needed for the JVM to achieve optimal performance. Looking at the bigger picture, it means much more. No longer will you spend hours poring over cryptic JVM tuning parameters to determine the optimal configuration for your application. No more expensive re-qualification of your application for special command-line tuning. Java 6 makes performance tuning easy.

On SPECjbb2005 the numbers are impressive. Out of the box, Java 6 is more than 40% ahead of the competition on Intel Core and 30% ahead on AMD Opteron.

On SciMark, Java 6 continues to show solid performance, leading the competition by more than 40%.

On Volano, Java 6 improves performance by more than 20% over the most recent update of JDK 5.

The out-of-box performance of a Java application is an intriguing and difficult engineering problem. The requirements of client and server applications couldn't be more different. On one hand, client apps want fast startup and a low footprint. On the other hand, server applications want highly optimized code, high throughput, and low pause times. Both want reliability and compatibility.

Out-of-box performance is the right goal for JVM development, and future Java benchmarks should reflect that goal. Delivering optimizations quickly to allow high benchmark results is fun, but it doesn't help customers unless those optimizations become part of the default runtime behavior of the JVM.

To be clear, and to reiterate, the intention of the data charts below is to highlight the importance of customer experience and out-of-box performance to Sun Java Engineering. These are not meant to be high-performance benchmark results. Hand tuning can change the results significantly.
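One way to check that a run really is out of the box is to have the JVM report the arguments it was started with and the defaults it picked. The following is a minimal sketch, not the harness used for the charts. The class name `OutOfBoxReport` and the rule in `isOutOfBox` (any `-X`/`-XX:` option, or an explicit `-server`/`-client` switch, counts as tuning) are assumptions made for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper: reports what the JVM chose when no tuning flags were given. */
class OutOfBoxReport {

    /**
     * Assumed heuristic: a run counts as "out of box" when no -X / -XX: option
     * and no explicit -server / -client switch was passed to the JVM.
     */
    static boolean isOutOfBox(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-X") || arg.equals("-server") || arg.equals("-client")) {
                return false;
            }
        }
        return true;
    }

    /** Collects the JVM identity, its startup arguments, and the defaults it selected. */
    static String describe() {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        StringBuilder sb = new StringBuilder();
        sb.append("JVM: ").append(System.getProperty("java.vm.name"))
          .append(' ').append(System.getProperty("java.vm.version")).append('\n');
        sb.append("JVM args: ").append(jvmArgs).append('\n');
        sb.append("Out of box: ").append(isOutOfBox(jvmArgs)).append('\n');
        sb.append("Processors: ").append(Runtime.getRuntime().availableProcessors()).append('\n');
        sb.append("Max heap (MB): ")
          .append(Runtime.getRuntime().maxMemory() / (1024 * 1024)).append('\n');
        // The collectors listed here are the ones the JVM selected on its own
        // when no GC flag was supplied.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append("Collector: ").append(gc.getName()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Running the class with a plain `java OutOfBoxReport` and again with a flag such as `-Xmx2g` shows the difference between the two kinds of run.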

The following is an out-of-box performance comparison on a Dell 2950 and a Sun Fire X4200. The Dell system is configured with 2 dual-core Intel 5160 processors (2 CPUs, 4 cores, 3.0 GHz) and 16 GB of RAM. The Sun system is configured with 2 dual-core Opteron 280 processors (2 CPUs, 4 cores, 2.4 GHz) and 8 GB of RAM. The operating system installed on both systems is Red Hat EL 4.0 AS Update 4. The kernel version is unmodified from the base install, which is 2.6.9-42.ELsmp. The only variable in this configuration is the JVM.

The JVM distributions and versions tested were the latest versions publicly available at the time of testing. The BEA JRockit JVMs tested were downloaded from BEA's main GA website and its 64-bit performance update website. The IBM JVM is the latest available on the IBM developer website.

As stated above and in the title, no JVM tuning options were used for these results. The results below are statistical comparisons. No fewer than 10 samples were collected, and a single-tailed t-test was used to ensure confidence in the result. The data is normalized to the 32-bit IBM JDK 5 SR3 result.
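The normalization and the test statistic described above can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the analysis code behind the charts. The article does not say which t-test variant was used, so Welch's unequal-variance form is assumed here. The class name `OutOfBoxStats` and the sample scores in `main` are hypothetical.

```java
/** Hypothetical helper for comparing benchmark samples from two JVMs. */
class OutOfBoxStats {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double sampleVariance(double[] xs) {
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return ss / (xs.length - 1);
    }

    /** Mean score of a candidate JVM relative to the baseline JVM (baseline = 1.0). */
    static double normalized(double[] candidate, double[] baseline) {
        return mean(candidate) / mean(baseline);
    }

    /** Welch's t statistic; positive when mean(a) is greater than mean(b). */
    static double welchT(double[] a, double[] b) {
        double va = sampleVariance(a) / a.length;
        double vb = sampleVariance(b) / b.length;
        return (mean(a) - mean(b)) / Math.sqrt(va + vb);
    }

    /** Welch-Satterthwaite approximation of the degrees of freedom. */
    static double welchDf(double[] a, double[] b) {
        double va = sampleVariance(a) / a.length;
        double vb = sampleVariance(b) / b.length;
        return (va + vb) * (va + vb)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
    }

    public static void main(String[] args) {
        // Made-up scores for illustration only; NOT measurements from this article.
        double[] baseline  = {100, 102, 98, 101, 99, 100, 103, 97, 100, 100};
        double[] candidate = {128, 131, 127, 130, 129, 132, 126, 130, 129, 131};
        System.out.printf("normalized score: %.3f%n", normalized(candidate, baseline));
        System.out.printf("t = %.3f, df = %.1f%n",
                welchT(candidate, baseline), welchDf(candidate, baseline));
    }
}
```

For a single-tailed test of "the candidate is faster than the baseline", the t statistic is compared against the one-sided critical value for the computed degrees of freedom at the chosen significance level. That lookup is left to a t-table or a statistics library.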

The first set of charts reflects performance on Intel's latest Core 2 micro-architecture. The results below, particularly the SPECjbb2005 results, strongly highlight a core difference in philosophy between Sun HotSpot and its competitors. If you look at the highly tuned benchmark submissions from our competitors, BEA JRockit in particular, you will see impressive numbers on the new chip. Our competitors have chosen to quickly deliver platform-specific performance optimizations for the purpose of benchmark submissions, but they require several tuning parameters to achieve that level of performance. Unfortunately, this is quite misleading for customers. Yes, the benchmark numbers are good, but can a customer jump right in and use these features? If they were thoroughly tested and ready for prime time, shouldn't they be enabled by default on the platforms that require them? We think so, and we have chosen differently. That's the difference with HotSpot.

The first chart is SPECjbb2005. SPECjbb2005 is SPEC's benchmark for evaluating the performance of server-side Java. It does so by emulating a three-tier client/server system (with emphasis on the middle tier). It extensively stresses Java collections, BigDecimal, and XML processing. The cool thing about SPECjbb2005 is that optimizations targeted at it also show performance gains in other competitive benchmarks, such as SPECjAppServer2004, and in a broad range of customer workloads. The benchmark results below were run in single-instance mode.

SciMark 2.0 is a Java benchmark for scientific and numerical computing, and it is a benchmark where Sun's JVMs have continued to shine. It's a decent test of generated code, particularly for tight computational loops. However, it is particularly sensitive to alignment issues and can show some variance from run to run, mostly in a bimodal fashion. The test has three modes of execution: small, large, and default. The mode sets the size of the data under test; more details can be found at the SciMark website. All in all, it's a good set of microbenchmarks.

Note that the 32-bit JVMs are faster than the 64-bit JVMs in all cases when running on the Intel Core system. This is quite different from the AMD Opteron system further down the page, where 64-bit is significantly faster. Since the SciMark 2.0 test is using the large dataset, it's likely that the added pressure of 64-bit pointers on the memory subsystem increases bandwidth demand enough to impede performance; however, this is just a hypothesis.

Volano is a popular Java chat server. The benchmark is quick and involves both a client and a server instance. From a JVM perspective, the workload is heavily dominated by classic Java socket I/O, which is a bit long in the tooth; an NIO version would be quite interesting. That being said, some customers have found this benchmark quite useful, so we continue to test it. It is by no means our favorite benchmark, though, despite what my friends at BEA have suggested. Running Volano, the performance gaps are not as large, most likely because this benchmark has very little garbage collection overhead. BEA JRockit shows good performance here with a result that's 19% over the baseline. Sun Java SE 6 shines as well with a result that's nearly 22% over baseline.
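For readers unfamiliar with the distinction between classic socket I/O and NIO: classic `java.net` sockets block, so a server typically dedicates a thread to each connection, while NIO lets one thread multiplex many connections through a `Selector`. The sketch below is a minimal echo server in the NIO style, included only to illustrate that difference. It is not Volano code, and the class name `NioEchoServer` and its `roundTrip` helper are hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

/** Illustrative single-threaded NIO echo server (hypothetical, not Volano code). */
class NioEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    private volatile boolean running = true;

    NioEchoServer() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.socket().bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: OS picks a free port
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() { return server.socket().getLocalPort(); }

    /** Asks the selector loop to exit; the loop closes all channels on its way out. */
    void stop() { running = false; selector.wakeup(); }

    /** One thread services every connection through the selector. */
    public void run() {
        ByteBuffer buf = ByteBuffer.allocate(4096);
        try {
            while (running) {
                selector.select(); // blocks until a channel is ready or wakeup() is called
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        echo((SocketChannel) key.channel(), buf);
                    }
                }
            }
        } catch (IOException e) {
            // A real server would log this; the sketch simply shuts down.
        } finally {
            for (SelectionKey key : selector.keys()) {
                try { key.channel().close(); } catch (IOException ignored) { }
            }
            try { selector.close(); } catch (IOException ignored) { }
        }
    }

    private static void echo(SocketChannel client, ByteBuffer buf) {
        try {
            buf.clear();
            if (client.read(buf) < 0) { client.close(); return; } // peer closed the connection
            buf.flip();
            // Simplification: spins if the send buffer is full instead of registering OP_WRITE.
            while (buf.hasRemaining()) client.write(buf);
        } catch (IOException e) {
            try { client.close(); } catch (IOException ignored) { }
        }
    }

    /** Starts a server, sends one message with a classic blocking socket, returns the echo. */
    static String roundTrip(String message) throws IOException, InterruptedException {
        NioEchoServer srv = new NioEchoServer();
        Thread thread = new Thread(srv);
        thread.start();
        try (Socket socket = new Socket("127.0.0.1", srv.port())) {
            byte[] out = message.getBytes(StandardCharsets.UTF_8);
            socket.getOutputStream().write(out);
            byte[] in = new byte[out.length];
            new DataInputStream(socket.getInputStream()).readFully(in);
            return new String(in, StandardCharsets.UTF_8);
        } finally {
            srv.stop();
            thread.join();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello"));
    }
}
```

The client side in `roundTrip` uses a plain blocking `Socket`, which is the classic style the benchmark exercises. The server side never blocks on any single connection, so the number of threads does not grow with the number of clients.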

The second set of charts was run on a Sun Fire X4200 with AMD Opteron 280 CPUs. This is the identical system used in my previous blog articles on this subject, this time with updated JVM releases from Sun and IBM. I'm sure someone will be curious why I didn't compare the Intel- and AMD-based systems directly. The primary reason is simple: I'm writing about JVM performance, not CPU performance. That being said, I didn't have the latest AMD CPUs readily available. In short, Intel is faster when running some of these benchmarks, while AMD is faster on others. In general, the memory subsystem differences between these platforms are prominent when comparing the performance of Java benchmarks. Sun Java 6 shows impressive results running SPECjbb2005, with a result 30% over baseline and ~15% faster than J2SE 5.0_10.

SciMark 2.0 is impressive on AMD Opteron as well. The large dataset is an interesting workload, as its effect on cache can highlight memory subsystem limitations. If your application crunches a large dataset, take a look at the large dataset of SciMark when comparing JVMs and system architectures.

Last but not least is Volano on AMD Opteron (and again, no, this is not our favorite benchmark!). Java 6 shows a strong improvement, with results more than 20% greater than 5.0_10, pulling ahead of 64-bit BEA JRockit. Nice.

Interesting article, but I'm not sure how useful in terms of server benchmarks. Nobody who develops server software expects their customers to run it "out of the box" on the JVM. In fact, anywhere I've worked doing so was simply not possible, as the server is started either via a service wrapper (on Windows) or a script (on Linux/Solaris) which has the opportunity to set all relevant options.
Running an enterprise-class Java server on a large-memory, multi-core, multi-processor machine AND expecting to squeeze top performance out of it is something that requires many hours in front of profilers and reading through dtrace output to achieve. But I'm not talking about eking out a few percentage points here. Depending on the application, a properly tuned generational garbage collector can double the performance of the application, or better.
Still, interesting article, and agreed that increasing "out the box" performance is a Good Thing. It will probably only benefit "standard" applications, but those do exist.