32-bit or 64-bit JVM? How about a Hybrid?
Tuesday, Oct 14 2008

Before x86-64 came along, the decision on whether to use 32-bit or 64-bit mode for architectures that supported both was relatively simple: use 64-bit mode if the application requires the larger address space, 32-bit mode otherwise. After all, no point in reducing the amount of data that fits into the processor cache while increasing memory usage and bandwidth if the application doesn’t need the extra addressing space.

When it comes to x86-64, however, there’s also the fact that the number of named general-purpose registers doubled from 8 to 16 in 64-bit mode. For CPU-intensive apps, this may mean better performance at the cost of extra memory usage. On the other hand, for memory-intensive apps 32-bit mode might be better, if you manage to fit your application within the address space provided. Wouldn’t it be nice if there were a single JVM that would cover the common cases?

It turns out that the HotSpot engineers have been working on doing just that through a feature called Compressed oops. The benefits:

Heaps up to 32GB (instead of the theoretical 4GB limit of 32-bit mode, which in practice is closer to 3GB)

The main disadvantage is that encoding and decoding are required to translate to and from native addresses. HotSpot tries to avoid these operations where possible, and they are relatively cheap anyway. The hope is that the extra registers give enough of a boost to offset the extra cost introduced by the encoding/decoding.
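As an illustration of what that translation involves, here is a rough sketch of the arithmetic (illustrative only, not HotSpot’s actual implementation; the heap base value is a hypothetical example): a 64-bit address is stored as a 32-bit offset from the heap base, scaled by the 8-byte object alignment.

```java
// Sketch of the arithmetic behind compressed oops (not HotSpot's real code).
public class CompressedOopSketch {
    static final long HEAP_BASE = 0x10000000L; // hypothetical heap base address
    static final int SHIFT = 3;                // log2 of 8-byte object alignment

    // Encode a native 64-bit address as a 32-bit compressed oop.
    static int encode(long address) {
        return (int) ((address - HEAP_BASE) >>> SHIFT);
    }

    // Decode a 32-bit compressed oop back into a native address:
    // one shift plus one add, which is why decoding is cheap.
    static long decode(int oop) {
        return HEAP_BASE + ((oop & 0xFFFFFFFFL) << SHIFT);
    }

    public static void main(String[] args) {
        long addr = HEAP_BASE + 8L * 100; // some 8-byte-aligned heap address
        System.out.println(decode(encode(addr)) == addr); // round-trips: true
    }
}
```

Since decoding is just a shift and an add, the bet is that the doubled register file in 64-bit mode more than pays for it.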

Compressed Oops have been included (but disabled by default) in the performance release JDK6u6p (which requires you to fill in a survey), so I decided to try it with an internal application and compare it against plain 64-bit mode and 32-bit mode.

The tested application has two phases, a single threaded one followed by a multi-threaded one. Both phases do a large amount of allocation so memory bandwidth is very important. All tests were done on a dual quad-core Xeon 5400 series with 10GB of RAM. I should note that a different JDK version had to be used for 32-bit mode (JDK6u10rc2) because there is no Linux x86 build of JDK6u6p. I chose the largest heap size that would allow the 32-bit JVM to run the benchmark to completion without crashing.

The performance difference and memory overhead are the same as with the smaller dataset. The benefit of Compressed Oops here is that we still have plenty of headroom while the 32-bit JVM is getting closer to its limits. This may not be apparent from the heap size after a full GC, but during the multi-threaded phase the peak memory usage is quite a bit larger and the fact that the allocation rate is high does not help. This becomes more obvious when we look at the results for the 64-bit JVM.

I had to increase Xms/Xmx to 4224m for the application to run to completion. Even so, the multi-threaded phase took a substantial performance hit when compared to the other two JVM configurations. All in all, the 64-bit JVM without compressed oops does not do well here.

In conclusion, it seems that compressed oops is a feature with a lot of promise and it allows the 64-bit JVM to be competitive even in cases that favour the 32-bit JVM. It might be interesting to test applications with different characteristics to compare the results. It’s also worth mentioning that since this is a new feature, it’s possible that performance will improve further before it’s integrated into the normal JDK releases. As it is though, it already hits a sweet spot and if it weren’t for the potential for instability, I would be ready to ditch my 32-bit JVM.

Update: The early access release of JDK 6 Update 14 also contains this feature.

Update 2: This feature is enabled by default since JDK 6 Update 23.

This is hardly new. JRockit has had this feature (we call it compressed references) for several years. IBM recently added it to their JVM, and the Apache Harmony implementation started with compressed references before they did a “normal” 64-bit JVM.

First of all, no-one claimed that HotSpot was the first to have some form of reference compression. In fact, the first trackback mentions that IBM and BEA JVMs support something similar.

The blog entry was mostly about testing what kind of effect the feature has on one particular real-life application and HotSpot was the natural choice since it’s open-source and the most widely used JVM.

Even though I was aware that BEA had some form of compression, I had not looked into the details. I decided to briefly check the information for all 3 JVMs you mentioned and I found that when compressed references are used (if the documentation is accurate):

– BEA only supports a 4GB heap[1].
– Harmony only supports a 4GB heap[2].
– IBM supports a 25GB heap[3], that’s closer to the Sun JVM one, but the feature is also relatively recent.

Maybe the Sun implementation is indeed something new, since it supports 32GB. That was actually one of the major advantages in my example.

The 4 vs 32 GB discussion is interesting. The goal of JRockit’s implementation is to make 64-bit JVMs as performant as 32-bit ones for small heap sizes, with the rationale that large heap sizes bring other issues that are more important to deal with (long GC pauses). We have known for a long time that 32 GB (and larger) heaps are possible with compressed references; the limit is actually 2^32 objects, not an artificial restriction on heap size. A 4 GB restriction gives the best performance since it makes the translation operation very cheap. 32 GB and above requires shifting the pointer, which adds overhead, especially during GC when you basically have to dereference all objects (e.g. longer GC pauses). I believe this is discussed in old JavaOne presentations on JRockit, but I don’t have any links.

32 GB is possible through the observation that all objects are aligned such that the lowest few bits are always zero (they carry no information). But you can equally well align objects on a larger boundary, expanding to 64/128 GB or more. The problem is that you will waste memory.

At the end of the day, it all comes down to tradeoffs. Code complexity vs throughput vs GC pause times, and as always there is no one single “perfect” choice. And the actual implementation is much more important than the algorithm…

Yes, there is no single “perfect” choice and indeed it’s all about trade-offs. I just happen to think that the 32GB limit hits a sweet spot for many server apps at this point in time. It’s certainly annoying to have to endure a 60% memory size overhead just because you’re past the 4GB ceiling.

When it comes to the performance impact of shifting the pointer, you have actually just described one of the main points of this blog entry. ;) In other words, to check if it would have any measurable impact on an application with a high allocation rate (which obviously has GC implications). It turns out that it doesn’t so it seems to me that the HotSpot engineers did a good job. Of course, your mileage may vary, this is just a single data point.

And yes, 32GB is the limit to avoid potential holes between objects and the resulting wasted memory. Nikolay Igotti covered the possibility of larger heap sizes on his blog 1.5 years ago, so HotSpot would probably support it if there was enough demand.
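The arithmetic behind these limits is simple: with 2^32 addressable reference values, the maximum heap is 2^32 times the object alignment. A small sketch (the class and method names are mine):

```java
// Maximum heap addressable by a 32-bit compressed reference:
// 2^32 distinct values, each covering `alignmentBytes` of heap.
public class CompressedRefLimits {
    static long maxHeapBytes(int alignmentBytes) {
        return (1L << 32) * alignmentBytes;
    }

    public static void main(String[] args) {
        // 8-byte alignment (the usual default) -> 32 GB
        System.out.println(maxHeapBytes(8) / (1L << 30) + " GB");
        // 16-byte alignment -> 64 GB, at the cost of padding between objects
        System.out.println(maxHeapBytes(16) / (1L << 30) + " GB");
    }
}
```

Doubling the alignment doubles the addressable heap but wastes memory as padding, which is the trade-off discussed above.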

I disagree that actual implementation is much more important than the algorithm though. They are both very important and the best implementation can’t save a bad algorithm and vice-versa (this is a generalisation and as such there might be exceptions, but the general point remains).

[…] not the only ones doing this trick, Oracle/BEA have the -XXcompressedRefs option and Sun has the -XX:+UseCompressedOops option. Of course, each of the vendors implementations are slightly different with different […]

I’ve been trying to process some large data sets using the weka data mining package (www.cs.waikato.ac.nz/ml/weka/). They were too big for Java on Windows XP 32-bit, so I moved to a Windows XP 64-bit machine, but the memory consumption has been appalling: it’s using approximately 6 times the memory for the same amount of raw data! (On the 64-bit machine I can only process data sets one third the size the 32-bit machine could handle, despite having twice the addressable heap!)

I’ve tried using JDK 6 Update 14 and compressed OOPS, but it didn’t seem to make any difference at all. I’d be grateful for any thoughts/advice. Is it a Windows problem?

I’ve included the output from jconsole below — and yes it is picking up the arguments.
As you can see, it’s currently using >3GB of heap to do this job, whereas a 32-bit Windows XP machine was handling data sets three times this size in 1.6GB!

I’m tempted to try switching to Linux, but would prefer to avoid the hassle if it’s something stupid I’m doing.

I’ve tried JDK 6.12, and both JDK 6 Update 14 and JDK6u6p with compressed OOPS, and they all seemed to perform similarly.

(BTW I’m using the concurrent mark sweep GC, but the ParNew GC is still running. Is there any way of disabling the latter?)

Is it possible that you were using the -client JIT while using 32-bit Windows? The -server JIT has a tendency to use all the memory you give it for tasks that allocate a lot. What happens if you increase the size of the data set? Does it OOM or does it process it fine?

Juma,
We are using the Solaris 10 32-bit JVM on our production servers with the Sun ONE app server. As we are constantly getting out-of-memory (heap) errors, I am planning to increase the heap size to 4G in the JVM options. The current heap size is 2G and we have 8G of memory available.

I tried each mode, both 64-bit (usual and compressed) and 32-bit, with our application:

-Xmx12500m -server -XX:+UseCompressedOops

In compressed mode it takes 10 times longer to start. Export to SPSS is also about 20 times slower, but memory consumption is much better in compressed mode (up to 40% less). For me, it could be helpful for test applications and platforms if it did not take so much time to compute.

Hi guys,
I have a pure Java + Apple WebObjects 5.5 web application. Right now the application runs on a 32-bit processor with a 32-bit JVM and 16GB of RAM, but the client still gets “java.lang.OutOfMemoryError: Java heap space”, so we are planning to move it to a 64-bit JVM and a 64-bit processor with 32GB of RAM. My question is: will I need to recompile my application before it goes live on the production server, or is there no need for recompilation? One more thing I want to mention: it’s a pure Java application with no native code. Thanks in advance, waiting for your response.

If it’s a pure Java application, you don’t need to recompile it. Note that you also have to check whether your container is pure Java (in which case changing the JVM is enough) or requires some configuration to run in 64-bit mode.
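As a quick sanity check after switching, you can ask the running JVM which data model it uses. Note that `sun.arch.data.model` is a Sun/HotSpot-specific property and may be absent on other vendors’ JVMs, so this is a convenience rather than a guarantee:

```java
// Prints "32" or "64" on Sun/HotSpot JVMs; the property may be
// undefined on other implementations, hence the os.arch fallback.
public class JvmBitness {
    public static void main(String[] args) {
        String model = System.getProperty("sun.arch.data.model");
        if (model != null) {
            System.out.println(model + "-bit JVM");
        } else {
            System.out.println("unknown data model, os.arch="
                    + System.getProperty("os.arch"));
        }
    }
}
```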

“This feature is enabled by default since JDK 6 Update 23”.
Does this mean that we don’t need to add -XX:+UseConcMarkSweepGC after JDK 6 Update 23? And how can we check whether the JDK enables this by default? Thanks!
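One way to check on a live HotSpot JVM is to query the flag through the HotSpotDiagnosticMXBean (the `getPlatformMXBean` overload used here exists from JDK 7 onwards; on JDK 6 you can instead run `java -XX:+PrintFlagsFinal -version` and look for UseCompressedOops, or use `jinfo -flag UseCompressedOops <pid>`):

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

// Query the live value of the UseCompressedOops VM flag on HotSpot.
public class CheckCompressedOops {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean bean = ManagementFactory
                .getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        System.out.println("UseCompressedOops = "
                + bean.getVMOption("UseCompressedOops").getValue());
    }
}
```

This reports the effective value, so it shows "true" on a 64-bit JVM where the flag was turned on by ergonomics even though you never passed it on the command line.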

Here is my problem: I am working on a project which uses javaws (32-bit JRE) and my application is built on some C++ wrappers. When I run it with 1024M it works fine on a 64-bit Windows 7 machine, but when I try to run it with 1536M I get out-of-memory errors and very strange behaviour: sometimes the program works fine, and sometimes it even gives me a NullPointerException when calling a simple JFileChooser dialog, etc. After that the application never works unless I restart it, or it just hangs.

To clarify your last comment:
“This feature is enabled by default since JDK 6 Update 23”.
Does this mean that as of jdk6u23 the option -XX:+UseCompressedOops is enabled automatically when relevant? Or just that this option is supported if specified on the command line?