Should I use a 32- or a 64-bit JVM?

November 23, 2012 by Vladimir Šor. Filed under: Java

This is a question I have faced several times during my career in enterprise software development. Every once in a while I’ve had to hand out recommendations for configuring a specific new environment. And more often than not, part of the question at hand was “Should I use a 32- or a 64-bit JVM?”. To be honest, in the beginning I just flipped a coin instead of giving a reasoned answer. (Sorry, bros!) But by now I have gathered some more insight on this and thought I would share it with you.

First stop – the more, the merrier. Right? So, as 64 > 32, this would be an easy answer: if possible, always choose 64-bit? Well, hold your horses. The downside of the 64-bit architecture is that the same data structures consume more memory. A lot more. Our measurements show that, depending on the JVM version, the operating system version and the hardware architecture, you end up using 30-50% more heap than on 32-bit. A larger heap can also introduce longer GC pauses that affect application latency – running a full GC on a 4.5GB heap is definitely going to take longer than on a 3GB one. So it would not be correct to jump on the 64-bit bandwagon just because 64 is bigger than 32.

But… when should you ever want to use a 64-bit JVM at all then? In most cases the reason is large heap sizes. On a 32-bit architecture you quickly run into limits on the maximum heap size, and these limits differ by platform. The following illustrates these limitations on different platforms:

| OS | Max heap | Notes |
|---|---|---|
| Linux | 2GB | 3GB on specific kernels, such as hugemem |
| Windows | 1.5GB | Up to 3GB with the “/3GB” boot flag and a JRE compiled with the /LARGEADDRESSAWARE switch |
| Mac OS X | 3.8GB | Alert – could not find an ancient Mac, so this is untested by me |
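To see which of these limits a particular JVM is actually running under, you can ask the runtime itself. The snippet below is a minimal sketch, not part of the original measurements: the class name `JvmInfo` is made up for illustration, and the `sun.arch.data.model` property is HotSpot-specific, so other JVMs may not define it (hence the "unknown" fallback).

```java
final class JvmInfo {
    /** Maximum heap the JVM will attempt to use, in bytes (reflects -Xmx). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /**
     * Returns "32", "64" or "unknown".
     * Assumption: sun.arch.data.model is a HotSpot-specific property.
     */
    static String dataModel() {
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    public static void main(String[] args) {
        System.out.println("os.arch    = " + System.getProperty("os.arch"));
        System.out.println("data model = " + dataModel());
        System.out.println("max heap   = " + maxHeapBytes() / (1024 * 1024) + " MB");
    }
}
```

Running this with an increasing -Xmx on a 32-bit JVM is a quick way to find where your own platform stops accepting larger heaps.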

Now, how come it is that bad? After all, I bet you have seen 32-bit machines running with 16GB+ of RAM and doing just fine. What’s wrong with the JVM that it can allocate less than 10% of this 16GB on Windows?

Main cause – address space. In a 32-bit system you can theoretically allocate up to 4GB of memory per process. What breaks this on Windows is how process address space is handled. Windows cuts the process address space in half. One half of it is reserved for the kernel (which a user process cannot use) and the other half for the user. No matter how much RAM is in the box, a 32-bit process can only address 2GB of it. What’s even worse – the heap needs to be one contiguous block of this address space, so in practice you are most often left with just 1.5-1.8GB of heap on Windows boxes.
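The arithmetic behind these numbers is simple enough to write down. The following is only an illustrative sketch: the class and method names are hypothetical, and the kernel reservations are the 2GB default described above and the 1GB that remains when the user space is grown to 3GB.

```java
final class AddressSpace {
    static final long GB = 1024L * 1024 * 1024;

    /** Number of bytes addressable with pointers of the given width. */
    static long addressableBytes(int pointerBits) {
        if (pointerBits < 1 || pointerBits > 62) {
            throw new IllegalArgumentException("pointerBits must be in 1..62");
        }
        return 1L << pointerBits;
    }

    /** User-mode share when the kernel reserves kernelBytes of the address space. */
    static long userSpaceBytes(int pointerBits, long kernelBytes) {
        return addressableBytes(pointerBits) - kernelBytes;
    }

    public static void main(String[] args) {
        // 2^32 bytes in total, split between kernel and user space
        System.out.println("total     = " + addressableBytes(32) / GB + " GB");
        System.out.println("user (2GB kernel) = " + userSpaceBytes(32, 2 * GB) / GB + " GB");
        System.out.println("user (1GB kernel) = " + userSpaceBytes(32, 1 * GB) / GB + " GB");
    }
}
```

The heap then has to fit into the user share as one contiguous range, alongside the JVM's own code, thread stacks and native libraries, which is why the practical limit ends up below these figures.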

There is a trick you can pull on 32-bit Windows to reduce the kernel space and grow the user space. You can use the /3GB parameter in your boot.ini. However, to actually take advantage of this, the JVM must be compiled/linked using the /LARGEADDRESSAWARE switch.

This unfortunately is not the case, at least with the HotSpot JVM. Until the latest JDK 1.7 releases the JVM is not compiled with this option. You are luckier if you are running a post-2006 version of JRockit. In this case you can enjoy up to 2.8-2.9GB of heap size.

So – can we conclude that if your application requires more than ~2-3GB of memory you should always run on 64-bit? Maybe. But you have to be aware of the threats as well. We have already introduced the culprits – increased heap consumption and longer GC pauses. Let’s analyze the causes here.

Problem 1: 30-50% more heap is required on 64-bit. Why so? Mainly because of the memory layout in the 64-bit architecture. First of all, object headers are 12 bytes on a 64-bit JVM. Secondly, object references can be either 4 bytes or 8 bytes, depending on JVM flags and the size of the heap. This definitely adds some overhead compared to the 8-byte headers and 4-byte references on 32-bit. You can also dig into one of our earlier posts for more information about calculating the memory consumption of an object.
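To make the overhead concrete, here is a rough back-of-the-envelope estimate for an object with two reference fields and one int. Treat it as a sketch under stated assumptions rather than a measurement: the names are hypothetical, objects are assumed to be padded to 8-byte boundaries, field-level padding is ignored, and the 16-byte header used for a 64-bit JVM with 8-byte references is an assumption not taken from the figures above.

```java
final class ObjectSizeEstimate {
    /** Rounds n up to the next multiple of 8 (assumed object alignment). */
    static long align8(long n) {
        return (n + 7) & ~7L;
    }

    /** Header + reference fields + primitive payload, padded to 8 bytes. */
    static long estimate(int headerBytes, int refBytes, int refFields, int primitiveBytes) {
        return align8(headerBytes + (long) refFields * refBytes + primitiveBytes);
    }

    public static void main(String[] args) {
        int refFields = 2;      // two object references
        int primitiveBytes = 4; // one int field

        // 32-bit: 8-byte header, 4-byte references
        System.out.println("32-bit:              " + estimate(8, 4, refFields, primitiveBytes));
        // 64-bit with 4-byte references: 12-byte header
        System.out.println("64-bit, 4-byte refs: " + estimate(12, 4, refFields, primitiveBytes));
        // 64-bit with 8-byte references: 16-byte header (assumption)
        System.out.println("64-bit, 8-byte refs: " + estimate(16, 8, refFields, primitiveBytes));
    }
}
```

Under these assumptions the estimates come out at 24, 24 and 40 bytes: for this particular shape the 4-byte-reference case costs nothing extra after padding, while 8-byte references push the object well past the 32-bit size. Reference-heavy structures are where the difference adds up.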

Problem 2: Longer garbage collection pauses. Building up more heap means there is more work to be done by GC while cleaning it up from unused objects. What it means in real life is that you have to be extra cautious when building heaps larger than 12-16GB. Without fine tuning and measuring you can easily introduce full GC pauses spanning several minutes. In applications where latency is not crucial and you can optimize for throughput only this might be OK, but in most cases this might become a showstopper.
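Since measuring is the key point here, a simple starting place is the standard `java.lang.management` API, which exposes per-collector counts and accumulated times. The sketch below uses a hypothetical class name, `GcStats`. Note that `getCollectionTime()` reports accumulated elapsed time, not individual pause lengths, so for actual pause durations you would still turn to GC logging (for example `-verbose:gc`).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcStats {
    /** Sums the accumulated collection time (ms) reported by all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time=" + gc.getCollectionTime() + " ms");
        }
        System.out.println("total GC time = " + totalGcMillis() + " ms");
    }
}
```

Sampling these counters periodically under realistic load, on both heap sizes you are comparing, gives a first impression of how much more time the larger heap spends in GC.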

So what are my alternatives when I need larger heaps and do not wish to introduce the overhead caused by the 64-bit architecture? There are several tricks we have covered in one of our earlier blog posts: you can get away with heap partitioning, GC tuning, building on different JVMs or allocating memory off the heap.
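As an illustration of the last option, the standard library offers direct byte buffers, whose contents live outside the garbage-collected heap. This is only one possible approach and not necessarily the one the earlier post describes; the class name `OffHeapDemo` is made up for the example.

```java
import java.nio.ByteBuffer;

final class OffHeapDemo {
    /** Allocates a buffer whose backing memory is outside the Java heap. */
    static ByteBuffer allocateOffHeap(int bytes) {
        return ByteBuffer.allocateDirect(bytes);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = allocateOffHeap(64 * 1024 * 1024); // 64 MB off-heap

        // Absolute put/get: data is stored as raw bytes, not as Java objects
        buffer.putLong(0, 42L);
        System.out.println("direct = " + buffer.isDirect());
        System.out.println("value  = " + buffer.getLong(0));
    }
}
```

The trade-off is that you have to serialize your data into bytes yourself, and direct memory has its own cap (`-XX:MaxDirectMemorySize` on HotSpot). On a 32-bit JVM it also still counts against the same process address space discussed above.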

To conclude, let’s re-state that you should always be aware of the consequences of choosing a 64-bit JVM. But do not be afraid of this option.


We are using 128GB heap sizes in a Hazelcast-based solution. Even at up to 80% fill, with many millions of live object trees on top of 60GB of serialized in-heap data (byte arrays), we never experience longer pauses (except for explicit System.gc(), which we sometimes call to find out how much space is really used). The current GC implementations (we use the latest 1.6 HotSpot VMs) avoid pauses quite efficiently…

I’m a bit skeptical about the idea that objects which take up more space in a 64 bit jvm take up more time to garbage collect. The amount of time to GC should be proportional to the number of objects that are GC’d, not their size.

Probably I didn’t word it clearly enough, but what I meant was that if you take advantage of a much larger heap size and fill, e.g., 12GB of heap with objects, then once a full GC kicks in, it can take a long time. For the same number of objects it sure won’t take longer, but then you don’t need a larger heap.

Something is wrong with your implication about GC latency. The objects are bigger, and the object references are bigger, which certainly means more memory per object is in use, but that does not imply that GC latency is longer. Given the same application and test case, there won’t be more objects allocated on 64-bit. The garbage collector is visiting the same number of objects, so there is no added latency on that part of the cycle. Also, the same number of objects would be compacted, and the machine-level ‘move’ operation operates on the architecture’s word size (32 vs. 64), so there would be an equal number of machine-level ‘move’ operations, too. Where is the GC latency? Maybe you mean that GC will run more frequently if the JVM’s heap is not increased on 64-bit.

Note that running a 32-bit JVM on a 64-bit operating system also reduces the address space available for a process compared to running a 32-bit JVM on a 32-bit OS. Therefore, if you prefer a 32-bit JVM, make sure the OS is also 32-bit.

I also advised using 32-bit where possible because of the overheads described. But is it still this way? Compressed Oops should minimize this overhead, but I haven’t done/seen a benchmark yet.
http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html