Escaping the JVM heap for memory intensive applications

If you’ve ever allocated large Java heaps, you know that at some point – typically starting at around 4 GiB – you will start having issues with your garbage collection pauses.

I won’t go into detail about why pauses happen in the JVM, but in short it happens when the JVM does full collections and you have a large heap. As the heap increases, those collections might become longer.

The simplest way to overcome this is to tune your JVM garbage collection parameters to match the memory allocation and deallocation behaviour of your particular application. It is a bit of a dark art and requires careful measurements, but it’s possible to have very large heaps while avoiding mostly old generation garbage collections. If you want to learn more about Garbage Collection tuning, check out this JVM GC tuning guide. If you get really interested about GC in general, this is an excellent book: The Garbage Collection Handbook.

There are JVM implementations that guarantee much lower pause times than the Sun VM, such as the Zing JVM – but normally at other costs in your system, such as increased memory usage and single threaded performance. The ease of configuration and low gc guarantees is still very appealing. For the purpose of this article, I will use the example of an in-memory cache or store in Java, mainly because I’ve built a couple in the past while using some of these techniques.

This implementation is straighforward. However, as the map size increases, we will be allocating a large number of objects (and deallocating), we are using boxed primitives which takes more space in memory that primitives and the map needs to be resized from time to time. We could certainly improve this implementation simply by using a primitive-based map. It would use less memory and objects but would still take space in the heap and possibly partition the heap, leading to longer pauses if for other reasons we do full GCs.

Let’s look at other ways to store similar data without using the heap:

Use a separate process to store the data. Could be something like a Redis or Memcached instance that you connect through sockets or unix sockets. It’s fairly straightforward to implement.

Offload data to disk, using memory mapped files. The OS is your friend and will do a lot of heavy work predicting what you’ll read next from the file and your interface to it is just like a big blob of data.

Use native code and access it through JNI or JNA. You’ll get better performance with JNI and ease of use with JNA. Requires you to write native code.

I will focus on the solutions that use exclusively Java for this article, direct allocated buffers and the Unsafe class.

Direct Allocated Buffers

Direct Allocated Buffers are extremely useful and used extensively when developing high-performance network applications in Java NIO. By allocating data directly outside the heap, in a number of cases you can write software where that data actually never touches the heap.

After creating a new buffer, you can manipulate the buffer in a few different ways. If you’ve never used Java NIO buffers you should definitely take a look as they are really cool.

Besides ways to fill, drain and mark different points in the buffer, you can opt to have different view on the buffer instead of a ByteBuffer – e.g. buffer.asLongBuffer() gives you a view on the ByteBuffer where you manipulate elements as longs.

So how could these be used in our Cache example? There are a number of ways, the most straightforward way would be to store the serialized/externalized form of the value record in a big array along with a map of keys to offsets and sizes of the record in that array.

It could look like this (very liberal approach, missing implementations and assuming fixed size records):

As you can see, this code has a number of limitations: fixed record size, fixed backing map size, limited way in which externalization is done, difficult to delete and reuse space, etc. While some of these are possible to overcome with clever ways to represent the record in byte arrays (and representing the keyToOffset map in direct allocated buffers also) or dealing with deletions (we could implement our own SLAB allocator) others such as resizing the backing map are difficult to overcome. An interesting improvement is to implement records as offsets to records and fields, thus reducing the amount of data we copy and do so only on demand.

Be aware that the JVM imposes a limit to the amount of memory used by direct allocated buffers. You can tune this with the -XX:MaxDirectMemorySize option. Check out the ByteBuffer javadocs

Unsafe

Another way to manage memory directly from Java is using the hidden Unsafe class. Technically we’re not supposed to use this and it is implementation specific as it lives in a sun package, but the possibilities offered are endless. What Unsafe gives us is the ability to allocate, deallocate and manage memory directly from Java code. We can also get the actual pointers and pass them between native and java code interchangebly.

There’s a lot of space for improvement and you need to do a number of things manually, but it’s very powerful. You can also explicitly free and reallocate memory that you’ve allocated in this way, which allows you to write some code in the same way you would to with C.

Conclusion

There’s a number of ways to avoid using the heap in Java and in this way, use a lot more memory. You don’t need to do this and I’ve personally seen properly tuned JVMs with 20GiB-30GiB running with no long garbage collection pauses, but it is fairly interesting.

If you want to check out how some projects use this for the basic (and honestly untested, almost written on a napkin) cache code I wrote here, have a look at EHCache’s BigMemory or Apache Cassandra which uses Unsafe also for this type of approach.

Newsletter

Join them now to gain exclusive access to the latest news in the Java world, as well as insights about Android, Scala, Groovy and other related technologies.

Email address:

Join Us

With 1,043,221 monthly unique visitors and over 500 authors we are placed among the top Java related sites around. Constantly being on the lookout for partners; we encourage you to join us. So If you have a blog with unique and interesting content then you should check out our JCG partners program. You can also be a guest writer for Java Code Geeks and hone your writing skills!