I’ve been wanting to write up a juicy post on how we deal with very large heaps in Java to reduce GC pauses. Unfortunately, I keep getting sidetracked while getting the data together. The latest bump in the road is due to a JVM bug of sorts.

Backstory: Todd Lipcon’s Twitter post pointed me to the JVM option -XX:PrintFLSStatistics=1 as a way to get some good information about heap fragmentation. He was even kind enough to provide the Python and R scripts! I figured that it would be a few minutes of fiddling and I’d have some good data for a post. No such luck. Our JVM GC/heap options are -XX:+UseConcMarkSweepGC -Xms65g -Xmx65g. When -XX:PrintFLSStatistics=1 is used with this, the following output is seen:

in hotspot/src/share/vm/gc_implementation/concurrentMarkSweep/binaryTreeDictionary.cpp. (“%d” just doesn’t cut it with a long’s worth of data.) I filed a hotspot bug, so hopefully it will be fixed in some release in the not-too-distant future.

I can work around this, but it has slowed down my getting to the juicy blog post. Stay tuned!

When I was writing my post yesterday I was reflecting on how much time we were spending making our code int friendly: specifically, dealing with the problems that come up when you’re working with values around Integer.MAX_VALUE, or 2147483647 (2^31-1). I likened it to i18n (internationalization) or l10n (localization). Much in that same vein, I’d like to coin the term “2eight7” to represent the problems one runs into when working with a signed integer and any code that depends (implicitly or explicitly) on it.

In most programming languages an int is 32 bits wide, giving a maximum value of 4294967295 (2^32-1) if unsigned or 2147483647 (2^31-1) if signed. In the case of Java, which we use for a number of components in our infrastructure, many of the fundamental components use ints: array indexes, NIO, IO, most collections (as they are commonly based on arrays), etc. When you’re working with billions of anything, it’s easy to run into these bounds, which results in subtle bugs that are hard to track down because the exceptions aren’t what they seem. The most common cases that we run into are due to the roll-over that occurs when you add any positive value to 2147483647: the value becomes negative (since Java’s ints are signed). Sometimes this will result in an ArrayIndexOutOfBoundsException, and sometimes it will result in a seemingly impossible call path from deep inside of some java.* class.
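To make the roll-over concrete, here is a minimal sketch. The class and method names are made up for illustration, and Math.addExact requires Java 8 or later, so treat it as one possible guard rather than something we necessarily had available.

```java
class IntOverflowDemo {
    // Plain int addition wraps around silently on overflow.
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    // Widening one operand to long before adding keeps the true sum.
    static long wideningAdd(int a, int b) {
        return (long) a + b;
    }

    // Looks like an overflow check, but the sum is computed as an int first,
    // so it has already wrapped and can never exceed Integer.MAX_VALUE.
    static boolean naiveOverflowCheck(int a, int b) {
        return a + b > Integer.MAX_VALUE;
    }

    // Math.addExact (Java 8+) throws ArithmeticException instead of wrapping.
    static int checkedAdd(int a, int b) {
        return Math.addExact(a, b);
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        System.out.println(wrappingAdd(max, 1));        // -2147483648
        System.out.println(wideningAdd(max, 1));        // 2147483648
        System.out.println(naiveOverflowCheck(max, 1)); // false
        try {
            checkedAdd(max, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

The negative result from wrappingAdd is what later turns up as a bad array index or a negative size somewhere far away from the addition that caused it.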

I remember working on my first few i18n (internationalization) and l10n (localization) projects, where I learned the do’s and don’ts of how to write code that worked seamlessly (or at least was easy to work with) in multiple locales. Working with “big data” feels exactly like that. You have to slowly build up a set of techniques: instead of a single array, you need to keep around arrays of arrays (since each dimension is limited to 2147483647 elements); you have to know how to shard your collections so that they do not exceed the maximum allowed capacity (e.g. HashMap is limited to 1073741824 (2^30) buckets); if(value > Integer.MAX_VALUE) doesn’t do what you think it does, because when value is an int the comparison can never be true (and most of the time it’s hard to tell that that’s the code that you wrote). The list goes on.
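As a rough illustration of the arrays-of-arrays technique, here is a minimal sketch of a long-indexed array backed by fixed-size chunks. ChunkedLongArray and its chunk size are hypothetical choices for this example, not the code we actually run.

```java
/** Hypothetical long-indexed array of longs, stored as an array of chunks. */
final class ChunkedLongArray {
    private static final int CHUNK_BITS = 20;              // assumed: 2^20 elements per chunk
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final long[][] chunks;
    private final long length;

    ChunkedLongArray(long length) {
        if (length < 0) {
            throw new IllegalArgumentException("negative length: " + length);
        }
        this.length = length;
        // Round up to a whole number of chunks, doing the math in long.
        long chunkCount = (length + CHUNK_SIZE - 1) >>> CHUNK_BITS;
        if (chunkCount > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many chunks: " + chunkCount);
        }
        chunks = new long[(int) chunkCount][];
        for (int i = 0; i < chunks.length; i++) {
            // The last chunk may be shorter than CHUNK_SIZE.
            long remaining = length - ((long) i << CHUNK_BITS);
            chunks[i] = new long[(int) Math.min(CHUNK_SIZE, remaining)];
        }
    }

    long length() {
        return length;
    }

    long get(long index) {
        checkIndex(index);
        // High bits select the chunk, low bits select the slot inside it.
        return chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)];
    }

    void set(long index, long value) {
        checkIndex(index);
        chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }

    public static void main(String[] args) {
        // Small enough to run anywhere; the same code accepts lengths above 2^31-1
        // if the heap is large enough to hold them.
        ChunkedLongArray a = new ChunkedLongArray(3_000_000L);
        a.set(2_999_999L, 42L);
        System.out.println(a.get(2_999_999L)); // 42
    }
}
```

Using a power-of-two chunk size keeps the index math down to a shift and a mask, and every index stays a long until the very last step, which is where the int bugs usually creep in.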

One interesting development was “announced” at EclipseCon: there’s talk about “big data” support in Java 9 (see “Reinhold Already Talking About Java 9”, for example). This is something that I will keep my eye on. Unfortunately, it won’t help us for the next few years.