Today, the core Java APIs lack high-quality hash functions, and third-party implementations often deliver sub-optimal performance. Since non-cryptographic hash functions are important building blocks of software, this is a real bummer for developers.

Generally, there is no shortage of hash functions, and many new ones with very good hashing properties have emerged in the last decade. Surprisingly, the core Java API still offers only Adler32 and CRC32, which were designed as checksums many years ago. Of course, many hash implementations are available outside of the core Java API.

However, unlike in the C world, only a few comparisons of them exist. Hashing algorithms can have very different performance characteristics when they run inside a Java VM. Today's fastest hashes are highly optimized for CPU hardware and can process several GB/s. The VM layer imposed by Java can get in the way here, and implementation details matter greatly. For example, Murmur3A can outperform CRC32 by an order of magnitude when implemented in C. Nevertheless, the same Murmur3A implemented in Java can be several times slower than Java's CRC32 class. These were the results when evaluating the Murmur3A implementation from Guava, one of the most popular and respected Java libraries available.
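To make the comparison concrete, here is a minimal sketch that hashes the same input with the two core checksums (`java.util.zip.CRC32` and `java.util.zip.Adler32`) and with a plain, unoptimized Java port of MurmurHash3_x86_32 (the "Murmur3A" variant). The class name `Hash32Demo` and the method `murmur3a` are hypothetical, written for illustration only; this is neither Guava's code nor a tuned implementation, so it says nothing about the performance numbers above.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;

public class Hash32Demo {

    // Hypothetical, unoptimized port of MurmurHash3_x86_32 ("Murmur3A").
    // For illustration only: not Guava's implementation, not tuned for speed.
    static int murmur3a(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int len = data.length;
        int nblocks = len / 4;

        // Body: mix each 4-byte little-endian block into the state.
        for (int i = 0; i < nblocks; i++) {
            int off = i * 4;
            int k = (data[off] & 0xff)
                    | (data[off + 1] & 0xff) << 8
                    | (data[off + 2] & 0xff) << 16
                    | (data[off + 3] & 0xff) << 24;
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }

        // Tail: up to 3 remaining bytes (the cases fall through on purpose).
        int tail = nblocks * 4;
        int k1 = 0;
        switch (len & 3) {
            case 3:
                k1 ^= (data[tail + 2] & 0xff) << 16;
            case 2:
                k1 ^= (data[tail + 1] & 0xff) << 8;
            case 1:
                k1 ^= data[tail] & 0xff;
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h ^= k1;
        }

        // Finalization ("fmix32"): avalanche the remaining bits.
        h ^= len;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Core API checksums return a long, but only the low 32 bits carry the value.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    static long adler32(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        return adler.getValue();
    }

    public static void main(String[] args) {
        byte[] input = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32:    %08x%n", crc32(input));     // cbf43926
        System.out.printf("Adler32:  %08x%n", adler32(input));   // 091e01de
        System.out.printf("Murmur3A: %08x%n", murmur3a(input, 0));
    }
}
```

All three functions yield 32-bit values, which is the limitation the next paragraph is about.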

Adler32 and CRC32 produce 32-bit hashes. This is unfortunate because 64-bit hashes are a perfect match for today's 64-bit CPUs and produce far(!) fewer collisions than their 32-bit counterparts. Compared to cryptographic hash functions, non-cryptographic hashes like these are much faster to compute and usually produce smaller hashes that are easy to handle.
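How much "far fewer" means can be estimated with a quick calculation. Assuming an ideal hash that maps keys uniformly onto 2^b values, linearity of expectation gives n(n-1)/2 / 2^b expected colliding pairs for n keys. The sketch below (the class and method names are hypothetical) evaluates this for one million keys.

```java
public class CollisionEstimate {

    // Expected number of colliding pairs among n keys for an ideal,
    // uniformly distributed b-bit hash: C(n, 2) / 2^b.
    static double expectedCollidingPairs(long n, int bits) {
        double pairs = (double) n * (n - 1) / 2.0;
        return pairs / Math.pow(2.0, bits);
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        // Roughly 116 colliding pairs with a 32-bit hash...
        System.out.printf("32-bit: %.1f%n", expectedCollidingPairs(n, 32));
        // ...versus roughly 2.7e-8 with a 64-bit hash.
        System.out.printf("64-bit: %.2e%n", expectedCollidingPairs(n, 64));
    }
}
```

Real hash functions only approximate this ideal, but the gap of about ten orders of magnitude illustrates why 64-bit hashes are attractive.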