Java's Object Methods: hashCode()

Introduction

This article is a continuation of a series of articles describing the often forgotten about methods of the Java language's base Object class. The following are the methods of the base Java Object which are present in all Java objects due to the implicit inheritance of Object.

The focus of this article is the hashCode() method which is used to generate a numerical representation of the contents of an object and is used heavily in the collections framework.

Why the hasCode() Method is Important

The purpose of the hashCode() method is to provide a numeric representation of an object's contents so as to provide an alternate mechanism to loosely identify it.

By default the hashCode() returns an integer that represents the internal memory address of the object. Where this comes in handy is in the creation and use of an important computer science data structure called a hash table. Hash tables map keys, which are values that result from a hash function (aka, hashCode() method), to a value of interest (i.e., the object the hashCode() method was executed on). This becomes a very useful feature when dealing with moderate-to-large collections of items, because it is usually a lot faster to compute a hash value compared to linearly searching a collection, or having to resize and copy items in an array backing a collection when it's limit is reached.

The driving feature behind an efficient hash table is the ability to create a hash that is adequately unique for each object. Buried in that last sentence is the reason why I emphasized the need to override both equals(Object) and hashCode() in the prior article.

If an object has implementation characteristics that require it to be logically distinct from others based on its content then it needs to produce as distinct a hash as reasonably possible. So two objects that are logically equivalent should produce the same hash, but it is sometimes unavoidable to have two logically different objects that may produce the same hash which is known as a collision. When collisions happen the colliding objects are placed in a metaphorical bucket and a secondary algorithm is used to differentiate them within their hash bucket.

Demonstrating Hash Table Usage

In Java the concept of a hash table is conceptualized in the java.util.Map interface and implemented in the java.util.HashMap class.

We'll demonstrate a hash table and why it is important to have a reasonably unique hash value computed by hashCode() when a class implementation warrants the notion of logical equality consider the following class and program.

As you can see from the output the default hash of me and me2 are not equal even though the custom implementation of equals(Object) indicates that they are logically the same. This results in two distinct entries in the hash table even though you would expect only one, which opens the doors to some nasty bugs in a program if it were to implement this code.

Let me improve the Person class by ensuring that the hashCode() method returns the same value for the equal instance objects me and me2, like so:

Ok, so now I have equal hash values for equal objects, but it is also clear that non-equal objects will also always have the same hash values.

First I will explain what is happening as the equal objects me and me2 are added to the HashMap. When the me2Person instance is added to the HashMap already containing the me instance, the HashMap notices that the hash is the same and then it determines that they are also logically equivalent via the equals(Object) method. This results in the HashMap simply replacing the first me with the second me2 at that location in the hash table.

Next comes the you instance, which again has the same hash value, but this time the HashMap identifies that it is logically different from the existing hash in that bucket me2. This leads to the HashMap adding the you instance to the bucket, turning that bucket into a list-like collection. For small numbers of collisions this doesn't have too great an impact, but in my example above, where every instance is guaranteed to have the same hash value, the bucket representing 31 in the HashMap will rapidly degrade to a poor implementation of a list for the entire HashMap.

At this point in time I would like to further demonstrate the ineffectiveness of this solution with concrete data to compare against the final implementation that will follow.

Below is a program that builds two equally sized collections, peopleList and peopleMap, of Person instances with equally sized random names and birthdays selected. I will measure the amount of time it takes to build the collections for a first comparison measurement. Next I will measure the amount of time it takes to search each collection for the existence of an equally placed known instance, me.

Wow that is grossly inefficient! This great hash table implementation in HashMap has been completely degraded to a terrible implementation of a list-like structure. Even worse is that arguably one of the primary reasons for using a hash table is to have rapid O(1) searching and retrieval of values via key access, but as you can see that is actually performing worse than searching a standard list linearly due my implementation of a hashCode() that has no differentiating capability. Yikes!

Let me fix this. There are a few ways that I know of to approach implementing a reasonably functioning hashCode() method and I will explain them below.

i) comupte the hash of the first deterministic class field used in the implementation of equals(Object) and assign that to a variable I'll call result.
ii) for each remaining deterministic field used the equals(Object) implementation multiply result by 31 and add the hash value of the deterministic field.

Pretty shocking right!? The HashMap itself is built in nearly half the time, plus the time required to find the me object is on an entirely different level of magnitude.

B. Using Objects.hash(...)

If you are looking for a simpler way to implement a custom hash value and are not extremely averse to not having the most performant implementation then its a good idea to reach for the Objects.hash(...) utility and pass it the deterministic fields of your object. This is a generally well performing method, and if you are like me and favor being able to quickly ship code rather than prematurely optimizing for performance, this is a great route to solving this problem.

As you can see it is essentially identical to the hand rolled implementation.

C. Autogeneration with IDE

My preferred method to implementing both the equals(Object) and hashCode() methods are actually to use the autogeneration functionality in my Java IDE of choice Eclipse. The implementation that Eclipse provides is shown below.

Conclusion

In this article I have, to the best of my ability, explained the importance of co-implementating the hashCode() method along with equals(Object) in order to efficiently work with data structures that apply the notion of a hash table. In addition to explaining why it is important to implement the hashCode() method I also demonstrated how to implement a few reasonably performant and robust hashing algorithms.

As always, thanks for reading and don't be shy about commenting or critiquing below.