
Monday, February 21, 2011

Chapter 38: Overriding toString(), hashCode() and equals() Methods

We have looked at a lot of features of the Java programming language in the past chapters. In the next chapter we are going to look at Collections, which are widely used and extremely useful in any Java application. But before we can dive into the Collections Framework, we need to understand a few features of the Object class in Java. In this chapter we will look at the toString(), hashCode() and equals() methods, which form the foundation you need to understand the subsequent chapters.

So, let’s get started!!!

The toString() Method

Override toString() when you want to be able to read something meaningful about the objects of your class. Code can call toString() on your object when it wants to read useful details about it. When you pass an object reference to the System.out.println() method, println() converts the object to a String with String.valueOf(), which calls the object’s toString() method, and the result is printed to the console.
Ex:
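A minimal class that would produce output like the one below is sketched here. The body is an assumption; the only thing that matters is that TestToStringMethod does not override toString(), so it inherits Object’s version.

```java
// TestToStringMethod does NOT override toString(), so Object's version is used.
class TestToStringMethod {
    public static void main(String[] args) {
        TestToStringMethod t = new TestToStringMethod();
        // println(Object) calls String.valueOf(t), which calls t.toString()
        System.out.println(t);
    }
}
```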

Running the TestToStringMethod class gives us the following awesome output:

% java TestToStringMethod
TestToStringMethod@a47e0

The preceding output is what you get when you don’t override the toString() method of class Object. It gives you the class name followed by the @ symbol, followed by the unsigned hexadecimal representation of the object’s hashCode.
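That format can be rebuilt by hand. In this sketch, the hypothetical helper describe() uses the expression the Object Javadoc gives for toString(), getClass().getName() + "@" + Integer.toHexString(hashCode()), and compares it with the inherited toString().

```java
class DefaultToStringFormat {
    // Recreates Object.toString(): class name, '@', hash code as unsigned hex
    static String describe(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    public static void main(String[] args) {
        Object plain = new DefaultToStringFormat(); // inherits toString() and hashCode()
        System.out.println(plain.toString());
        System.out.println(describe(plain));
        System.out.println(plain.toString().equals(describe(plain))); // true
    }
}
```

The hex digits differ from run to run, just as the a47e0 above would.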

Now you can probably see why overriding the toString() method is a good idea. Let’s look at another example:
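A sketch of classes that would print the line shown next. The field names nickName and age are assumptions chosen to match that output.

```java
// ChildClass overrides toString() to describe itself in plain English.
class ChildClass {
    private final String nickName;
    private final int age;

    ChildClass(String nickName, int age) {
        this.nickName = nickName;
        this.age = age;
    }

    @Override
    public String toString() {
        return "I am a ChildClass, but you can call me " + nickName + ". My age is " + age;
    }
}

class TestToStringAgain {
    public static void main(String[] args) {
        ChildClass child = new ChildClass("GoChildClassGo", 19);
        System.out.println(child); // uses the overridden toString()
    }
}
```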

This ought to be a bit more readable:
% java TestToStringAgain
I am a ChildClass, but you can call me GoChildClassGo. My age is 19

The toString() method is extremely useful, and you have probably guessed why you need it in your code: it displays the details of your object, which helps others (and probably you, when you come back to your program later) understand what the object holds. That is, unless you are OK with the ClassName@HashCode output we saw a few lines ago.

Overriding equals() Method

We used the equals() method in the earlier chapter on Wrappers. We saw that comparing two object references with the == operator evaluates to true only when both references refer to the same object (because == simply looks at the bits in the variables, and they’re either identical or they’re not). You saw that the String class and the wrapper classes override the equals() method, so that you can compare two different objects (of the same type) to see if their contents are meaningfully equivalent. If two different Integer instances both hold the int value 5, as far as you’re concerned they are equal. The fact that the value 5 lives in two separate objects doesn’t matter. (Even though both objects hold the value 5, the == operator evaluates to false, because the two references point to different objects.)
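Here is a quick sketch of the difference (the class name EqualsVersusIdentity is made up). It uses new String(...) so that the two references are guaranteed to point to different objects.

```java
class EqualsVersusIdentity {
    public static void main(String[] args) {
        // new String(...) always creates a new, distinct object
        String a = new String("five");
        String b = new String("five");
        System.out.println(a == b);      // false: two different objects
        System.out.println(a.equals(b)); // true: same characters

        // Integer.valueOf caches small values such as 5, so x == y would
        // actually be true here; equals() compares the wrapped int values.
        Integer x = Integer.valueOf(5);
        Integer y = Integer.valueOf(5);
        System.out.println(x.equals(y)); // true
    }
}
```

The Integer caching detail is a good reason never to rely on == for wrapper comparisons: whether it returns true depends on how the objects were created.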

When you really need to know if two references are identical, use ==. But when you need to know if the objects themselves are equal (that is, their contents), use the equals() method. For each class you write, you must decide if it makes sense to consider two different instances equal. For some classes, you might decide that two objects can never be equal. For example, imagine a class Car that has instance variables for things like make, model, year and configuration. You certainly don’t want your car to be treated as the very same car as someone else’s car that happens to have identical attributes. Your car is your car, and you don’t want your neighbor driving off in it just because “hey, it’s really the same car; the equals() method said so.” So no two cars should ever be considered exactly equal. If two references refer to one car, then you know that both are talking about one car, not two cars that have the same attributes. So in the case of a Car you might never need, or want, to override the equals() method.

What it Means if You Don’t Override equals()

There’s a limitation here: if you don’t override a class’s equals() method, you won’t be able to use those objects as keys in a hashtable in any useful way (only the exact same instance will find the entry), and you probably won’t get accurate Sets, that is, Sets with no conceptual duplicates.

The equals() method in class Object uses only the == operator for comparisons, so unless you override equals(), two objects are considered equal only if the two references refer to the same object.
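A sketch of the default behavior, using a hypothetical Point class that inherits equals() and hashCode() from Object:

```java
import java.util.HashSet;
import java.util.Set;

// Point does NOT override equals() or hashCode().
class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

class DefaultEqualsDemo {
    public static void main(String[] args) {
        Point p1 = new Point(1, 2);
        Point p2 = new Point(1, 2);
        System.out.println(p1.equals(p2)); // false: Object.equals() is just ==
        System.out.println(p1.equals(p1)); // true: same reference

        Set<Point> points = new HashSet<>();
        points.add(p1);
        points.add(p2);
        System.out.println(points.size()); // 2: a "conceptual duplicate" got in
    }
}
```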

Let’s look at what it means to not be able to use an object as a hashtable key. Imagine you have a car, a very specific car (say, John’s red Ferrari F50 as opposed to Mary’s purple Mini) that you want to put in a HashMap, so that you can search on a particular car and retrieve the corresponding Person object that represents the owner. So you add the car instance as the key to the HashMap (along with a corresponding Person object as the value). But now what happens when you want to do a search? You want to say to the HashMap collection, “Here’s the car, now give me the Person object that goes with this car.” But now you’re in trouble unless you still have a reference to the exact object you used as the key when you added it to the Collection. In other words, you can’t make an identical Car object and use it for the search.
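The Car problem in code, as a sketch with hypothetical Car and Person classes. Car inherits equals() and hashCode() from Object, and the VIN value is made up.

```java
import java.util.HashMap;
import java.util.Map;

// Car does NOT override equals() or hashCode().
class Car {
    final String vin;
    final String model;
    Car(String vin, String model) { this.vin = vin; this.model = model; }
}

class Person {
    final String name;
    Person(String name) { this.name = name; }
}

class CarLookupWithoutEquals {
    public static void main(String[] args) {
        Map<Car, Person> owners = new HashMap<>();
        Car johnsCar = new Car("VIN-0001", "Ferrari F50");
        owners.put(johnsCar, new Person("John"));

        // Works: the exact reference used as the key
        System.out.println(owners.get(johnsCar).name); // John

        // Identical attributes, different object: not found
        Car lookalike = new Car("VIN-0001", "Ferrari F50");
        System.out.println(owners.get(lookalike)); // null
    }
}
```

Because Object’s equals() only matches the very same reference, the second lookup returns null no matter what the hash codes happen to be.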

The bottom line is this: if you want objects of your class to be used as keys for a hashtable (or as elements in any data structure that uses equivalency for searching for and/or retrieving an object), then you must override equals() so that two different instances can be considered the same (and, as we’ll see later in this chapter, hashCode() as well). So how would we fix the car? You might override the equals() method so that it uses the unique VIN (Vehicle Identification Number) as the basis of comparison. That way, you can use one instance when you add it to a Collection, and essentially re-create an identical instance when you want to do a search based on that object as the key. Of course, overriding the equals() method for Car also allows more than one object representing a single unique car to exist, which might not be safe in your design. Fortunately, the String and wrapper classes work well as keys in hashtables because they override equals() (and hashCode()). So rather than using the actual car instance as the key into the car/owner pair, you could simply use a String that represents the unique identifier for the car. That way, you’ll never have more than one instance representing a specific car, but you can still use the car (or rather, one of the car’s attributes) as the search key.
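A sketch of both fixes, assuming a hypothetical Car whose VIN is its identity. It overrides hashCode() alongside equals(), for reasons the hashCode() section of this chapter explains, and it stores the owner as a plain String to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Two Car objects with the same VIN are treated as the same car.
class Car {
    private final String vin;
    private final String model; // deliberately ignored by equals()

    Car(String vin, String model) {
        this.vin = Objects.requireNonNull(vin);
        this.model = model;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Car)) return false;
        return vin.equals(((Car) o).vin);
    }

    @Override
    public int hashCode() {
        return vin.hashCode(); // same field as equals()
    }
}

class CarLookupWithEquals {
    public static void main(String[] args) {
        Map<Car, String> owners = new HashMap<>();
        owners.put(new Car("VIN-0001", "Ferrari F50"), "John");

        // A freshly created Car with the same VIN now finds the entry
        System.out.println(owners.get(new Car("VIN-0001", "Ferrari F50"))); // John

        // The alternative: key on the VIN String itself
        Map<String, String> ownersByVin = new HashMap<>();
        ownersByVin.put("VIN-0001", "John");
        System.out.println(ownersByVin.get("VIN-0001")); // John
    }
}
```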

Let’s look at an example in detail. In the main() method of a class named TestEqualsMethod, we create two TestClass instances, passing the same value 8 to the TestClass constructor. The constructor assigns that value to the TestClassValue instance variable. Now imagine that you’ve decided two TestClass objects are the same if their TestClassValue is identical. So you override the equals() method and compare the two TestClassValues. It is that simple. But let’s break down what’s happening in the equals() method:
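A sketch of TestEqualsMethod and TestClass consistent with this description; the line 1 and line 2 comments mark the lines discussed next. The field is spelled testClassValue here, following Java naming conventions, and hashCode() is included because of the equals()/hashCode() contract covered later in this chapter.

```java
class TestEqualsMethod {
    public static void main(String[] args) {
        TestClass one = new TestClass(8);
        TestClass two = new TestClass(8);
        if (one.equals(two)) {
            System.out.println("one and two are equal");
        }
    }
}

class TestClass {
    private int testClassValue;

    TestClass(int val) {
        testClassValue = val;
    }

    public int getTestClassValue() {
        return testClassValue;
    }

    @Override
    public boolean equals(Object o) {                                       // line 1
        if ((o instanceof TestClass) && (((TestClass) o).getTestClassValue() == this.testClassValue)) { // line 2
            return true;
        } else {
            return false;
        }
    }

    // Uses the same field as equals(), as the hashCode() contract requires
    @Override
    public int hashCode() {
        return testClassValue;
    }
}
```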

First of all, you must observe all the rules of overriding, and in line 1 (the declaration public boolean equals(Object o)) we are indeed declaring a valid override of the equals() method we inherited from Object.
Line 2 (the if test that combines an instanceof check with the value comparison) is where all the action is. Logically, we have to do two things in order to make a valid equality comparison.

First, be sure that the object being tested is of the correct type! It comes in polymorphically as type Object, so you need to do an instanceof test on it. Having two objects of different class types be considered equal is usually not a good idea, but that’s a design issue we won’t go into here. Besides, you’d still have to do the instanceof test just to be sure that you can cast the object argument to the correct type so that you can access its methods or variables in order to actually do the comparison. Remember, if you cast an object that doesn’t pass the instanceof test, you’ll get a runtime ClassCastException. For example, an equals() method that skips the instanceof test and simply checks ((TestClass)o).getTestClassValue() == this.TestClassValue compiles, but it’s bad:
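A sketch of that mistake, using a deliberately broken TestClass (the demo class BadCastDemo is hypothetical):

```java
// Deliberately broken: equals() casts without an instanceof test.
class TestClass {
    private final int testClassValue;

    TestClass(int val) { testClassValue = val; }

    int getTestClassValue() { return testClassValue; }

    @Override
    public boolean equals(Object o) {
        // Compiles, but throws ClassCastException when o is not a TestClass
        return ((TestClass) o).getTestClassValue() == this.testClassValue;
    }

    @Override
    public int hashCode() { return testClassValue; }
}

class BadCastDemo {
    public static void main(String[] args) {
        TestClass t = new TestClass(8);
        System.out.println(t.equals(new TestClass(8))); // true
        try {
            t.equals("eight"); // a String IS-NOT-A TestClass
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```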

The (TestClass)o cast will fail if o doesn’t refer to something that IS-A TestClass.
Second, compare the attributes we care about (in this case, just TestClassValue). Only the developer can decide what makes two instances equal.

In case you were a little surprised by the whole ((TestClass)o).getTestClassValue() syntax, we’re simply casting the object reference, o, just-in-time as we try to call a method that’s in the TestClass class but not in Object. Remember, without the cast, you can’t compile because the compiler would see the object referenced by o as simply, well, an Object. And since the Object class doesn’t have a getTestClassValue() method, the compiler would squawk (technical term). But as we said earlier, even with the cast, the code fails at runtime if the object referenced by o isn’t something that’s castable to a TestClass. So don’t ever forget to use the instanceof test first. Here’s another reason to appreciate the short-circuit && operator: if the instanceof test fails, we’ll never get to the code that does the cast, so a test of the form (o instanceof TestClass) && (((TestClass)o).getTestClassValue() == this.TestClassValue) is always safe at runtime.
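To see what the short-circuit buys you, this sketch (hypothetical class ShortCircuitDemo) puts the same test behind && and behind the non-short-circuit & operator. With &, both operands are always evaluated, so the cast runs even when the instanceof test is false.

```java
class ShortCircuitDemo {
    private final int value;

    ShortCircuitDemo(int value) { this.value = value; }

    // Safe: if instanceof is false, the cast on the right is never evaluated
    boolean safeEquals(Object o) {
        return (o instanceof ShortCircuitDemo) && (((ShortCircuitDemo) o).value == value);
    }

    // Unsafe: & evaluates both sides, so the cast runs even for a String
    boolean unsafeEquals(Object o) {
        return (o instanceof ShortCircuitDemo) & (((ShortCircuitDemo) o).value == value);
    }

    public static void main(String[] args) {
        ShortCircuitDemo d = new ShortCircuitDemo(8);
        System.out.println(d.safeEquals("eight")); // false, no exception
        try {
            d.unsafeEquals("eight");
        } catch (ClassCastException e) {
            System.out.println("unsafeEquals threw ClassCastException");
        }
    }
}
```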

If you look at the Object class in the Java API spec, you’ll find what we call a contract specified in the equals() method. In Java, a contract is a set of rules that should be followed, or rather must be followed, if you want to provide a “correct” implementation that behaves the way others expect. Or to put it another way, if you don’t follow the contract, your code may still compile and run, but your code (or someone else’s) may break at runtime in some unexpected way.

Exam Tip: Remember that the equals(), hashCode(), and toString() methods are all public. The following would not be a valid override of the equals() method, although it might appear to be if you don’t look closely enough during the exam:
class Class1 { boolean equals(Object o) { return false; } }

And watch out for the argument types as well. The following method is an overload, but not an override of the equals() method:
class Class2 { public boolean equals(Class2 b) { return false; } }

Be sure you’re very comfortable with the rules of overriding so that you can identify whether a method from Object is being overridden, overloaded, or illegally redeclared in a class. The equals() method in class Class2 changes the argument from Object to Class2, so it becomes an overloaded method and won’t be called unless it’s from your own code that knows about this new, different method that happens to also be named equals() (that is, code that passes an argument whose declared type is Class2).
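Here’s a sketch of the Class2 trap in action; the id field and the OverloadPitfall demo class are made up for illustration. Which equals() runs depends on the compile-time type of the argument, and HashSet always calls equals(Object).

```java
import java.util.HashSet;
import java.util.Set;

// Class2 OVERLOADS equals() instead of overriding it.
class Class2 {
    private final int id;

    Class2(int id) { this.id = id; }

    // Overload: the parameter type is Class2, not Object
    public boolean equals(Class2 other) {
        return other != null && other.id == id;
    }

    // Equal ids share a hash code, so only the equals() mix-up matters below
    @Override
    public int hashCode() { return id; }
}

class OverloadPitfall {
    public static void main(String[] args) {
        Class2 a = new Class2(1);
        Class2 b = new Class2(1);
        Object bAsObject = b;

        System.out.println(a.equals(b));         // true: picks equals(Class2)
        System.out.println(a.equals(bAsObject)); // false: picks Object.equals(Object)

        Set<Class2> set = new HashSet<>();
        set.add(a);
        set.add(b);
        System.out.println(set.size()); // 2: HashSet never calls the overload
    }
}
```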

The equals() Contract

Pulled straight from the Java docs, the equals() contract says:
• It is reflexive. For any non-null reference value x, x.equals(x) should return true.
• It is symmetric. For any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
• It is transitive. For any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
• It is consistent. For any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
• For any non-null reference value x, x.equals(null) should return false.
We haven’t looked at the hashCode() method yet, but equals() and hashCode() are bound together by a joint contract that specifies that if two objects are considered equal using the equals() method, then they must have identical hashCode values. So to be truly safe, your rule of thumb should be: if you override equals(), override hashCode() as well.
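A sketch that turns these rules into checks on sample objects. Money and checkContract() are hypothetical; passing on a handful of samples does not prove an equals() implementation correct, it only catches obvious violations.

```java
import java.util.Objects;

// Hypothetical value class: equal when amount and currency both match.
final class Money {
    private final long cents;
    private final String currency;

    Money(long cents, String currency) {
        this.cents = cents;
        this.currency = Objects.requireNonNull(currency);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Money)) return false;
        Money m = (Money) o;
        return cents == m.cents && currency.equals(m.currency);
    }

    @Override
    public int hashCode() {
        return Objects.hash(cents, currency); // same fields as equals()
    }
}

class ContractCheck {
    // Checks the equals() rules, plus the equals/hashCode link, on three samples
    static boolean checkContract(Object x, Object y, Object z) {
        boolean reflexive = x.equals(x);
        boolean symmetric = x.equals(y) == y.equals(x);
        boolean transitive = !(x.equals(y) && y.equals(z)) || x.equals(z);
        boolean consistent = x.equals(y) == x.equals(y); // weak: same answer twice
        boolean nullSafe = !x.equals(null);
        boolean hashLinked = !x.equals(y) || x.hashCode() == y.hashCode();
        return reflexive && symmetric && transitive && consistent && nullSafe && hashLinked;
    }

    public static void main(String[] args) {
        Money a = new Money(500, "USD");
        Money b = new Money(500, "USD");
        Money c = new Money(500, "USD");
        System.out.println(checkContract(a, b, c));                     // true
        System.out.println(checkContract(a, new Money(700, "USD"), c)); // true
    }
}
```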

Overriding hashCode() Method

Hashcodes are typically used to increase the performance of large collections of data. The hashCode value of an object is used by some collection classes (don’t worry about Collections, they’re covered in the very next chapter). Although you can think of it as a kind of object ID number, it isn’t necessarily unique. Collections such as HashMap and HashSet use the hashCode value of an object to determine how the object should be stored in the collection, and the hashCode is used again to help locate the object in the collection. For the exam you do not need to understand the deep details of how the collection classes that use hashing are implemented, but you do need to know which collections use them. You must also be able to recognize an appropriate or correct implementation of hashCode(). This does not mean legal and does not even mean efficient. It’s perfectly legal to have a terribly inefficient hashCode method in your class, as long as it doesn’t violate the contract specified in the Object class documentation. So for the exam, if you’re asked to pick out an appropriate or correct use of hashCode, don’t mistake appropriate for legal or efficient.

Understanding Hashcodes

In order to understand what’s appropriate and correct, we have to look at how some of the collections use hashcodes.

Imagine a set of buckets lined up on the floor. Someone hands you a piece of paper with a name on it. You take the name and calculate an integer code from it by using A is 1, B is 2, and so on, and adding the numeric values of all the letters in the name together. A given name will always result in the same code.

Let’s look at an example:

Key   | Hash Code Algorithm                  | Hash Code
Rocky | R(18) + o(15) + c(3) + k(11) + y(25) | 72
Anand | A(1) + n(14) + a(1) + n(14) + d(4)   | 34

This simple algorithm sums the alphabet positions of the letters in the name, giving 72 for Rocky and 34 for Anand. Each code tells you which bucket the paper goes into: the paper with Anand’s name goes into bucket 34, and the paper with Rocky’s name goes into bucket 72.
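The letter-sum algorithm from the table, as a sketch (the method name letterSum is hypothetical; characters that are not letters are ignored):

```java
class LetterSumHash {
    // Sums alphabet positions: A/a = 1, B/b = 2, ..., Z/z = 26
    static int letterSum(String name) {
        int sum = 0;
        for (char ch : name.toCharArray()) {
            char c = Character.toUpperCase(ch);
            if (c >= 'A' && c <= 'Z') {
                sum += c - 'A' + 1;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(letterSum("Rocky")); // 72
        System.out.println(letterSum("Anand")); // 34
        System.out.println(letterSum("Amy") + " " + letterSum("May")); // 39 39
    }
}
```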

Now imagine that someone comes up and shows you a name and says, “Please retrieve the piece of paper that matches this name.” So you look at the name they show you, and run the same hashCode-generating algorithm. The hashCode tells you in which bucket you should look to find the name.
You might have noticed a little flaw in our system, though. Two different names might result in the same value. For example, the names Amy and May have the same letters, so the hashCode will be identical for both names. That’s acceptable, but it does mean that when someone asks you (the bucket-clerk) for the Amy piece of paper, you’ll still have to search through the target bucket, reading each name until you find Amy rather than May. The hashCode tells you only which bucket to go into, not how to locate the name once you’re in that bucket.
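A sketch of the bucket-clerk, using the letter-sum code to pick one of 100 buckets. BucketClerk and its methods are hypothetical; the point is that the code only narrows the search to a bucket, and equals() finds the actual name.

```java
import java.util.ArrayList;
import java.util.List;

class BucketClerk {
    private final List<List<String>> buckets = new ArrayList<>();

    BucketClerk(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    static int letterSum(String name) {
        int sum = 0;
        for (char ch : name.toCharArray()) {
            char c = Character.toUpperCase(ch);
            if (c >= 'A' && c <= 'Z') sum += c - 'A' + 1;
        }
        return sum;
    }

    private List<String> bucketFor(String name) {
        return buckets.get(letterSum(name) % buckets.size());
    }

    void put(String name) {
        bucketFor(name).add(name);
    }

    // The code picks the bucket; equals() picks the paper inside it
    boolean contains(String name) {
        for (String candidate : bucketFor(name)) {
            if (candidate.equals(name)) return true;
        }
        return false;
    }

    int bucketSize(String name) {
        return bucketFor(name).size();
    }

    public static void main(String[] args) {
        BucketClerk clerk = new BucketClerk(100);
        clerk.put("Amy");
        clerk.put("May");
        System.out.println(clerk.bucketSize("Amy")); // 2: Amy and May share bucket 39
        System.out.println(clerk.contains("Amy"));   // true, found by equals()
        System.out.println(clerk.contains("Yam"));   // false: right bucket, no equal name
    }
}
```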

So for efficiency, your goal is to have the papers distributed as evenly as possible across all buckets. Ideally, you might have just one name per bucket so that when someone asked for a paper you could simply calculate the hashCode and just grab the one paper from the correct bucket (without having to go flipping through different papers in that bucket until you locate the exact one you’re looking for). The least efficient (but still functional) hashCode generator would return the same hashCode (say, 42) regardless of the name, so that all the papers landed in the same bucket while the others stood empty. The bucket-clerk would have to keep going to that one bucket and flipping painfully through each one of the names in the bucket until the right one was found. And if that’s how it works, they might as well not use the hashcodes at all but just go to the one big bucket and start from one end and look through each paper until they find the one they want.
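This sketch (hypothetical class BucketSpread) counts how many buckets a few names actually occupy under the letter-sum code versus the always-42 code from the paragraph above. Both are legal; only the spread differs.

```java
import java.util.function.ToIntFunction;

class BucketSpread {
    // Same letter-sum code as in the table above
    static int letterSum(String name) {
        int sum = 0;
        for (char ch : name.toCharArray()) {
            char c = Character.toUpperCase(ch);
            if (c >= 'A' && c <= 'Z') sum += c - 'A' + 1;
        }
        return sum;
    }

    // Counts how many buckets end up holding at least one name
    static int usedBuckets(String[] names, int bucketCount, ToIntFunction<String> hash) {
        boolean[] used = new boolean[bucketCount];
        int count = 0;
        for (String name : names) {
            int index = Math.floorMod(hash.applyAsInt(name), bucketCount);
            if (!used[index]) {
                used[index] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] names = {"Rocky", "Anand", "Amy", "May", "John", "Mary"};
        System.out.println(usedBuckets(names, 100, BucketSpread::letterSum)); // 5 (Amy and May share one)
        System.out.println(usedBuckets(names, 100, name -> 42));             // 1
    }
}
```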

This distributed-across-the-buckets example is similar to the way hashcodes are used in collections. When you put an object in a collection that uses hashcodes, the collection uses the hashCode of the object to decide in which bucket the object should land. Then when you want to fetch that object, you give the collection a reference to an object, which the collection compares to the objects it holds. As long as the object you’re using for the search (the name you show to the person working the buckets) has the same hashCode as the object stored in the collection, and equals() says the two match, the object will be found. But (and this is a Big But) imagine what would happen if, going back to our name example, you showed the bucket-worker a name and they calculated the code based on only half the letters in the name instead of all of them. They’d never find the name in the bucket because they wouldn’t be looking in the correct bucket!
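The half-the-letters mistake is easy to reproduce. In this sketch (class and method names are hypothetical), names are stored using the full letter-sum code, but the faulty lookup computes the code from only the first half of the name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

class WrongBucketDemo {
    static final int BUCKETS = 100;

    static int letterSum(String s) {
        int sum = 0;
        for (char ch : s.toCharArray()) {
            char c = Character.toUpperCase(ch);
            if (c >= 'A' && c <= 'Z') sum += c - 'A' + 1;
        }
        return sum;
    }

    // Deliberately wrong: only uses the first half of the letters
    static int halfLetterSum(String s) {
        return letterSum(s.substring(0, s.length() / 2));
    }

    // Stores every name with storeHash, then searches for target with lookupHash
    static boolean storeThenFind(List<String> names, String target,
                                 ToIntFunction<String> storeHash,
                                 ToIntFunction<String> lookupHash) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < BUCKETS; i++) buckets.add(new ArrayList<>());
        for (String name : names) {
            buckets.get(Math.floorMod(storeHash.applyAsInt(name), BUCKETS)).add(name);
        }
        // contains() compares with equals(), but only inside the chosen bucket
        return buckets.get(Math.floorMod(lookupHash.applyAsInt(target), BUCKETS)).contains(target);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Rocky", "Anand");
        System.out.println(storeThenFind(names, "Rocky",
                WrongBucketDemo::letterSum, WrongBucketDemo::letterSum));     // true: bucket 72
        System.out.println(storeThenFind(names, "Rocky",
                WrongBucketDemo::letterSum, WrongBucketDemo::halfLetterSum)); // false: looks in bucket 33
    }
}
```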

Now can you see why if two objects are considered equal, their hashcodes must also be equal? Otherwise, you’d never be able to find the object since the default hashCode method in class Object virtually always comes up with a unique number for each object, even if the equals() method is overridden in such a way that two or more objects are considered equal. It doesn’t matter how equal the objects are if their hashcodes don’t reflect that. So one more time: If two objects are equal, their hashcodes must be equal as well.
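A sketch contrasting a class that overrides only equals() with one that overrides both (both classes are hypothetical). The first lookup depends on Object’s identity-based hash codes, so it should not be expected to find the equal object.

```java
import java.util.HashSet;
import java.util.Set;

// Overrides equals() but NOT hashCode(): breaks the contract.
class EqualsOnly {
    final int value;
    EqualsOnly(int value) { this.value = value; }

    @Override
    public boolean equals(Object o) {
        return (o instanceof EqualsOnly) && ((EqualsOnly) o).value == value;
    }
}

// Overrides both, using the same field.
class EqualsAndHash {
    final int value;
    EqualsAndHash(int value) { this.value = value; }

    @Override
    public boolean equals(Object o) {
        return (o instanceof EqualsAndHash) && ((EqualsAndHash) o).value == value;
    }

    @Override
    public int hashCode() { return value; }
}

class MissingHashCodeDemo {
    public static void main(String[] args) {
        Set<EqualsOnly> broken = new HashSet<>();
        broken.add(new EqualsOnly(8));
        // Equal object, but its identity hash code almost surely differs
        System.out.println(broken.contains(new EqualsOnly(8)));  // usually false

        Set<EqualsAndHash> fixed = new HashSet<>();
        fixed.add(new EqualsAndHash(8));
        System.out.println(fixed.contains(new EqualsAndHash(8))); // true
    }
}
```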

Implementing hashCode()

What does a real hashCode algorithm look like? People get their PhDs on hashing algorithms, so from a computer science viewpoint, it’s beyond the scope of the exam. The part we care about here is whether you follow the contract. And to follow the contract, think about what you do in the equals() method: you compare attributes. Because that comparison almost always involves instance variable values (remember when we looked at two TestClass objects and considered them equal if their int TestClassValues were the same?), your hashCode() implementation should use the same instance variables. Here’s an example:
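A sketch of such a class; the name HasHash and the multiplier 17 are illustrative choices, not requirements. Both methods depend only on x.

```java
class HasHash {
    final int x;

    HasHash(int xVal) { x = xVal; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof HasHash)) return false;
        HasHash h = (HasHash) o;
        return h.x == this.x;
    }

    @Override
    public int hashCode() {
        return x * 17; // uses the same field as equals()
    }

    public static void main(String[] args) {
        HasHash a = new HasHash(8);
        HasHash b = new HasHash(8);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true (136)
    }
}
```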

This equals() method says two objects are equal if they have the same x value, so objects with the same x value will have to return identical hashcodes.

Exam Tip: A hashCode() that returns the same value for all instances whether they’re equal or not is still a legal—even appropriate— hashCode() method! For example,

public int hashCode() { return 1492; }

This does not violate the contract. Two objects with an x value of 8 will have the same hashcode. But then again, so will two unequal objects, one with an x value of 12 and the other a value of -920. This hashCode() method is horribly inefficient, remember, because it makes all objects land in the same bucket, but even so, the object can still be found as the collection cranks through the one and only bucket—using equals() —trying desperately to finally, painstakingly, locate the correct object. In other words, the hashcode was really no help at all in speeding up the search, even though improving search speed is hashcode’s intended purpose! Nonetheless, this one-hash-fits-all method would be considered appropriate and even correct because it doesn’t violate the contract. Once more, correct does not necessarily mean good.
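A sketch of the one-hash-fits-all approach, using the hypothetical class OneBucket. Every instance lands in the same bucket, yet lookups still work because equals() does the final comparison; the second lookup shows that sharing a hash code does not make two objects equal.

```java
import java.util.HashSet;
import java.util.Set;

// Legal but slow: every instance reports the same hash code.
class OneBucket {
    private final int x;

    OneBucket(int x) { this.x = x; }

    @Override
    public boolean equals(Object o) {
        return (o instanceof OneBucket) && ((OneBucket) o).x == x;
    }

    @Override
    public int hashCode() { return 1492; }

    public static void main(String[] args) {
        Set<OneBucket> set = new HashSet<>();
        set.add(new OneBucket(8));
        set.add(new OneBucket(12));
        set.add(new OneBucket(-920));
        System.out.println(set.contains(new OneBucket(8)));  // true: found via equals()
        System.out.println(set.contains(new OneBucket(99))); // false: same hash, not equal
        System.out.println(set.size());                      // 3
    }
}
```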

Typically, you’ll see hashCode() methods that do some combination of ^-ing (XOR-ing) a class’s instance variables (in other words, processing their bits), along with perhaps multiplying them by a prime number. In any case, while the goal is to get a wide and random distribution of objects across buckets, the contract (and whether or not an object can be found) requires only that two equal objects have equal hashcodes. The exam does not expect you to rate the efficiency of a hashCode() method, but you must be able to recognize which ones will and will not work (work meaning “will cause the object to be found in the collection”).
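A sketch of the two styles just described, using a hypothetical Employee class: hashCode() folds each field in with the prime 31, and xorHash() shows a plain XOR alternative for comparison. The starting value 17 and the multiplier 31 are conventional choices, not requirements.

```java
import java.util.Objects;

final class Employee {
    private final int id;
    private final String name; // may be null
    private final int level;

    Employee(int id, String name, int level) {
        this.id = id;
        this.name = name;
        this.level = level;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Employee)) return false;
        Employee e = (Employee) o;
        return id == e.id && level == e.level && Objects.equals(name, e.name);
    }

    // Multiply-by-prime style: fold in every field that equals() uses
    @Override
    public int hashCode() {
        int result = 17;
        result = 31 * result + id;
        result = 31 * result + Objects.hashCode(name); // 0 for null
        result = 31 * result + level;
        return result;
    }

    // XOR style, shown only for comparison (not used by hashCode())
    int xorHash() {
        return id ^ Objects.hashCode(name) ^ level;
    }
}
```

Objects.hash(id, name, level) gives a similar 31-based result with less typing. One weakness of plain XOR is that it ignores field order: (id 1, level 2) and (id 2, level 1) get the same xorHash() even though they are not equal. That is still legal; it just puts both in the same bucket.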

Now that we know that two equal objects must have identical hashcodes, is the reverse true? Do two objects with identical hashcodes have to be considered equal? Think about it—you might have lots of objects land in the same bucket because their hashcodes are identical, but unless they also pass the equals() test, they won’t come up as a match in a search through the collection. This is exactly what you’d get with our very inefficient everybody-gets-the-same-hashCode method. It’s legal and correct, just slow.

So in order for an object to be located, the search object and the object in the collection must have both identical hashCode values and return true for the equals() method. So there’s just no way out of overriding both methods to be absolutely certain that your objects can be used in Collections that use hashing.

The hashCode() Contract

From the Java API documentation for class Object, the hashCode() contract is as follows:
• Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode() method must consistently return the same integer, provided no information used in equals() comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
• If two objects are equal according to the equals(Object) method, then calling the hashCode() method on each of the two objects must produce the same integer result.
• It is NOT required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode() method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

And what this means is...

Condition                    | Required                     | Not Required (But Allowed)
x.equals(y) == true          | x.hashCode() == y.hashCode() | -
x.hashCode() == y.hashCode() | -                            | x.equals(y) == true
x.equals(y) == false         | No hashCode requirements     | -
x.hashCode() != y.hashCode() | x.equals(y) == false         | -

Let’s look at what else might cause a hashCode() method to fail. What happens if you include a transient variable in your hashCode() method? While that’s legal (compiler won’t complain), under some circumstances an object you put in a collection won’t be found. As you know, serialization saves an object so that it can be reanimated later by deserializing it back to full objectness. But remember that transient variables are not saved when an object is serialized. A bad scenario might look like this:
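A sketch of such a class, BadHashExample, together with the steps listed next. The second field y, the XOR in hashCode(), and the in-memory byte array standing in for a file are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Legal but incorrect: a transient field feeds hashCode() and equals().
class BadHashExample implements Serializable {
    private static final long serialVersionUID = 1L;

    transient int x; // not saved by serialization
    int y;

    BadHashExample(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public int hashCode() {
        return x ^ y; // x will be 0 after deserialization
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof BadHashExample)) return false;
        BadHashExample other = (BadHashExample) o;
        return other.x == x && other.y == y;
    }
}

class TransientHashDemo {
    // Serializes to memory and reads the object back (stands in for a file)
    static BadHashExample roundTrip(BadHashExample obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (BadHashExample) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        BadHashExample key = new BadHashExample(9, 5);
        Map<BadHashExample, String> map = new HashMap<>();
        map.put(key, "found it");

        BadHashExample restored = roundTrip(key);
        System.out.println(restored.x);        // 0: the transient value was not saved
        System.out.println(map.get(key));      // found it
        System.out.println(map.get(restored)); // null: different hash code, not equal
    }
}
```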

Here’s what could happen using code like the preceding example:
• Give an object some state (assign values to its instance variables).
• Put the object in a HashMap, using the object as a key.
• Save the object to a file using serialization without altering any of its state.
• Retrieve the object from the file through deserialization.
• Use the deserialized (brought back to life on the heap) object to get the object out of the HashMap.

Whoops. The object in the collection and the supposedly same object brought back to life are no longer identical. The object’s transient variable will come back with a default value rather than the value the variable had at the time it was saved (or put into the HashMap). So using the preceding BadHashExample code, if the value of x is 9 when the instance is put in the HashMap, then since x is used in the calculation of the hashCode, when the value of x changes, the hashCode changes too. And when that same instance of BadHashExample is brought back from deserialization, x == 0, regardless of the value of x at the time the object was serialized. So the new hashCode calculation will give a different hashCode, and the equals() method fails as well since x is used to determine object equality.

Remember: transient variables can really mess with your equals() and hashCode() implementations. Keep variables non-transient or, if they must be marked transient, don’t use them to determine hashcodes or equality.
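A sketch of that advice, assuming the same shape of class as before: x stays transient, but equals() and hashCode() use only y, so the deserialized object still finds its entry. The class name SaferHashExample is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// x stays transient, but equals() and hashCode() ignore it.
class SaferHashExample implements Serializable {
    private static final long serialVersionUID = 1L;

    transient int x; // not saved, so not used for equality
    int y;

    SaferHashExample(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public int hashCode() {
        return y; // only non-transient state
    }

    @Override
    public boolean equals(Object o) {
        return (o instanceof SaferHashExample) && ((SaferHashExample) o).y == y;
    }

    // Serializes to memory and reads the object back
    static SaferHashExample roundTrip(SaferHashExample obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SaferHashExample) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SaferHashExample key = new SaferHashExample(9, 5);
        Map<SaferHashExample, String> map = new HashMap<>();
        map.put(key, "found it");

        SaferHashExample restored = roundTrip(key);
        System.out.println(restored.x);        // 0: still lost, but harmless now
        System.out.println(map.get(restored)); // found it
    }
}
```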
