Clarity on overriding the equals method

For a few days I've been reading about the importance of overriding the equals method. I keep scratching my head about how overriding it actually determines or checks the values stored in the variable. I realize that you can check the values stored in the primitive datatypes with "==", and when you don't override the equals method it acts the same way, right? When used with a reference datatype, "==" or the default equals() method only compares, or sees, if the variable is pointing to the same instance of a class. For some reason, in the examples that I'm seeing in books or online, I don't understand what is taking place to actually check the values stored inside the variables.
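To make the starting point concrete, here is a minimal sketch of what happens when equals is not overridden. The Box class is made up for illustration and is not from any book example.

```java
class IdentityDemo {
    // Hypothetical class for illustration; it does NOT override equals().
    static class Box {
        final int value;

        Box(int value) {
            this.value = value;
        }
    }

    public static void main(String[] args) {
        int x = 5;
        int y = 5;
        System.out.println(x == y);       // true: primitives are compared by value

        Box a = new Box(5);
        Box b = new Box(5);
        Box c = a;
        System.out.println(a == b);       // false: two different instances
        System.out.println(a.equals(b));  // false: the inherited Object.equals is just ==
        System.out.println(a == c);       // true: both variables point to the same instance
        System.out.println(a.equals(c));  // true
    }
}
```

Nothing here ever looks at the value field, which is why two boxes holding 5 are still reported as different.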

Here is part of an example (I've added comments for things that are confusing me):

So we use Object here instead of the class type we're overriding this equals method for? Is this so that we can use it to check different types? (overloading?)

If you don't use Object as the parameter type, you're overloading, not overriding.
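A small sketch of why the parameter type matters. Both classes are hypothetical. Collections call equals(Object), so an overload that takes the class type is never reached through them.

```java
import java.util.ArrayList;
import java.util.List;

class OverloadDemo {
    // Hypothetical class that only OVERLOADS equals.
    static class Overloaded {
        final int x;

        Overloaded(int x) {
            this.x = x;
        }

        // Parameter type is Overloaded, so Object.equals(Object) is left untouched.
        public boolean equals(Overloaded other) {
            return other != null && x == other.x;
        }
    }

    // Hypothetical class that properly OVERRIDES equals.
    static class Overridden {
        final int x;

        Overridden(int x) {
            this.x = x;
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof Overridden && x == ((Overridden) obj).x;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(x);
        }
    }

    public static void main(String[] args) {
        List<Object> list = new ArrayList<>();
        list.add(new Overloaded(1));
        list.add(new Overridden(1));
        System.out.println(list.contains(new Overloaded(1))); // false: identity comparison is used
        System.out.println(list.contains(new Overridden(1))); // true: the override is used
    }
}
```

Putting @Override on the method is a cheap safeguard: the compiler rejects equals(Overloaded) if it carries that annotation.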

Ahh..yes that's right, sorry.

Knute Snortum wrote:First check whether the parameter is the same class, then cast the object to the class, then test each of its fields. I can be more specific if necessary.
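As a rough sketch of those three steps (the Point class and its fields are assumptions for illustration, not the code from the original example):

```java
class RecipeDemo {
    // Hypothetical value class with two int fields.
    static class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;                      // same instance, nothing more to check
            }
            if (obj == null || getClass() != obj.getClass()) {
                return false;                     // step 1: is it the same class?
            }
            Point other = (Point) obj;            // step 2: the cast is now safe
            return x == other.x && y == other.y;  // step 3: compare the stored values
        }

        @Override
        public int hashCode() {
            return 31 * x + y;                    // equal points give equal hash codes
        }
    }

    public static void main(String[] args) {
        Point p1 = new Point(1, 2);
        Point p2 = new Point(1, 2);
        System.out.println(p1 == p2);                     // false: different instances
        System.out.println(p1.equals(p2));                // true: same field values
        System.out.println(p1.equals(new Point(1, 3)));   // false: y differs
    }
}
```

The last return statement is the only line that looks at the values; everything before it just makes sure the comparison is safe to perform.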

So would it be something like...

---

Isn't "==" only used to compare and see if two things are referencing the same object (when not dealing with primitives)? Wouldn't this return false? It seems like I'm misunderstanding the usage of "==" in this method.

3: When you are testing whether you are comparing an object with itself in the equals method.

You can rewrite that method so that it uses return only once, as a single boolean expression. Comparing a field x with == will only work if x is a primitive (or an enum element); otherwise you have to use equals(). Note the use of || and &&, not | or &. Also note that the terms in that boolean conjunction must follow that order, otherwise there is a risk of Exceptions, and the equals method is not allowed to throw Exceptions. You need to find out what to do if any of the fields is a reference type and might be null. Many people wrote utility classes with methods to encapsulate the null tests, and there is a Java 7 method which does the same (use this or this instead for arrays).
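A sketch of the single-return form. The Person class is hypothetical, and I am assuming the Java 7 method meant above is java.util.Objects.equals, which treats two nulls as equal and never throws on a null argument.

```java
import java.util.Objects;

class SingleReturnDemo {
    // Hypothetical class with one primitive field and one nullable reference field.
    static class Person {
        private final String name; // may be null
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public boolean equals(Object obj) {
            // Order matters: the instanceof test must pass before the casts run.
            // Short-circuit || and && stop evaluating as soon as the answer is known.
            return this == obj
                    || obj instanceof Person
                    && age == ((Person) obj).age                    // primitive: == is fine
                    && Objects.equals(name, ((Person) obj).name);   // reference: null-safe equals
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, age);
        }
    }

    public static void main(String[] args) {
        Person a = new Person(null, 30);
        Person b = new Person(null, 30);
        System.out.println(a.equals(b));            // true, and no NullPointerException
        System.out.println(a.equals(null));         // false
        System.out.println(a.equals("not a person")); // false
    }
}
```

With | and & instead, every term would be evaluated, so the casts would run even when obj is a String and the method would throw a ClassCastException.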

This is the simple way to write an equals method. You have to override the hashCode method too. You do realise that the equals method is one of the most complicated things in Java programming? There are three well‑known references about it; they are listed in this post.
Using getClass means that subtype instances are implicitly not equal; using instanceof means the equals method fails when you add fields in the subtypes. Probably the best way to do it is use instanceof and then you have the design principle that you are not allowed to add fields in subtypes.

Campbell Ritchie wrote:You do realise that the equals method is one of the most complicated things in Java programming?

It is starting to seem so. It's definitely the only thing I've been able to think about as far as Java goes these last couple of days. I'm definitely going to read the resources you've posted (thank you for all of your help, btw); I just only have a few minutes at work right now. It is driving me crazy though. I don't understand which part of overriding the equals method makes it compare two values stored in a non-primitive datatype, as opposed to returning the same thing "==" does if we do not override it.

I understand the checks, to make sure it's not null, is part of the same class, etc..but hmm...

I know it's going to be one of those things that, once I get it, it's going to be like "Ahh!!! I'm such an idiot!" Some serious mental blockage going on!

Danny Treart wrote:Isn't "==" only used to compare and see if two things are referencing the same object (when not dealing with primitives)?

Yup. And if they're '==', then they MUST be 'equals()' (more below). It's simply a quick check to eliminate ONE possibility.

So: what if they're "!="? - ie, they're different objects:

Suppose I have two Strings:
private String a1 = new String("a"); and
private String a2 = new String("a");

Are they the same object? No, because I created two new ones when I set them up.

Are they "equal"? Ah now, as Shakespeare would say: there's the rub. What does "equal" mean?

Let me turn it around: Can you think of any reason (other than identity) why a1 and a2 shouldn't be "equal"?

As it turns out, the designers of the String class couldn't, which means any two Strings that contain exactly the same characters in the same sequence ARE equal().

So equals() implements a "logical" notion of whether two objects are equal or not.
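The two Strings from above, as a runnable sketch (local variables are used here instead of private fields so it fits in a main method):

```java
class StringEqualityDemo {
    public static void main(String[] args) {
        String a1 = new String("a");
        String a2 = new String("a");

        System.out.println(a1 == a2);      // false: two separate objects
        System.out.println(a1.equals(a2)); // true: same characters in the same order
    }
}
```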

Unfortunately, it doesn't end there.

The designers of Java (who were probably very good mathematicians) understood that an '==' (or equals()) function must have certain rules; namely:

1. It must be reflexive: ie, x.equals(x) MUST return true (and THAT'S what the '==' check is all about).
2. It must be symmetric: ie, x.equals(y) MUST be the same as y.equals(x).
3. It must be transitive: if x.equals(y) and y.equals(z), then x.equals(z) MUST be true.

and those last two rules are VERY difficult (in fact, impossible) to guarantee for a function called by only one of the objects involved.
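To see how easily rule 2 breaks, here is a sketch with two hypothetical classes. Each equals looks reasonable on its own, but the result depends on which object the method is called on.

```java
class SymmetryDemo {
    // Hypothetical base class using instanceof.
    static class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof Point)) {
                return false;
            }
            Point p = (Point) obj;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return 31 * x + y;
        }
    }

    // Hypothetical subclass that adds a field and compares it too.
    static class ColorPoint extends Point {
        final String color;

        ColorPoint(int x, int y, String color) {
            super(x, y);
            this.color = color;
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof ColorPoint)) {
                return false;
            }
            return super.equals(obj) && color.equals(((ColorPoint) obj).color);
        }
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        ColorPoint cp = new ColorPoint(1, 2, "red");

        System.out.println(p.equals(cp)); // true: a ColorPoint is a Point with the same x and y
        System.out.println(cp.equals(p)); // false: a plain Point is not a ColorPoint
    }
}
```

x.equals(y) and y.equals(x) disagree, so the pair of classes violates symmetry even though neither method looks wrong in isolation.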

Enter the "class-based" equals() method, which is very similar to what you wrote above and, in my not-so-humble opinion (in this case), the worst calumny ever created in the "learning Java" world.

While it might not seem so terrible to you, it is a detestable, procedural "kludge" of an implementation for an object-oriented language.

It does, however, have two shining merits:

It's simple.

and

It's bulletproof.

and who doesn't want that, right?

But the problem is that, as Campbell already said, equals() is NOT simple. And implementing a method like that can end up causing problems that are far more complicated (and possibly difficult to detect) than the one it was designed to solve - the most obvious one being that an anonymous class created from a class with an equals() method written like that will NOT be "equal" to its parent.

And that's where Knute's suggestion comes in: instanceof is a much more "object-oriented" way of type-checking; and it also has the advantage of not having to check for null. It just requires a bit more thought.
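A sketch of both points, using two hypothetical classes that differ only in how they type-check. The anonymous subclasses add no fields at all.

```java
class AnonymousSubclassDemo {
    // Hypothetical class whose equals uses getClass().
    static class StrictPoint {
        final int x;

        StrictPoint(int x) {
            this.x = x;
        }

        @Override
        public boolean equals(Object obj) {
            if (obj == null || getClass() != obj.getClass()) {
                return false; // explicit null check is needed before calling obj.getClass()
            }
            return x == ((StrictPoint) obj).x;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(x);
        }
    }

    // Hypothetical class whose equals uses instanceof.
    static class LenientPoint {
        final int x;

        LenientPoint(int x) {
            this.x = x;
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof LenientPoint)) {
                return false; // also covers null: "null instanceof X" is always false
            }
            return x == ((LenientPoint) obj).x;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(x);
        }
    }

    public static void main(String[] args) {
        StrictPoint strict = new StrictPoint(1);
        StrictPoint strictAnon = new StrictPoint(1) { };    // anonymous subclass, no new fields
        System.out.println(strict.equals(strictAnon));      // false: the classes differ

        LenientPoint lenient = new LenientPoint(1);
        LenientPoint lenientAnon = new LenientPoint(1) { };
        System.out.println(lenient.equals(lenientAnon));    // true
        System.out.println(lenientAnon.equals(lenient));    // true: still symmetric
        System.out.println(lenient.equals(null));           // false, with no explicit null test
    }
}
```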

If you're interested, TheRoadToEquality provides (possibly) an even better way; and even if you decide it's not for you, it may help to explain exactly what all this "difficult" stuff is about.

HIH

Winston

"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here

Wow. Thank you for all of the input guys. AWESOME explanations and advice. It's really helping me out a ton. I'm going to continue researching this, and hopefully as I continue learning about Java I'll become a little more comfortable with it.

To add a little flavour to this thread, I will add something that I'm currently discovering: if you take the time to train your mind to comprehend elements on the bit level, programming becomes simpler...

No matter what programming language you're using or what programming paradigm you favour the most, it all breaks down to the simple states of on (1) and off (0).

01100001 will always be equal to 01100001

All elements that we model on a computer system are just abstractions over the underlying bit patterns

As you said, if you test whether an object is equal to itself, the result must be true. Every object is obviously equal to itself. So you use the == which can test reference types to see whether the two operands point to the same object. If they both point to the same object, then you know it is going to be equal and you know there is no point in carrying out the other checks. So you short‑circuit the remainder of the method by using || (never |) or return.

As I said, use instanceof and the design principle that you don't add fields in subclasses. That means you can maintain symmetry. Yes, this means programming is difficult. I hope nobody ever told you it was easy.

The documentation for equals which Winston quoted looks really difficult to understand, but it isn't difficult at all. It tells you six things in the six bullet points (I know you thought there were only five, but there is one hidden).

4: Consistency: you always get the same result if no data have changed.

4½: This is the one they have hidden: No exceptions. It says in no 4 that it consistently returns true or false. The only way to get out of returning is to throw an Exception, so "always returns something" means you can never throw Exceptions.

5: nulls: An object which exists is implicitly different from an object which doesn't exist. And you can't have Exceptions if you pass null as an argument, because it says "returns false".

See it is really easy to understand. Just bl**d* difficult to implement!

Rico Felix wrote:All elements that we model on a computer system are just abstractions over the underlying bit patterns

Yeah, sometimes very abstract.

@Danny: I don't want to overload you with too much stuff too soon, but if you're interested, the main problem with equals() - at least in Java - is that it violates something called the Liskov Substitution Principle - not on its own, but when it gets overridden more than once.

<hijack>
I'm told that this has a lot to do with Java being statically-typed, and that the problem doesn't exist in more "dynamic" languages, but I have to admit to being sceptical. It seems to me to be a case of two things ("equals" and LSP) that just don't play well together; although I freely admit that I'm not sure.

If anyone could point me to a good document or page that explains why it isn't such a problem for dynamic languages I'd be most grateful.
</hijack>

Winston

Hmm... ask yourself: is it only a theoretical problem, or is it in practice a big problem? If it is not a big practical problem, then who cares about this Liskov thing? Do we face the danger of some nuclear plant going out of control?

And to answer your hijack question: I really have no idea about dynamically typed languages. I have a rather simple view of this issue. A may be equal to B from A's point of view, but they may be unequal from B's point of view. And since you can't guarantee whose point of view is going to be used, the only option left is simply to know what you're doing. And with this simple philosophy, I have encountered few problems so far.

Admittedly: I'm not a professional programmer, so I would like to hear about big practical problems.

The commonest problem is that you put Object obj into a Collection and cannot get it back. That might not bring down planes out of the air or cause reactors to melt their way down into the soil, but it might cause failure of your website to sell the pink skirt which appeared a minute ago. If that sort of thing becomes well‑known it might cost your employer millions in lost sales. It could result in thousands of complaints about wrong amounts in bank accounts or wrong amounts of tax taken.
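The post does not say which bug it has in mind, so the following is only one way this can happen: an equals that disagrees with hashCode. The Tag class is hypothetical, and the mismatch (case-insensitive equals, case-sensitive hash) is deliberate.

```java
import java.util.HashSet;
import java.util.Set;

class LostInSetDemo {
    // Hypothetical class with a DELIBERATE bug: equals ignores case, hashCode does not.
    static class Tag {
        final String name;

        Tag(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof Tag && name.equalsIgnoreCase(((Tag) obj).name);
        }

        @Override
        public int hashCode() {
            return name.hashCode(); // "pink" and "PINK" hash differently
        }
    }

    public static void main(String[] args) {
        Set<Tag> tags = new HashSet<>();
        Tag stored = new Tag("pink");
        tags.add(stored);

        Tag probe = new Tag("PINK");
        System.out.println(probe.equals(stored));  // true: equals says they match
        System.out.println(tags.contains(probe));  // false: the set looks under a different hash
    }
}
```

The object is in the set and equals agrees it matches, yet the lookup fails, and nothing throws to tell you so.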
And the only reason why you don't have aeroplanes falling from the sky is that Java® isn't usually used in flight control systems.

Or actually be used as an exploit, since the effects are even less widely known than the cause; and likely more difficult to detect and/or duplicate.

@Piet: And anyway, I'm surprised at you, a mathematician, claiming that a theoretical problem might not actually be real. You must have heard about the one embedded in the binary search algorithm that took six years[*] to discover.
Hint: mid = (max + min) >>> 1 is correct.
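For anyone who has not seen that bug, a small sketch of the arithmetic behind the hint. The method names and the sample bounds are made up for illustration.

```java
class MidpointDemo {
    // The classic form: the int sum can overflow for large indices.
    static int brokenMid(int min, int max) {
        return (max + min) / 2;
    }

    // The form from the hint: the unsigned shift treats the wrapped sum as an unsigned value.
    static int fixedMid(int min, int max) {
        return (max + min) >>> 1;
    }

    public static void main(String[] args) {
        int min = 1_500_000_000;
        int max = 2_000_000_000;

        System.out.println(brokenMid(min, max)); // -397483648: the sum wrapped around
        System.out.println(fixedMid(min, max));  // 1750000000
    }
}
```

Both bounds are valid non-negative int values, which is why the bug stayed hidden until arrays got large enough for the sum to exceed Integer.MAX_VALUE.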

And remember, equals() is used everywhere; so any problem, no matter how "theoretical", is likely to become a problem. That's just basic Sod's Law.

Winston

[*]Actually, twenty, according to Wikipedia; although I believe that's based on a specific paper.