Comparing Values for Equality in .NET: Identity and Equivalence

An article clarifying the various ways of comparing two values for equality in .NET

Introduction

The various ways of comparing two values for equality in .NET can be very confusing. In fact, if we have two objects a and b in C#, there are at least four ways to compare them for equality, plus one operator that looks like an equality comparison to add to the confusion:

if (a.Equals(b)) {}

if (object.Equals(a, b)) {}

if (object.ReferenceEquals(a, b)) {}

if (a == b) {}

if (a is b) {}

As if that isn't confusing enough, these methods and operators behave differently depending on:

whether a and b are reference types or value types

whether they are reference types which are made to behave like value types for these purposes (System.String is one of these)

This article is an attempt to clarify why we have all these versions of equality, and what they all mean.

What does it mean to be the same?

Firstly, we have to understand that there are actually two basic types of equality for objects:

Identity (reference equality): Two objects are identical if they actually are the same object in memory. That is, references to them point to the same memory address.

Equivalence (value equality): Two objects are equivalent if the value or values they contain are the same.

So if we have two integers, a and b, both set to value 3, they are equivalent (they have the same value) but not necessarily identical (a and b can refer to different memory addresses).

However if two objects are identical (the same object) then they must be equivalent (have the same underlying values).
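The distinction can be made concrete with a small sketch. Point is a hypothetical class invented for this illustration; the only framework call used is object.ReferenceEquals.

```csharp
using System;

// Hypothetical type, used only for this illustration.
class Point
{
    public int X;
    public int Y;
}

static class IdentityVersusEquivalence
{
    static void Main()
    {
        Point first = new Point { X = 1, Y = 2 };
        Point second = new Point { X = 1, Y = 2 }; // same values, separate object
        Point alias = first;                       // second reference to the same object

        // Identity: do the two references point at the same object?
        Console.WriteLine(object.ReferenceEquals(first, alias));  // True
        Console.WriteLine(object.ReferenceEquals(first, second)); // False

        // Equivalence: do the two objects hold the same values?
        Console.WriteLine(first.X == second.X && first.Y == second.Y); // True
    }
}
```

Here first and alias are identical (and therefore equivalent), while first and second are equivalent without being identical.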

What type of Equality do we expect?

Clearly these notions of identity and equivalence are related to the concept of reference types and value types.

Value types are intended as lightweight objects that have value semantics: two objects are the same if they have the same value, and they can then be used interchangeably. So integers a and b are the same in the example above because their values are both 3; it doesn't matter whether a and b actually refer to the same underlying object in memory.

We don't in general expect reference types to behave this way. Suppose we have two separate objects of type Book (a class). Book has one member variable called 'title' (a string). Do we necessarily consider these the 'same' Book if they have the same title? We might do so, but it isn't clear.

To clarify the situation we might add an additional field 'BookId' which is unique for a given actual book. We could then say that two books are the same if they have the same BookId, even if they have different titles. But then we wouldn't normally expect to have two separate Books with the same BookId in memory at the same time: there's only one underlying book. So potentially we can just compare memory addresses to see if two Books are the same.

The point is that equality for reference types is trickier to define. Our default definition is going to be that two reference types are the same if they are identical.

Types of Equality

Now I'll go through each of the types of equality referred to in the first paragraph in turn and try to explain why they exist. I'll also explain how they are implemented for value and reference types, and when you should override or overload them.

a.Equals(b)

Overview

Equals() is a virtual method on System.Object. This means every single object can call this, and in your own type definitions you can override it to give the behaviour you want.

The base System.Object implementation of Equals() does an identity comparison. However, Equals() is intended to test for identity or equivalence as appropriate (see the discussion in the section above).

Value Types

For value types this method is overridden to do a value (equivalence) comparison. In particular, System.ValueType itself, the root of all value types, contains an override that will compare two objects by reflecting over their internal fields to see if they are all equal. If you inherit this (by setting up a struct) your struct will get this override by default.
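As a rough illustration of the inherited behaviour, the sketch below defines a struct that does not override Equals() at all. Money and its fields are hypothetical names chosen for the example.

```csharp
using System;

// Hypothetical struct; it deliberately does not override Equals().
struct Money
{
    public decimal Amount;
    public string Currency;
}

static class StructEqualsDemo
{
    static void Main()
    {
        Money a = new Money { Amount = 9.99m, Currency = "GBP" };
        Money b = new Money { Amount = 9.99m, Currency = "GBP" };
        Money c = new Money { Amount = 5.00m, Currency = "GBP" };

        // The Equals() inherited from System.ValueType compares field by field.
        Console.WriteLine(a.Equals(b)); // True
        Console.WriteLine(a.Equals(c)); // False
    }
}
```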

Reference Types

For reference types, as discussed above, the situation is trickier. In general we expect Equals() for reference types to do an identity comparison (to check whether the objects actually are the same in memory).

However, certain reference types aren't lightweight enough to work as value types, but nevertheless have value semantics. The canonical example is System.String. System.String is a reference type. However if we have a = "abc" and b = "abc" we expect a to be equal to b. So in the framework Equals() is overridden to do a value comparison.

Override or not?

As mentioned above, for value types there is a default override of a.Equals(b) in the base class System.ValueType which will work for any structs you set up. This method uses reflection to iterate over all of the fields of the two value types you are trying to compare, checking that their values are equal. In general this is what you want for value type comparison.

However, the overridden Equals() method uses reflection, which is slow, and involves a certain amount of boxing. For speed optimization it can be good to override this method. For a more detailed discussion of this see Jeffrey Richter's book 'Applied Microsoft .NET Framework Programming'.

In general it is considered good practice to leave Equals() doing its default identity comparison when defining new reference types (classes). The exception is when you know you want value semantics for your class (like System.String), or when you want Equals to work in a specific way. In particular, if instances of your class are going to be used as keys in a Hashtable and looked up by value rather than by reference, you need to override Equals for the lookup to work as intended.

Note that if you override a.Equals(b) you should also override GetHashCode() and should consider implementing IComparable.CompareTo().
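A minimal sketch of a class with value semantics, overriding Equals() and GetHashCode() together, might look like this. Isbn is a hypothetical class and the key strings are made up; Hashtable is the standard System.Collections type.

```csharp
using System;
using System.Collections;

// Hypothetical class with value semantics: two instances are equal
// when they wrap the same string.
sealed class Isbn
{
    private readonly string _value;

    public Isbn(string value)
    {
        if (value == null) throw new ArgumentNullException("value");
        _value = value;
    }

    public override bool Equals(object obj)
    {
        Isbn other = obj as Isbn;               // null if obj is null or another type
        return other != null && _value == other._value;
    }

    // Equal objects must return equal hash codes, so hash the same field
    // that Equals() compares.
    public override int GetHashCode()
    {
        return _value.GetHashCode();
    }
}

static class IsbnDemo
{
    static void Main()
    {
        Hashtable titles = new Hashtable();
        titles[new Isbn("isbn-1")] = "First title";

        // A different instance with the same value finds the entry.
        Console.WriteLine(titles.ContainsKey(new Isbn("isbn-1"))); // True
        Console.WriteLine(titles.ContainsKey(new Isbn("isbn-2"))); // False
    }
}
```

Without the two overrides the first lookup would fail, because the default implementations treat the two Isbn instances as different keys.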

object.Equals(a, b)

Overview

object.Equals(a, b) is a static method on the object class. Jeffrey Richter describes it as 'a little helper method'. It's easiest to think of it as a method that does some checking for nulls and then calls a.Equals(b).

The reason it exists is that if a is null a call to a.Equals(b) will throw a NullReferenceException. If there's a possibility that a will be null it is easier to call object.Equals(a, b) than explicitly check for the null. If a can't be null there's no need for the additional check and a call to a.Equals(b) will be better.

Detail

In detail, this method does the following for a call to object.Equals(a, b):

1) Check if a and b are identical (i.e. they refer to the same location in memory or are both null). If so return true.

2) Check if either of a and b is null. We know they are not both null otherwise the routine would have returned in 1) above, so if either is null return false.

3) Both a and b are not null: return the value of a.Equals(b).
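The steps above can be restated as code. This is an illustrative re-implementation, not the framework's source; EqualsHelper and StaticEquals are hypothetical names.

```csharp
using System;

static class EqualsHelper
{
    // Illustrative restatement of the three steps described in the text.
    public static bool StaticEquals(object a, object b)
    {
        if (object.ReferenceEquals(a, b)) return true; // same object, or both null
        if (a == null || b == null) return false;      // exactly one of them is null
        return a.Equals(b);                            // defer to the virtual Equals()
    }

    static void Main()
    {
        Console.WriteLine(StaticEquals(null, null));   // True
        Console.WriteLine(StaticEquals(null, "abc"));  // False
        Console.WriteLine(StaticEquals("abc", null));  // False
        Console.WriteLine(StaticEquals(3, 3));         // True: boxed separately, compared by Int32.Equals
    }
}
```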

Value Types and Reference Types

Since a and b can't be null for value types, object.Equals(a, b) is identical to a.Equals(b). In general you should call a.Equals(b) in preference to object.Equals(a, b) for value types.

For reference types, as discussed above, you should call this method if there's a chance that a will be null in a call to a.Equals(b).

Override or not?

object.Equals(a, b) is a static method on System.Object, and consequently can't be overridden. However, since it calls into a.Equals(b) any overrides of Equals will affect calls to this method as well.

object.ReferenceEquals(a, b)

Overview

Whilst the two incarnations of Equals() above check for identity or equivalence depending on the underlying type, ReferenceEquals is intended to always check for identity.

Value Types and Reference Types

For reference types object.ReferenceEquals(a, b) returns true if and only if a and b have the same underlying memory address.

In general we shouldn't care whether value types occupy the same underlying memory address. It isn't relevant for anything we'd want to normally use them for. But the definition above gives us a problem when we come to value types being compared with ReferenceEquals.

The difficulty comes from the fact that ReferenceEquals expects two System.Objects as parameters. This means that our value types will get boxed onto the heap as they are passed in to this routine. Normally, because of the way the boxing process works, they will get boxed separately to different memory addresses on the heap. This of course means the call to ReferenceEquals returns false.

So for example object.ReferenceEquals(10, 10) returns false, for these reasons.

You can see it's the boxing that causes the problem in the following code:

// Set up value type in int variable - no boxing
int value = 10;
object one = value; // Cast to object so boxed
object two = value; // Cast again so boxed again separately
// one and two are now separate memory locations on the heap
Console.WriteLine(object.ReferenceEquals(one, two)); // false
// Set up value type in object variable which
// immediately boxes it onto the heap
object value2 = 10; // value is boxed already
object three = value2; // three points to the boxed value
object four = value2; // four also points to the same boxed value
Console.WriteLine(object.ReferenceEquals(three, four)); // true

Override or not?

ReferenceEquals is a static method on object, and so once again cannot be overridden. It will always perform identity checks as outlined above.

a == b

Overview

== is an operator, clearly, and not a method. In my humble opinion it has been included in C# largely as a syntactic convenience and to make the language look like C/C++.

As with a.Equals(b), == is intended to test for identity or equivalence as appropriate (see the discussion in the paragraph "What type of Equality do we expect?" above). In fact, in almost all circumstances == should behave like a.Equals(b).

Value Types

For value types within the .NET Framework, == is implemented as you would expect, and will test for equivalence (value equality). However, for any custom value types you implement (structs) a default == will not be available unless you provide one.

Reference Types

For reference types a default == is available, and this will test for identity (reference equality). For most reference types in the .NET Framework == will again test for identity, but, as for a.Equals(b), there are certain classes where the operator has been overloaded to do a value comparison. System.String is once again the canonical example, for the reasons discussed in part 1 of this article.

Override (overload?) or not?

Since == is an operator we can't override it. However, we can overload it to provide a different functionality to the base functionality described above.

For reference types Microsoft recommends that you don't overload == unless you have reference types behaving as value types as discussed above. This means that even if you override a.Equals(b) to provide some custom functionality you should leave your == operator to provide an identity test. This is, I think, the only occasion where == should behave differently from a.Equals(b).

For value types, as mentioned above, a default overload of == will not be available and you will have to provide one if you need one. The easiest thing to do is simply to call a.Equals(b) from an operator overload in your struct: in general your implementation of == should not be different from a.Equals(b).

Note that if you overload == you should overload !=. You should also override a.Equals(b) to do the same thing, and as a result should override GetHashCode. Finally you should consider implementing IComparable.CompareTo().
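Putting these recommendations together for a value type, a sketch might look like the following. GridCell is a hypothetical struct; the operators simply forward to Equals() as suggested above.

```csharp
using System;

// Hypothetical struct showing ==, !=, Equals() and GetHashCode() kept consistent.
struct GridCell
{
    public readonly int Row;
    public readonly int Column;

    public GridCell(int row, int column)
    {
        Row = row;
        Column = column;
    }

    public override bool Equals(object obj)
    {
        if (!(obj is GridCell)) return false;   // also false when obj is null
        GridCell other = (GridCell)obj;
        return Row == other.Row && Column == other.Column;
    }

    public override int GetHashCode()
    {
        unchecked { return (Row * 397) ^ Column; } // any combination of both fields will do
    }

    // The operators forward to Equals() so the two can never disagree.
    public static bool operator ==(GridCell left, GridCell right) { return left.Equals(right); }
    public static bool operator !=(GridCell left, GridCell right) { return !left.Equals(right); }
}

static class GridCellDemo
{
    static void Main()
    {
        GridCell a = new GridCell(1, 2);
        GridCell b = new GridCell(1, 2);
        GridCell c = new GridCell(3, 4);

        Console.WriteLine(a == b);                             // True
        Console.WriteLine(a != c);                             // True
        Console.WriteLine(a.Equals(b));                        // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
    }
}
```

Forwarding to Equals(object) boxes the right-hand operand; a typed Equals(GridCell) overload would avoid that, at the cost of a little more code.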

Care with == and Reference Types

One final thing to note is that operator overloads don't behave like overrides. If you use the == operator with reference types without thinking, this can be a problem.

For example, suppose you have an untyped DataSet ds containing a DataTable dt. Suppose this has columns Id and Name. dt has two rows. Consider the following code:
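As a minimal sketch of the kind of comparison being described, assume a table with a single int column called Value holding 1 in both rows. The column name is taken from the discussion that follows; the table set-up and the class name DataRowComparison are invented for illustration.

```csharp
using System;
using System.Data;

static class DataRowComparison
{
    static void Main()
    {
        // Assumed schema for the illustration: one int column named "Value".
        DataTable dt = new DataTable();
        dt.Columns.Add("Value", typeof(int));
        dt.Rows.Add(1);
        dt.Rows.Add(1);

        DataRow row1 = dt.Rows[0];
        DataRow row2 = dt.Rows[1];

        // The indexer returns object, so == here is the reference comparison on object.
        Console.WriteLine(row1["Value"] == row2["Value"]);      // False

        // Equals() is virtual, so the Int32 override runs and compares the values.
        Console.WriteLine(row1["Value"].Equals(row2["Value"])); // True
    }
}
```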

When we compare with == in the example above we get false, even though the column in both rows contains the integer 1. The reason is that both row1[Value] and row2[Value] return objects, not integers. So == will use the == in System.Object, not any overloaded version in integer. The == in System.Object does an identity comparison (reference equality test). The underlying values have been separately boxed onto the heap, so aren't in the same memory address, and the test fails.

When we compare with .Equals we get true. This is because .Equals is overridden in System.Int32 to do a value comparison, so the comparison uses the overridden version to correctly compare the values of the two integers.

a is b

Overview

a is b isn't actually a test for object equality at all, although it looks like one. b here has to be a type name (so b would need to be a class name, for example). The operator tests whether object a is either of type b or can be cast to it without an exception being thrown. This is equivalent to TypeOf a Is b in VB.NET, which is a little clearer.

Value Types/Reference Types

The operator works in the same way for both value types and reference types.
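A short sketch of the type test, using standard framework types (MemoryStream, Stream, IDisposable) purely as examples:

```csharp
using System;
using System.IO;

static class IsOperatorDemo
{
    static void Main()
    {
        object a = new MemoryStream();

        Console.WriteLine(a is MemoryStream); // True: a is exactly this type
        Console.WriteLine(a is Stream);       // True: MemoryStream derives from Stream
        Console.WriteLine(a is IDisposable);  // True: the interface is implemented
        Console.WriteLine(a is string);       // False: the cast would fail

        object n = 5;                         // a boxed value type
        Console.WriteLine(n is int);          // True
        Console.WriteLine(n is long);         // False: a boxed int is not a long
    }
}
```

Note that nothing here compares a with another object; the right-hand side is always a type.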

2. .Equals is overridden in the string class to do an equivalence (value) comparison, and the values are equal. So a.Equals(b) is true (you would still be right).

3. However, a == b is an overload and on the object type it does an identity comparison, not a value comparison (you would still be right).

4. a and b are separate objects in memory so a == b is false (you would be wrong)

4. is actually wrong, but only because of an optimization in the CLR. The CLR keeps a list of all strings currently being used in an application in something called the intern pool. When a new string is set up in code the CLR checks the intern pool to see if the string is already in use. If so, it will not allocate memory to the string again, but will re-use the existing memory. Hence a == b is true above.

You can prevent strings being interned by using a StringBuilder as below. In this case a.Equals(b) will be true, and a == b will be false, which is what you would expect:
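A minimal sketch of this, assuming a and b are declared as object so that == resolves to the reference comparison; the string content and the class name InterningDemo are arbitrary:

```csharp
using System;
using System.Text;

static class InterningDemo
{
    static void Main()
    {
        object a = "Hello World";
        // Built at run time, so the result is a new string object
        // rather than the literal's shared instance.
        object b = new StringBuilder("Hello World").ToString();

        Console.WriteLine(a.Equals(b)); // True: String.Equals compares the characters
        Console.WriteLine(a == b);      // False: two separate objects in memory
    }
}
```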

VB.NET

This article has talked mainly about C#. However, the situation is similarly confusing in VB.NET. Because they are methods on System.Object, VB.NET has methods a.Equals(b), object.Equals(a, b) and object.ReferenceEquals(a, b) which are the same as the methods described above.

VB.NET has no == operator, or any operator equivalent to it.

VB.NET additionally has the Is operator. This operator's use in TypeOf a Is b statements was discussed under a is b: Overview above.

VB.NET: a Is b

The Is operator can also be used for identity (reference equality) comparisons on two reference types in VB.NET. However, unlike object.ReferenceEquals(a, b), which does the same thing for reference types, the Is operator cannot be used at all with value types. The Visual Basic compiler will not compile code where either a or b in the statement a Is b is a value type.

Especially in a day where lots of people are never learning languages that use explicit pointers, the implicit ones used in "references" can be confusing, and articles like this are therefore pretty helpful.

Only thing I'd change is a slight clarification on string interning. The CLR doesn't keep track of and share all strings in use by the application, it does so for all string literals in use by the application. Without that clarification, your StringBuilder example behaving as it does is confusing, because you'd think ToString(), by returning a string, would return something that would fall into string interning just as much as any other method of generating a string.

VB .NET equivalent for == is =. It compares by value and cannot be applied (throws an exception) for reference type objects unless such objects implement overload for the operator. Though I'm not sure what's the relationship between Equals and = in VB.

I applaud your effort to clarify the confusion about comparing objects for equality. But you completely miss the point in your discussion.

First, equality should always mean what you call equivalence. If you have 2 objects of a class Book and they have the same title (and there are no other fields) then they should always compare equal. If you wish to distinguish between different books with the same title then (as you suggested) you need another field (eg, ISBN).

The crux of the confusion is something you did not mention. What MS really says is that unless a reference type is immutable tests for equality should compare the references and not use instance fields. As far as I can make out the only reason for this is so that a reference type can safely be used as the key for hash tables. If a mutable reference type is used as the key in a hash table and an instance of a key object is modified via a different reference to the same object then the value of the object can change unexpectedly. If this "value" is used by the hash table when testing for equality then the behavior of the container will become unpredictable.
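The hazard described here can be sketched as follows. CatalogueKey is a hypothetical mutable class whose equality depends on a field, and Dictionary stands in for any hash table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable class whose Equals() and GetHashCode() depend on a field.
sealed class CatalogueKey
{
    public int Id;

    public override bool Equals(object obj)
    {
        CatalogueKey other = obj as CatalogueKey;
        return other != null && Id == other.Id;
    }

    public override int GetHashCode() { return Id; }
}

static class MutableKeyDemo
{
    static void Main()
    {
        CatalogueKey key = new CatalogueKey { Id = 1 };
        Dictionary<CatalogueKey, string> table = new Dictionary<CatalogueKey, string>();
        table[key] = "payload";

        Console.WriteLine(table.ContainsKey(key)); // True

        key.Id = 2; // the key is modified while it is inside the table

        // The entry was filed under the old hash code, so neither the mutated key
        // nor a fresh key with the old value can find it any more.
        Console.WriteLine(table.ContainsKey(key));                         // False
        Console.WriteLine(table.ContainsKey(new CatalogueKey { Id = 1 })); // False
        Console.WriteLine(table.Count);                                    // 1
    }
}
```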

I suspect that this is also the main reason that strings are immutable.

Another thing is that in your discussion on string interning you say that a == b will be true. However, the CLR does not guarantee that strings are interned so it could return true or false.

Also discussion of the "is" keyword is way off-topic, perhaps mentioned parenthetically at most.

It's some time after this was posted, but I want to make a comment on it, as it made me think a bit about what I'd written and whether I actually had missed the point.

Andrew says that 'if you have 2 objects of a class Book and they have the same title (and there are no other fields) then they should always compare equal'. I'm not quite sure what he means, but by default they do not 'compare equal' of course, either with Equals or ==. See the code below.
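A minimal sketch of that default behaviour, assuming a Book class with a single Title field and no overrides (the title string is arbitrary):

```csharp
using System;

// Book as described in the article: one field, no Equals() override, no == overload.
class Book
{
    public string Title;
}

static class BookDemo
{
    static void Main()
    {
        Book first = new Book { Title = "Emma" };
        Book second = new Book { Title = "Emma" };

        Console.WriteLine(first.Equals(second)); // False: default Equals() tests identity
        Console.WriteLine(first == second);      // False: default == tests identity
        Console.WriteLine(first.Equals(first));  // True: same instance
    }
}
```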

These classes are what I'm calling 'equivalent' (have the same fields) and the default implementation of Equals tests for what I am calling 'identical' (are the same instance in memory).

So I think he means we should always override Equals for reference types so that they are equal if they have the same fields (are equivalent).

However, the obvious question is then why Microsoft didn't make that the default implementation. Andrew goes on to suggest this is because the Book objects could potentially be used as keys in hash tables, where this would be a problem unless they were immutable, and I'm sure he's right that this is one of the reasons.

Of course this is also a reason why we shouldn't always override Equals in this way.

However, most classes aren't used as keys in hash tables. Most classes also don't have Equals overridden at all and they work fine: I think I've only overridden Equals about two or three times in my entire career.

The reason for this is that if you are loading multiple objects of the same class normally you will make sure that there's only one copy of a specific object in memory. Reference types don't usually get copied, unlike value types. If there's only one copy of an object in memory it will stay that way. So it can be perfectly valid to consider reference types to be equal if they are identical (the same instance in memory). Which means it can be OK to go with the default Equals provided by the framework.

Of course you may not want your reference types to behave that way, in which case you do need to override Equals to change its behaviour. However then you need to be careful that no-one uses your objects as keys in hash tables.

This is pretty much what I say in my article: 'The point is that equality for reference types is trickier to define. Our default definition is going to be that two reference types are the same if they are identical.'

Interestingly the MSDN documentation for System.ValueType clearly states that "The default implementation of the Equals method uses reflection to compare the corresponding fields of obj and this instance. Override the Equals method for a particular type to improve the performance of the method and more closely represent the concept of equality for the type."

But if you look at ValueType using Reflector you see that's not always true:

if (CanCompareBits(this))
{
return FastEqualsCheck(a, obj);
}

It looks like the CLR can do a faster bitwise check in some situations, although I can't see which as it's implemented in native code.

This is never going to be faster than implementing Equals properly yourself (and preferably implementing IEquatable). You may find on very large structs doing the memcmp rather than individual fields is faster but you'd need some heavy benchmarking (read: no microbenches) to be sure.

GetType() *is* reflection. It is one of the faster ones but:

From a good article on MSDN about high-performance managed code: "The reflection APIs can be grouped into three performance buckets; type comparison, member enumeration and member invocation. Each of these buckets gets progressively more expensive. Type comparison operations—in this case, typeof in C#, GetType, is, IsInstanceOfType and so on—are the cheapest of the reflection APIs, though they are by no means cheap."