Don't Let Hibernate Steal Your Identity

Enterprise Java applications often move data back and forth between Java objects and relational databases. There are several ways to do this, ranging from manually coded SQL to sophisticated object-relational mapping (ORM) solutions such as Hibernate. Regardless of which technique you use, once you start persisting Java objects to a database, object identity becomes a complex and difficult-to-manage topic. The possibility arises that you will instantiate two different objects that represent the same row in the database. To handle this, you must properly implement the equals() and hashCode() methods on your persistent objects, but a proper implementation of these methods can be trickier than it first appears. To make matters worse, the conventional wisdom (as espoused in the official Hibernate documentation) may not lead you to the most practical solution for new projects.

The problem stems from differences between object identity in the virtual machine (VM) and object identity in the database. In the VM you do not get an ID for an object; you simply hold direct references to the object. Behind the scenes, a reference does have an internal representation (typically a pointer-sized value whose exact size depends on the VM), but it is never exposed to your code as an ID. The problems start when you persist an object in a database. Say you create a Person object (person1) and save it to a database. Somewhere else in your code you read in the Person data and instantiate a new Person object (person2). You now have two objects in memory that are mapped to the same row in the database. An object reference can only point to one or the other, but we need a way to show that these are really the same entity. This is where object identity comes in.

In Java, object identity in this sense is defined by the equals() method (and the related hashCode() method) that is present on every object. The equals() method should determine whether two objects represent the same entity, regardless of whether they are the same instance. The hashCode() method is related because any two objects that are equal must also return the same hash code. By default, equals() compares object references: an object is equal to itself and not equal to any other instance. For persistent objects, it is important to override these methods so that objects representing the same row in the database are always considered equal. This is particularly important for Java Collections (Sets, Maps, and Lists) to work correctly.
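A minimal sketch of the default behavior may help here. PlainPerson is a hypothetical class that does not override equals() or hashCode(); the class and its name field are assumptions made only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class DefaultEqualityDemo {

    // Hypothetical class that inherits equals() and hashCode() from Object.
    static class PlainPerson {
        final String name;

        PlainPerson(String name) {
            this.name = name;
        }
    }

    public static void main(String[] args) {
        PlainPerson person1 = new PlainPerson("Alice");
        PlainPerson person2 = new PlainPerson("Alice"); // same data, different instance

        System.out.println(person1.equals(person1)); // true: same reference
        System.out.println(person1.equals(person2)); // false: different references

        Set<PlainPerson> people = new HashSet<>();
        people.add(person1);
        people.add(person2);
        System.out.println(people.size()); // 2: the Set treats them as distinct
    }
}
```

If both instances had been loaded from the same database row, the Set would still hold two entries, which is exactly the situation an overridden equals() and hashCode() is meant to prevent.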

To illustrate the different ways to implement equals() and hashCode(), let's consider a simple Person object that we want to persist to the database.
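A minimal sketch of such a class might look like the following. Only the id and version fields are described in the text; the name property, the Long and int types, and the accessor names are assumptions made for illustration.

```java
// Sketch of a persistent class; only id and version come from the text.
class Person {
    private Long id;      // primary key; null until the object is first saved
    private int version;  // starts at 0, incremented on each update
    private String name;  // assumed example property

    // Hibernate needs a no-argument constructor to instantiate the class.
    public Person() {
    }

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public int getVersion() {
        return version;
    }

    public void setVersion(int version) {
        this.version = version;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}
```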

In this example, we've followed best practices by having both an id field and a version field. The id holds the value used as the primary key in the database, and the version starts at 0 and is incremented each time the object is updated (this helps us avoid concurrent update problems). For clarity, let's also look at the Hibernate mapping file that allows Hibernate to persist this object to a database:

The Hibernate mapping file indicates that the id field on Person is the database ID (i.e., it is the primary key in the PERSON table). Within the id tag is an attribute, unsaved-value="null", that tells Hibernate to use the id field to determine whether a Person object has been previously saved. ORM frameworks must make this distinction to know whether they should save the object with a SQL INSERT or an UPDATE statement. In this case, Hibernate assumes that the id field starts out null on new objects and is assigned when they are first saved. There is also a generator tag that tells Hibernate where to get an id to assign to the object the first time it is saved. In this case, Hibernate is using a database sequence as a source of unique IDs. Finally, the version tag tells Hibernate to use the Person object's version field for concurrency control. Hibernate will enforce an optimistic locking scheme, whereby it checks the object's version number against the database's version number before saving changes to the object.
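The version check behind optimistic locking can be illustrated without Hibernate. The sketch below simulates the idea with an in-memory map; it is not Hibernate's implementation, and VersionedStore and its methods are hypothetical names introduced only for this example.

```java
import java.util.HashMap;
import java.util.Map;

class OptimisticLockDemo {

    // Hypothetical stand-in for a table that stores one version number per row id.
    static class VersionedStore {
        private final Map<Long, Integer> versionsById = new HashMap<>();

        void insert(long id) {
            versionsById.put(id, 0); // new rows start at version 0
        }

        // Applies the update only if the caller's version matches the stored one,
        // then returns the incremented version. A mismatch means another caller
        // updated the row first.
        int update(long id, int expectedVersion) {
            Integer current = versionsById.get(id);
            if (current == null || current != expectedVersion) {
                throw new IllegalStateException("Stale version for row " + id);
            }
            int next = current + 1;
            versionsById.put(id, next);
            return next;
        }
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        store.insert(1L);

        int versionSeenByA = 0; // two callers both read the row at version 0
        int versionSeenByB = 0;

        System.out.println(store.update(1L, versionSeenByA)); // 1: first update wins
        try {
            store.update(1L, versionSeenByB); // still expects version 0
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The second update is rejected because the stored version has already moved on, so the later writer cannot silently overwrite the earlier change.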

What's missing from our Person object is an implementation of equals() and hashCode(). Since this is a persistent object, we don't want to rely on the default implementation, which can't distinguish between two different instances that represent the same row in the database. A simple and obvious way to implement these methods is to use the id field for the equals() comparison and to generate the hashCode().
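A sketch of that id-based approach is shown here, trimmed to the id field. The use of java.util.Objects and the instanceof check are choices made for this illustration rather than something the text prescribes.

```java
import java.util.Objects;

class Person {
    private Long id; // primary key; null until the object is first saved

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    // Two Person objects are equal when their ids are equal.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Person)) {
            return false;
        }
        Person that = (Person) other;
        return Objects.equals(id, that.id);
    }

    // The hash code is derived from the same field used by equals().
    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }
}
```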

Unfortunately, there is a problem with this implementation. When we first create a Person object, the id is null, which means any two Person objects are considered equal if they haven't been saved yet. If we were to create a Person and put it in a Set, then create a completely different Person and put it in the same Set, the second Person would not be added. That's because the Set would conclude that all unsaved objects are the same.
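The failure is easy to reproduce. The sketch below assumes an id-based equals() and hashCode() like the one described above; the name field and the demo class are hypothetical and exist only to make the two objects visibly different.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class UnsavedEqualityDemo {

    // Trimmed-down Person whose equality depends only on the id field.
    static class Person {
        private Long id;           // stays null until the object is saved
        private final String name; // assumed example property

        Person(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof Person)) {
                return false;
            }
            return Objects.equals(id, ((Person) other).id);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(id);
        }
    }

    public static void main(String[] args) {
        Person alice = new Person("Alice"); // unsaved, id is null
        Person bob = new Person("Bob");     // unsaved, id is null

        Set<Person> people = new HashSet<>();
        System.out.println(people.add(alice)); // true
        System.out.println(people.add(bob));   // false: both ids are null, so they compare equal
        System.out.println(people.size());     // 1
    }
}
```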