Sunday, March 18, 2012

After eight years of developing server and embedded applications with Hibernate as the ORM, racking my brain for ways to improve Hibernate performance, reading blogs and attending conferences, I have decided to share with you what I have learned over these years.

This is the first of many posts to come.

Last year I went to Devoxx as a speaker, but I also attended Patrycja Wegrzynowicz's talk on Hibernate anti-patterns. In that presentation Patrycja showed an anti-pattern that shocked me, because it proved that you should expect the unexpected.

We are going to see what happens when Hibernate detects a dirty collection and has to re-create it.

Let's start with the model we are going to use, just two classes related by a one-to-many association:

In the previous classes, we should pay attention to three important points:

To test the model configuration, we are going to create a test that creates and persists one Starship and seven Officers and then, in a different Transaction and EntityManager, finds the created Starship.

Now that we have created this test, we can run it and observe the Hibernate console output.

Look at the number of statements executed during the first commit (persisting the objects) and during the commit of the second transaction (finding a Starship). In total, ignoring the sequence generator, we can count 22 inserts, 2 selects and 1 delete. Not bad, considering we are only creating 8 objects and running 1 find by primary key.

At this point let's examine why these SQL statements are executed:

The first eight inserts are unavoidable; they are required to insert the data into the database.

The next seven inserts are required because we have annotated the getOfficers property without the mappedBy attribute. If we look closely at the Hibernate documentation, it tells us that “Without describing any physical mapping, a unidirectional one to many with join table is used.”

The next group of queries is even stranger. The first select statement finds the Starship by id, but why the delete and the inserts of data that we have already created?

During commit, Hibernate checks whether collection properties are dirty by comparing object references. When a collection is marked as dirty, Hibernate needs to re-create the whole collection, even if it contains the same objects. In our case, getOfficers returns a different collection instance on every call, specifically an unmodifiable list, so Hibernate considers the officers collection dirty.

Because a join table is used, the rows of the Starship_Officer table have to be re-created, deleting the previously inserted tuples and inserting the new ones (although they have the same values).
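The reference comparison described above can be reproduced in plain Java, without Hibernate on the classpath. This is only a sketch: StarshipSketch and DirtyCheckSketch are hypothetical names, officers are plain strings, and looksDirty is an illustrative identity check, not Hibernate's real dirty-checking code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DirtyCheckSketch {
    // Hypothetical helper: an identity comparison, standing in for the
    // reference check described in the post (not Hibernate's actual code).
    static boolean looksDirty(Object loaded, Object current) {
        return loaded != current;
    }

    public static void main(String[] args) {
        StarshipSketch starship = new StarshipSketch();
        starship.addOfficer("Spock");

        List<String> first = starship.getOfficers();
        List<String> second = starship.getOfficers();

        // Same elements, but two distinct wrapper objects.
        System.out.println("equal contents: " + first.equals(second));
        System.out.println("looks dirty:    " + looksDirty(first, second));
    }
}

// Simplified stand-in for the Starship entity, with no JPA annotations.
class StarshipSketch {
    private final List<String> officers = new ArrayList<>();

    void addOfficer(String name) {
        officers.add(name);
    }

    // Same pattern as the getter in the post: each call wraps the list again.
    List<String> getOfficers() {
        return Collections.unmodifiableList(officers);
    }
}
```

On the standard JDK, wrapping an ArrayList this way yields a new wrapper object per call, so the two lists compare equal with equals but are never the same reference.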

Let's try to fix this problem. We start by mapping a bidirectional one-to-many association, with the many-to-one side as the owning side.

Now we rerun the same test and inspect the output again.

Although we have reduced the number of SQL statements from 25 to 10, we still have an unnecessary query: the one in the commit section of the second transaction. Why, if officers are lazy by default (JPA specification) and we are not getting the officers in the transaction, does Hibernate execute a select on the Officers table? For the same reason as in the previous configuration: the returned collection is a different Java instance, so Hibernate treats it as a newly instantiated collection, although now the join table operations are obviously no longer required. We have reduced the number of queries, but we still have a performance problem. We need another solution, and it is not the most obvious one: we are not going to return the collection object that Hibernate gave us (more on this later); instead we are going to change where the annotations are placed.

What we are going to do is switch the mapping from property access to field access. We simply move all the annotations from the getters to the class fields.

And finally we run the test again and see what happens:

Why does Hibernate run queries during commit with property mapping but not with field mapping? When a Transaction is committed, Hibernate executes a flush to synchronize the underlying persistent store with the persistable state held in memory. When property mapping is used, Hibernate calls the getter/setter methods to synchronize data, and in the case of the getOfficers method, it gets back a collection that it considers dirty (because of the unmodifiableList call). With field mapping, on the other hand, Hibernate reads the field directly, so the collection is not considered dirty and no re-creation is required.
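The difference between the two access styles can be imitated with plain reflection. The sketch below is an assumption-laden illustration, not Hibernate's access strategy: AccessSketch, ShipSketch, readViaGetter and readViaField are hypothetical names, and the only point is that reading the field twice yields the same object while calling the getter twice does not.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class AccessSketch {
    // Property-style access: the value is obtained by invoking the getter.
    static Object readViaGetter(Object entity, String getterName) {
        try {
            Method getter = entity.getClass().getDeclaredMethod(getterName);
            getter.setAccessible(true);
            return getter.invoke(entity);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Field-style access: the value is read straight from the field.
    static Object readViaField(Object entity, String fieldName) {
        try {
            Field field = entity.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.get(entity);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ShipSketch ship = new ShipSketch();
        boolean getterStable =
            readViaGetter(ship, "getOfficers") == readViaGetter(ship, "getOfficers");
        boolean fieldStable =
            readViaField(ship, "officers") == readViaField(ship, "officers");
        System.out.println("getter returns same instance: " + getterStable);
        System.out.println("field returns same instance:  " + fieldStable);
    }
}

// Simplified stand-in for the entity, with no JPA annotations.
class ShipSketch {
    private final List<String> officers = new ArrayList<>();

    public List<String> getOfficers() {
        return Collections.unmodifiableList(officers);
    }
}
```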

But we have not finished yet. I suppose you are wondering why we have not simply removed Collections.unmodifiableList from the getter and returned the Hibernate collection. Yes, I agree that we would have finished sooner, and the change would look like @OneToMany(cascade={CascadeType.ALL}) public List<Officer> getOfficers() { return officers; } but returning the original collection creates an encapsulation problem; in fact, encapsulation is broken. Anyone could add whatever they like to the mutable list and apply uncontrolled changes to the internal state of the object.
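A small plain-Java sketch makes the encapsulation problem concrete. LeakyStarship, GuardedStarship and callerCanMutate are hypothetical names introduced for illustration, and officers are plain strings rather than entities.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class EncapsulationSketch {
    // Returns true if an outside caller manages to add to the exposed list.
    static boolean callerCanMutate(List<String> exposed) {
        try {
            exposed.add("Intruder");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        LeakyStarship leaky = new LeakyStarship();
        GuardedStarship guarded = new GuardedStarship();
        System.out.println("raw list mutable by caller:  "
            + callerCanMutate(leaky.getOfficers()));
        System.out.println("unmodifiable view mutable:   "
            + callerCanMutate(guarded.getOfficers()));
    }
}

// Returns its internal list directly, so callers can change its state.
class LeakyStarship {
    private final List<String> officers = new ArrayList<>();

    List<String> getOfficers() {
        return officers;
    }

    int officerCount() {
        return officers.size();
    }
}

// Hands out a read-only view, so its state stays under its own control.
class GuardedStarship {
    private final List<String> officers = new ArrayList<>();

    List<String> getOfficers() {
        return Collections.unmodifiableList(officers);
    }

    int officerCount() {
        return officers.size();
    }
}
```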

Using an unmodifiableList is one way to avoid breaking encapsulation, but of course we could also have used different accessors for public access and for Hibernate access, without calling Collections.unmodifiableList in the latter.
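The two-accessor alternative could look roughly like the following sketch. It is plain Java so it runs without JPA; the names DualAccessorStarship, getOfficersInternal and setOfficersInternal are hypothetical, and the comments state where mapping annotations would go under the assumption that property access is kept.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DualAccessorSketch {
    public static void main(String[] args) {
        DualAccessorStarship starship = new DualAccessorStarship();
        starship.addOfficer("Uhura");

        // The internal accessor hands back the very same list on every call.
        System.out.println("internal accessor stable: "
            + (starship.getOfficersInternal() == starship.getOfficersInternal()));
        System.out.println("public view size: " + starship.getOfficers().size());
    }
}

class DualAccessorStarship {
    private List<String> officers = new ArrayList<>();

    // Assumption: in a property-mapped entity, this accessor pair would carry
    // the mapping annotations, so the provider always sees the original list.
    protected List<String> getOfficersInternal() {
        return officers;
    }

    protected void setOfficersInternal(List<String> officers) {
        this.officers = officers;
    }

    // Public API: read-only view. Assumption: with property access it would
    // need to be excluded from the mapping, for example with @Transient.
    public List<String> getOfficers() {
        return Collections.unmodifiableList(officers);
    }

    public void addOfficer(String name) {
        officers.add(name);
    }
}
```

The trade-off is an extra pair of methods per collection, in exchange for keeping both the read-only public view and a stable reference for the persistence layer.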

Considering what we have seen today, I suggest that you always use field annotations instead of property mapping; it will save us from plenty of surprises.