I was recently doing some performance tuning and made the surprising discovery that doing less caching in Hibernate actually improved performance in a particular scenario. At first this seemed very counter-intuitive: my original design maximized the use of caching precisely in order to improve performance, yet the opposite happened in practice. In hindsight, naturally, the reason was fairly obvious. So I thought I would share the details of this situation to help you avoid making the same mistake.

I was tuning a batch processing application that received XML input data sets, each consisting of thousands of separate input records. The processing logic converted each input record into multiple Hibernate entities – as many as several hundred. This logic required a number of queries to implement - some to load related, preexisting entities, and others to verify consistency with existing data. This queried data would often be needed for multiple input records in the same data set. Based on this, I decided to use a single Hibernate session to process the entire data set, committing after each input record but keeping the session open to be able to make use of cached entities for subsequent processing.

When initial performance tests were carried out, they showed a disturbing trend: the processing time per input record increased linearly with the record's position in the data set. This meant that the total time required to process a data set grew quadratically with the size of the set! This is illustrated by the diagrams below.
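A back-of-the-envelope model shows why linear growth per record means quadratic growth overall (illustrative numbers only, not my actual measurements): if record i costs roughly i time units, then processing n records costs 1 + 2 + … + n = n(n+1)/2 units in total.

```java
// Toy cost model (illustrative, not measured data): per-record cost grows
// linearly with the record's position, so the total cost is quadratic in n.
public class QuadraticGrowth {
    static long totalCost(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i; // record i costs i time units
        }
        return total;
    }

    public static void main(String[] args) {
        // Doubling the data set roughly quadruples the total time.
        System.out.println(totalCost(1000)); // 500500
        System.out.println(totalCost(2000)); // 2001000
    }
}
```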

An analysis of where the time was being spent showed that the majority of the processing logic required only constant time per record. Where was the extra time going? The culprit seemed to be the call to commit the transaction to the database. I knew that even a few hundred database insert/update statements would execute quickly, in nearly constant time (databases are built to scale, after all). The actual database commit was equally speedy. Normally I assume that network calls will be the source of performance delays, but in this case that assumption proved incorrect.

So what exactly was happening when I committed the Hibernate transaction, before the calls to the database? Hibernate's first step is to perform a flush, writing all entities with changes (called dirty entities) to the database via insert/update/delete calls. How exactly does Hibernate determine which entities are dirty? For loaded entities Hibernate uses byte-code instrumentation to add logic that tracks when entities become dirty. But my scenario involved new entities, for which Hibernate could not work its magic. So on each flush Hibernate scanned the fields of every entity in the session to see if there were changes. A linearly increasing number of entities in the session naturally led to a linearly increasing time per flush. To make matters worse, Hibernate's flush algorithm apparently has a performance problem when dealing with cascaded collections, which I was using in my scenario.
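A minimal simulation of this flush behaviour (my own stand-in code, not Hibernate's actual implementation) makes the cost visible: when every new entity stays attached to the session, each flush has to re-scan all of them, so the scan work per flush grows with the number of records already processed.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Hibernate's dirty-checking flush (a simplified model, not the
// real implementation). Each commit-time flush scans every attached entity.
public class FlushCostDemo {
    static long fieldScansForDataSet(int records, int entitiesPerRecord) {
        List<Object> session = new ArrayList<>(); // first-level cache stand-in
        long scans = 0;
        for (int r = 0; r < records; r++) {
            for (int e = 0; e < entitiesPerRecord; e++) {
                session.add(new Object()); // new entities stay attached
            }
            scans += session.size(); // flush scans all attached entities
        }
        return scans;
    }

    public static void main(String[] args) {
        // 100 records of 10 entities each: later flushes scan far more
        // entities than early ones; total scans grow quadratically.
        System.out.println(fieldScansForDataSet(100, 10)); // 50500
        System.out.println(fieldScansForDataSet(200, 10)); // 201000
    }
}
```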

The solution to my performance problem was to evict all the entities from the session after committing, thus making them detached, and then reattach the few entities I reused in subsequent processing.
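The effect of this fix can be sketched with the same kind of toy model (again my own stand-in, not real Hibernate code): clearing the session after each commit and re-attaching only the handful of reused entities keeps the per-flush scan cost bounded, so total work grows only linearly with the data set.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the fix (a simplified stand-in for the Hibernate session, not
// the real API). After each "commit" we evict everything and re-attach only a
// few reused entities, so each flush scans a bounded number of entities no
// matter how many records came before.
public class EvictAfterCommitDemo {
    static long fieldScansWithEviction(int records, int entitiesPerRecord, int reused) {
        List<Object> session = new ArrayList<>();
        long scans = 0;
        for (int r = 0; r < records; r++) {
            for (int e = 0; e < entitiesPerRecord; e++) {
                session.add(new Object()); // entities created for this record
            }
            scans += session.size();       // flush at commit scans attached entities
            session.clear();               // evict everything after commit
            for (int e = 0; e < reused; e++) {
                session.add(new Object()); // re-attach the few reused entities
            }
        }
        return scans;
    }

    public static void main(String[] args) {
        // Per-record flush cost is now constant, so total scans grow linearly.
        System.out.println(fieldScansWithEviction(100, 10, 2)); // 1198
        System.out.println(fieldScansWithEviction(200, 10, 2)); // 2398
    }
}
```

In real Hibernate code this corresponds to calling Session.clear() (or Session.evict(entity) for individual entities) after the commit, then re-associating the few reused entities with the session before processing the next record.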

4 Comments on “Avoiding Caching To Improve Hibernate Performance”

Correct me if I am wrong, but in the scenario you are mentioning (a batch process) you should be using stateless Hibernate sessions. If not, it is normal to get those performance figures. So I think in this case it is not a problem with the cache but a misuse of Hibernate.

Using stateless Hibernate sessions is an option, but since such sessions have no level-one cache at all, you lose out on a lot of Hibernate functionality that I consider essential (see https://www.hibernate.org/hib_docs/v3/api/org/hibernate/StatelessSession.html for details). So I consider stateless sessions an option of last resort. You are correct that filling the level-one cache with thousands of objects, and thereby hurting performance, is poor usage of Hibernate.

Hi.
I still think that in a batch-like scenario the first-level cache is of no use. Anyway, you always have the choice of using whichever Hibernate session is most suitable for your needs, and you can use the normal Hibernate Session for the rest of the application. But I would never use the normal Hibernate session for a batch process like the one you describe, or you will end up with severe performance issues like those you have mentioned in your post. I still think the best solution to that problem is as easy as using the stateless session, with no drawbacks at all.
Cheers.