Sizing Up Open Source Java Persistence : Page 6

Confused and puzzled by the plethora of persistence options in Java? You are not alone. Examine how some popular open source persistence frameworks stack up against one another and JDBC.

by Jim White

Feb 15, 2007


Object State
While hidden under the covers of the API, persistence frameworks must still open a connection to the database and start a transaction in order to perform work with the database. However, unless your application and/or database is being used by only a few users and contention for data is limited, the application must connect, transact, and then disconnect quickly to avoid nasty data contention issues (dirty reads, long held row locks, etc.). So, how do you allow an application object to go on living while its direct connection to the database is discontinued? When the application reconnects and a new transaction is started, how do the object and the database get realigned when updates have been made to one or both? In other words, how are persistence objects managed across multiple database transactions?

Synchronizing object state with database state is a constant challenge, especially in applications where persistent objects are passed between application tiers. You load objects in one transaction, allow users to edit them, and then save the changes in a new transaction.

Most of the frameworks describe objects that are persistent in the database, but no longer tied to it through a live transaction, as "detached." When the application reconnects, frameworks like Hibernate, Castor, and JDO allow the object's state to be automatically reattached and synchronized.

iBatis does not offer object state management across transactions. The concept of a detached object does not exist. As a developer, you must manage an object's relationship to the database just as you do when using JDBC. Again, this requires a lot more code and detailed design about what to do when the underlying data has changed and when to synchronize the object's state to the database.

Castor offers the concept of detached objects as well, but to "reattach" the object requires what is called a "long transaction." In order to have long transactions, the domain objects must implement Castor's org.exolab.castor.jdo.TimeStampable interface. This interface requires the object and resulting database table to essentially carry an effectivity timestamp; thus impacting the domain and database models! Ouch!
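To make the timestamp idea concrete, here is a minimal plain-Java sketch of how a stamp carried by both the object and its row can be used to accept or reject a reattached object. Everything in it is an assumption made for illustration: the Stamped interface, the Employee class, and the in-memory FakeStore are hypothetical stand-ins, and none of this is Castor's actual API or implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Demo entry point: load, detach, edit, then reattach in a second "transaction".
class LongTransactionDemo {
    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        store.insert(new Employee(1L, "Ann"));

        Employee detached = store.load(1L);   // first transaction ends here
        detached.name = "Ann Lee";            // edited while detached
        store.update(detached);               // second transaction: stamps match
        System.out.println(store.load(1L).name);

        Employee stale = store.load(1L);
        Employee other = store.load(1L);
        other.name = "Changed elsewhere";
        store.update(other);                  // the row's stamp moves on
        try {
            store.update(stale);              // stale stamp is rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

// Hypothetical timestamp-carrying contract; NOT Castor's interface.
interface Stamped {
    long getStamp();
    void setStamp(long stamp);
}

class Employee implements Stamped {
    final long id;
    String name;
    private long stamp;

    Employee(long id, String name) { this.id = id; this.name = name; }
    public long getStamp() { return stamp; }
    public void setStamp(long stamp) { this.stamp = stamp; }

    Employee copy() {
        Employee e = new Employee(id, name);
        e.setStamp(stamp);
        return e;
    }
}

// In-memory stand-in for a table whose rows carry a stamp column.
class FakeStore {
    private final Map<Long, Employee> rows = new HashMap<>();
    private long clock = 0;

    void insert(Employee e) {
        e.setStamp(++clock);
        rows.put(e.id, e.copy());
    }

    // Returns a copy, so the caller holds a detached object.
    Employee load(long id) { return rows.get(id).copy(); }

    // Accepts the update only if the row is unchanged since the object was loaded.
    void update(Employee detached) {
        Employee row = rows.get(detached.id);
        if (row.getStamp() != detached.getStamp()) {
            throw new IllegalStateException(
                "row " + detached.id + " was modified since it was loaded");
        }
        detached.setStamp(++clock);
        rows.put(detached.id, detached.copy());
    }
}
```

Note how the stamp has to live on the domain class and in the stored row for the check to work, which is exactly the intrusion into the domain and database models described above.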

Proxy or Null for Lazy Data
Finally, returning to lazy loading, suppose you request an Employee object, but do not eagerly fetch the addresses associated with the Employee. Your application closes the transaction, leaving the Employee object without addresses. If the application then attempts to access the addresses, what does it get? Tough question?

Some frameworks, like Hibernate or Castor, will substitute a proxy object in place of any object that is not yet loaded and subject to lazy loading. For example, in Hibernate, in place of an actual java.util.Set of addresses, Hibernate will place its own proxy object instance (org.hibernate.collection.PersistentSet). In a transaction, if the application were to request the addresses of an employee, Hibernate would quickly load the addresses and replace the proxy object with the real Set.
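The proxy strategy can be imitated in a few lines of plain Java. The sketch below is not Hibernate's PersistentSet; LazySet and its Supplier-based loader are hypothetical names, and the sketch keeps the wrapper in place and delegates to the loaded Set rather than swapping itself out. It only shows the central idea that the expensive read is deferred until the collection is first touched.

```java
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.function.Supplier;

class LazySetDemo {
    public static void main(String[] args) {
        int[] loads = {0};
        Set<String> addresses = new LazySet<>(() -> {
            loads[0]++;   // stands in for a SELECT against the address table
            return new HashSet<>(Arrays.asList("12 Main St", "9 Elm Ave"));
        });
        System.out.println("loads before access: " + loads[0]);
        System.out.println("size: " + addresses.size());
        System.out.println("loads after access: " + loads[0]);
    }
}

// Hypothetical lazy collection; it only mimics the idea behind a collection proxy.
class LazySet<T> extends AbstractSet<T> {
    private final Supplier<Set<T>> loader;
    private Set<T> delegate;   // null until first use

    LazySet(Supplier<Set<T>> loader) { this.loader = loader; }

    private Set<T> loaded() {
        if (delegate == null) {
            delegate = loader.get();   // the deferred read happens exactly once
        }
        return delegate;
    }

    @Override public Iterator<T> iterator() { return loaded().iterator(); }
    @Override public int size() { return loaded().size(); }
    @Override public boolean add(T item) { return loaded().add(item); }
}
```

Because the wrapper implements java.util.Set, calling code written against the interface cannot tell the difference, which is why this approach needs no change to how the application reads the collection.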

Castor also uses the proxy strategy. However, in order to use lazy loading you must also change the persistent class that will hold the related objects. Examine Castor's documentation carefully if you want to use lazy loading in Castor.

JDO takes a different approach. It allows lazy loading, but does not load anything into the lazily loaded field, such as addresses. In fact, if you access the property directly, you will see that the lazily loaded field contains null. This can cause problems if you don't access the data through getters and setters.
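The difference between reading the field and calling the getter can be imitated in plain Java. This is only a sketch under stated assumptions: a real JDO implementation does its interception through byte code enhancement, whereas here a hand-written getter and a hypothetical fetchAddresses() method play that role, and the Employee class is invented for the example.

```java
import java.util.Arrays;
import java.util.List;

class NullFieldDemo {
    public static void main(String[] args) {
        Employee e = new Employee();
        System.out.println("raw field: " + e.addresses);          // prints "raw field: null"
        System.out.println("via getter: " + e.getAddresses());    // triggers the deferred load
    }
}

class Employee {
    // Package-visible only so the demo can peek at the raw field.
    List<String> addresses;   // stays null until someone asks through the getter

    List<String> getAddresses() {
        if (addresses == null) {
            addresses = fetchAddresses();   // hypothetical stand-in for the framework's load
        }
        return addresses;
    }

    private List<String> fetchAddresses() {
        return Arrays.asList("12 Main St", "9 Elm Ave");
    }
}
```

Code that reaches past the accessor sees the empty field, while code that goes through the getter sees the data, which is why accessor discipline matters with this style of lazy loading.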

With an understanding of how each framework deals with lazily loaded data, let's return to the question: what happens when you try to access lazily loaded data after the connection or transaction to the database is closed? What if the application needs an associated object or property, such as addresses, that was never loaded because of lazy loading? Each of the frameworks throws an exception, though a different one in each case. Hibernate throws an org.hibernate.LazyInitializationException and states the exact problem: you "failed to lazily initialize a collection of role: com.intertech.domain.Person.addresses - no session or session was closed." JDO throws a javax.jdo.JDODetachedFieldAccessException and indicates "You have just attempted to access field "addresses" yet this field was not detached when you detached the object. Either don't access this field, or detach the field when detaching the object."
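The failure mode, and the usual way around it, can be sketched without any framework. In the following plain-Java example, FakeSession and LazyRef are hypothetical classes and the IllegalStateException merely stands in for the framework-specific exceptions named above. A value touched while the session is open survives the close; one that was never touched cannot be loaded afterward.

```java
import java.util.function.Supplier;

class ClosedSessionDemo {
    public static void main(String[] args) {
        FakeSession session = new FakeSession();
        LazyRef<String> city = new LazyRef<>(session, () -> "Minneapolis");
        LazyRef<String> zip = new LazyRef<>(session, () -> "55401");

        city.get();          // touched while the session is open, so it is loaded
        session.close();

        System.out.println(city.get());   // already loaded: still usable
        try {
            zip.get();                    // never loaded: nothing left to load it with
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}

// Hypothetical stand-in for a session or transaction.
class FakeSession {
    private boolean open = true;
    boolean isOpen() { return open; }
    void close() { open = false; }
}

// Hypothetical lazily loaded value that is bound to a session.
class LazyRef<T> {
    private final FakeSession session;
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    LazyRef(FakeSession session, Supplier<T> loader) {
        this.session = session;
        this.loader = loader;
    }

    T get() {
        if (!loaded) {
            if (!session.isOpen()) {
                throw new IllegalStateException(
                    "could not lazily load value: session is closed");
            }
            value = loader.get();
            loaded = true;
        }
        return value;
    }
}
```

The practical lesson is the same whichever framework you choose: decide which associations the next tier will need and load (or detach) them before the transaction ends.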

Miscellaneous Considerations
There are a number of factors that must also be considered when comparing any of the persistence frameworks. Many of these considerations are not quantifiable and, in some cases, are areas for additional research that cannot be covered here.

Framework by Specification
Only one of the frameworks examined here is backed by a Java Community Process (JCP) managed specification. JPOX JDO is, in fact, the reference implementation of the JDO 2.0 spec. If the support and management of a specification is important to your organization's applications, then JDO, EJB, or a Java Persistence API implementation is the solution for you. However, stability by specification may be a false promise. As an example, simply look at the differences between EJB 2.1 and EJB 3.0.

Database Support
Most of the persistence frameworks come ready to work with the popular databases and database drivers. Most, if not all, also provide configuration examples of how to connect the framework to the database. If your database is not one of those commonly supported, or if you have unusual configurations, such as mirrored or load balancing databases, then you may have additional issues that do not lend themselves to these solutions.

Scalability
While the experiments shown in this article worked with tables of 10,000 or more rows, most would consider this sizeable but certainly not close to the limits supported by most relational databases. If your application needs to read or store millions of objects, you have further research to do.

Cache
Persistence frameworks offer an object cache, allowing objects that have been read from the database to be held in memory so that subsequent requests for an object with the same identity do not require another trip to the database. This also saves memory, in that a duplicate object need not be created when the object in cache will do.
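At its simplest, an object cache of this kind is a map from identity to instance. The plain-Java sketch below illustrates the idea only. IdentityCache, its loader function, and the Employee class are hypothetical, and real framework caches add eviction, invalidation, and transaction awareness that are not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

class IdentityCacheDemo {
    public static void main(String[] args) {
        int[] trips = {0};
        IdentityCache<Employee> cache = new IdentityCache<>(id -> {
            trips[0]++;               // stands in for a trip to the database
            return new Employee(id);
        });

        Employee first = cache.get(7L);
        Employee second = cache.get(7L);
        System.out.println("same instance: " + (first == second));
        System.out.println("database trips: " + trips[0]);
    }
}

class Employee {
    final long id;
    Employee(long id) { this.id = id; }
}

// Hypothetical cache keyed by object identity (here, a numeric primary key).
class IdentityCache<T> {
    private final Map<Long, T> byId = new HashMap<>();
    private final LongFunction<T> loader;

    IdentityCache(LongFunction<T> loader) { this.loader = loader; }

    // Loads on the first request for an id; later requests reuse the cached instance.
    T get(long id) {
        return byId.computeIfAbsent(id, key -> loader.apply(key));
    }
}
```

Both requests return the same instance after a single load, which shows the two benefits described above: fewer database trips and no duplicate objects in memory.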

Caching capabilities, strategies, and configuration vary greatly among the different persistence framework solutions. Some, like Hibernate and JDO, offer multiple levels of cache. One level of cache is dedicated to objects used by a single instance of the application while other levels of cache work in distributed environments (multiple instances of the application running and sharing objects) usually in JEE containers.

If objects that are used frequently, but change rarely, are a big part of your application, consider examining the framework's caching opportunities.

Queries
JDBC and iBatis rely, obviously, on SQL to retrieve data. Hibernate, JDO, and Castor support different, and sometimes multiple, query mechanisms/languages. Hibernate offers the most options: HQL, Criteria, and native SQL to pull data and retrieve objects from the database. JDO offers JDOQL and native SQL. Castor offers OQL. There are plenty of religious arguments posted on the Web about which of these (HQL, Criteria, JDOQL, and OQL) is best and which may be more like SQL (assuming you believe SQL is an attractive query mechanism). Suffice it to say that all help reduce the amount of actual SQL code in your application, but also require some additional learning time.
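For a rough feel of how these languages differ, the sketch below writes the same lookup (employees by last name) as four query strings. The class, table, column, and parameter names are all assumptions made up for this illustration, and the strings are held as plain constants rather than executed against any framework.

```java
class QueryExamples {
    // Plain SQL, as used with JDBC or iBatis; "?" is a positional parameter.
    static final String SQL =
        "SELECT id, last_name FROM employee WHERE last_name = ?";

    // Hibernate HQL: queries the mapped class, with a named parameter.
    static final String HQL =
        "from Employee e where e.lastName = :name";

    // JDOQL (JDO 2.0 single-string form) with an implicit parameter.
    static final String JDOQL =
        "SELECT FROM com.example.Employee WHERE lastName == :name";

    // Castor OQL: numbered parameters such as $1.
    static final String OQL =
        "SELECT e FROM com.example.Employee e WHERE e.lastName = $1";

    public static void main(String[] args) {
        System.out.println("SQL:   " + SQL);
        System.out.println("HQL:   " + HQL);
        System.out.println("JDOQL: " + JDOQL);
        System.out.println("OQL:   " + OQL);
    }
}
```

All three object query languages name the class and its properties rather than the table and its columns, which is where the reduction in hand-written SQL comes from.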

JEE Container Support
Nothing prevents any of the persistence frameworks examined from being used in an enterprise situation, specifically in a JEE application on a JEE container. In fact, most persistence frameworks can be configured to use standard transactions or other transactions (through a data source established in the container) offered by a JEE server. Additionally, JEE servers often provide other cache and threading mechanisms to better support scalability and performance. How well the framework takes advantage of the JEE container's transaction manager, cache, thread management, etc., and how these might interfere with the framework's own capabilities, should also be studied.

Only You Can Decide
When looking at persistence frameworks vs. JDBC, which works best for your application is going to depend. However, the information provided here should help you understand on what it depends.

As a quick recap, let's summarize some of the major decision points and factors when comparing persistence frameworks.

Hibernate is the most widely used persistence framework today. Along with a large user community and lots of resources to help you get your Hibernate applications up and running, the performance of Hibernate is better than average (and gets better with Hibernate cache), the persistence code is tight, and the API is easy to understand. In general, Hibernate leaves you with some nice Java code. Hibernate is feature-rich and very extensible. Because of this, configuring persistence in Hibernate can be an adventure, especially as the models get complex.

JDO is a viable alternative to Hibernate, although certainly not as popular today. It, too, is feature-rich, and JDO has the backing of a specification. While its user community appears relatively small, documentation, examples, and helpful resources appear ample. Performance is acceptable. JDO persistence code is very tight and easy to read (including the configuration file, in my opinion), but there is the pesky little byte code enhancement that is required before deploying or executing the application. Some may consider this step problematic for security and/or byte code verification.

iBatis still requires developers to have extensive SQL knowledge. In fact, it offers only minimal, if not negligible, assistance in reducing the amount and complexity of code compared to JDBC once configuration code is included. iBatis does get SQL code out of the Java files, and offers some convenience in setting SQL statement parameters using JavaBeans and Maps. Object state and association management are still developer-managed issues when using iBatis. It appears that the iBatis user community is small, and finding helpful resources may be tougher. However, given its simplicity and lightweight nature, lots of resources may not be necessary; its performance is about as good as JDBC's, and it minimally impacts the overall application footprint.

Castor, in my opinion, is a disappointing option. The performance appears less than viable and I had issues trying to get its association mapping to work properly. I hope these are issues of my making and not problems of the framework. Castor's take on dependent and independent relationships is a bit different from that of other frameworks. The need to use long transactions and set up effectivity timestamps to manage detached objects seems intrusive. The user community is again small and the documentation, help, and resources are limited. The size of the resulting persistence code and the complexity of the API are attractive, but probably no more so than those of other frameworks. Castor was a project that was inactive for some time. Recently, there has been a sharp rise in activity and involvement in this project. In fact, during the writing of this article, several dot releases were made available. So perhaps Castor is a project to watch even if you find results similar to those in this article.

Finally, writing your own JDBC code is still an option used by many a developer. With an abundance of persistence framework options, hopefully, this is by design and not out of ignorance. Well-written SQL and JDBC code is hard to beat for performance, especially when objects are changing frequently. As I tell my students, JDBC coding isn't rocket science, but it is incredibly tedious. When you do run into a question or issue, there are a ton of helpful resources. But code maintenance and the issues that a persistence framework manages (object state, association management, etc.) are why the persistence frameworks are so very attractive.