InfoQ caught up with Emmanuel Bernard to talk about Hibernate OGM and which NoSQL backing stores the team plans to support. Emmanuel said they started with Infinispan because the two teams could collaborate readily (Infinispan and Hibernate OGM are both JBoss initiatives), but they plan to support others. Infinispan was also a good first choice because its transactional model is close enough to that of relational databases to bridge to JPA readily.

Hibernate OGM is nascent, but the team plans to support other NoSQL implementations. For example, the EhCache team plans to be a Hibernate OGM provider and is working with the Hibernate OGM team on enhancements and abstractions as needed. Emmanuel mentioned that there is interest in contributions for MongoDB, CouchDB and Redis, and he hopes support for these will get under way soon. He also hopes other projects and individuals will pitch in to support key/value stores, document oriented databases, column oriented databases and graph oriented databases.

Like Infinispan, Cassandra and Voldemort let you store Apache Lucene indexes in the datastore. The Hibernate OGM JP-QL engine implementation relies on Lucene and Hibernate Search, so those two stores might be natural choices for early Hibernate OGM adoption as well. NoSQL stores that do not have such support will need a different strategy to implement querying.

InfoQ: How hard will it be for a developer who is familiar with JPA and MySQL to use Hibernate OGM and Infinispan? What is the learning curve going to be?

It is very easy! And that was our goal.

The programmatic model and the semantics are literally the same. We are not talking about a JPA-like API. Hibernate OGM is a full-blown JPA engine. We already support most of the mapping constructs and CRUD operations (including entity hierarchies, associations etc). If you need to convert a JPA application using Hibernate Core into one using Hibernate OGM, you need to do the following steps:

add the Hibernate OGM jar and its dependencies to your application

edit your persistence.xml and change the provider to Hibernate OGM, remove JDBC-like properties (like the JDBC driver, the dialect, the schema generation) and add a pointer to the Infinispan configuration file

That's it. The limiting factor today is JP-QL. Alpha 2 does not have JP-QL support yet and the next version (say alpha3) will have support for simple JP-QL queries. However, if you are familiar with Hibernate Search, you can already use full-text queries (all of it).

The cool part of the project is that we reuse most of Hibernate Core for JPA's CRUD support and this gives us a huge engine maturity. Don't expect weird JPA related bugs or at least expect them in Hibernate Core as well.
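The persistence.xml change described in the steps above can be sketched in plain Java as a transformation over the persistence unit's properties. This is only an illustration, not Hibernate OGM code: the class and method names are hypothetical, the set of keys treated as JDBC-specific is an assumption based on common Hibernate Core settings, and the Infinispan configuration key shown is an assumption that should be checked against the Hibernate OGM documentation for your release. The provider element itself (which would point at Hibernate OGM's persistence provider class) is not modelled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OgmMigrationSketch {

    // Assumed property key for the pointer to the Infinispan configuration file;
    // verify the exact name in the Hibernate OGM documentation for your version.
    public static final String INFINISPAN_CONFIG_KEY =
            "hibernate.ogm.infinispan.configuration_resourcename";

    /** Returns a copy of Hibernate Core properties with the JDBC-like entries removed. */
    public static Map<String, String> toOgmProperties(Map<String, String> coreProps,
                                                      String infinispanConfigResource) {
        Map<String, String> ogmProps = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : coreProps.entrySet()) {
            if (!isJdbcSpecific(entry.getKey())) {
                ogmProps.put(entry.getKey(), entry.getValue());
            }
        }
        // Add the pointer to the Infinispan configuration file.
        ogmProps.put(INFINISPAN_CONFIG_KEY, infinispanConfigResource);
        return ogmProps;
    }

    /** JDBC driver/connection settings, the dialect and schema generation (assumed key set). */
    public static boolean isJdbcSpecific(String key) {
        return key.startsWith("javax.persistence.jdbc.")
                || key.startsWith("hibernate.connection.")
                || key.equals("hibernate.dialect")
                || key.equals("hibernate.hbm2ddl.auto");
    }

    public static void main(String[] args) {
        Map<String, String> core = new LinkedHashMap<>();
        core.put("javax.persistence.jdbc.driver", "com.mysql.jdbc.Driver");
        core.put("javax.persistence.jdbc.url", "jdbc:mysql://localhost/app");
        core.put("hibernate.dialect", "org.hibernate.dialect.MySQL5Dialect");
        core.put("hibernate.hbm2ddl.auto", "update");
        core.put("hibernate.search.default.directory_provider", "infinispan");

        // Only the non-JDBC entry survives, plus the Infinispan pointer.
        System.out.println(toOgmProperties(core, "infinispan-config.xml"));
    }
}
```

In a real application the same edit is made by hand in persistence.xml; the entity classes and the EntityManager code stay as they are.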

InfoQ: When should you use JPA/RDBMS and when should you use JPA/NoSQL? Is there a cookbook or a particular set of use cases where one makes more sense than the other?

I won't pretend to know the answer to this. Frankly, the industry at large is trying to figure this out. That being said, let me try and make a fool of myself. First of all, if a relational database gives you satisfaction in your project(s), stick with it by all means. NoSQL is a set of very different tools that cover situations where current relational database engines fall short. Graph oriented DBs shine for any graph related queries (give me the friends of my friends of my friends who live in Paris). Data grids, for example, are used when low latency and transactionality are key and when data size is not humongous. BigTable clones are good when size matters (i.e. a lot of entries amounting to a lot of data).

Emmanuel added that using the JPA programmatic model will be orthogonal to the choice of NoSQL solution. JPA does not fit all NoSQL use cases, but applications that use domain models will work nicely with Hibernate OGM. An obvious use case for JPA/NoSQL is to take load off relational databases. Hibernate OGM is a success if it encourages developers to try and explore NoSQL solutions.

InfoQ: What are some simple queries that might be supported in the short term? Given the mismatch of the relational model to most NoSQL stores how much of JPA can you support?

The mismatch between traditional relational databases and NoSQL stores will show up I think in two big areas:

in the transaction and recovery model

in how associated data is stored (and subsequently accessed)

For the transaction differences, I think Hibernate OGM should not try and mask these differences but rather embrace the underlying transactional model and make the user aware of it. Doing otherwise would be a mistake as it would denature each NoSQL solution.

He went on to say that the way dehydrated entities and associations are stored will probably differ depending on the underlying datastore family. He thinks JPA's relationship model will fit many NoSQL stores quite naturally. The mismatch alluded to in the question is not really a problem in JPA, because JPA can have two associated entities with independent lifecycles, and it can have embeddable objects or even collections of embedded objects, which resembles a document oriented model. In Hibernate OGM, the schema is hosted by the domain model and can be decoupled from the actual object structure so as to fit schemaless datastores.

I think the JPA pains in Google App Engine are a mix of BigTable storage limitations, query engine limitations and time constraints on the GAE/J team. The guys behind GAE/J are pretty smart, and they are quite a small team for what they have accomplished, so they cannot be on all fronts.

In Hibernate OGM (so far), you are more likely to see performance limitations (i.e. excessive key lookups in some situations) than unsupported features. Of course, our initial support for JP-QL is far from complete and people will have to bear with us for a little while. Our goal is to be natural to JPA developers. That being said, we won't be able to run contrary to the underlying NoSQL engine's abilities and strengths.
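To make the idea of dehydrated entities and the cost of key lookups more concrete, here is a minimal sketch over an in-memory map standing in for a key/value store. It is not how Hibernate OGM stores data: the class name, the key format (Author#id, Book#id) and the tuple layout are all hypothetical. An embedded address is inlined into the owning entity's tuple, while associated books live under their own keys, so reading an author's book titles costs one lookup per book on top of the lookup for the author.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DehydrationSketch {

    // Stand-in for a key/value datastore: key -> tuple of column name/value pairs.
    private final Map<String, Map<String, Object>> store = new HashMap<>();
    private int lookups = 0;

    private Map<String, Object> get(String key) {
        lookups++; // count every read that would hit the datastore
        return store.get(key);
    }

    public int lookups() {
        return lookups;
    }

    /** Dehydrates an author: the embedded address is inlined, books are referenced by id. */
    public void saveAuthor(String id, String name, String city, List<String> bookIds) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("name", name);
        tuple.put("address.city", city);             // embeddable object, same tuple
        tuple.put("books", new ArrayList<>(bookIds)); // association, stored as keys
        store.put("Author#" + id, tuple);
    }

    /** Dehydrates a book, an entity with its own lifecycle and its own key. */
    public void saveBook(String id, String title) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("title", title);
        store.put("Book#" + id, tuple);
    }

    /** Navigates the association: one lookup for the author, then one per book. */
    public List<String> bookTitlesOf(String authorId) {
        Map<String, Object> author = get("Author#" + authorId);
        @SuppressWarnings("unchecked")
        List<String> bookIds = (List<String>) author.get("books");
        List<String> titles = new ArrayList<>();
        for (String bookId : bookIds) {
            titles.add((String) get("Book#" + bookId).get("title"));
        }
        return titles;
    }

    public static void main(String[] args) {
        DehydrationSketch sketch = new DehydrationSketch();
        sketch.saveBook("1", "First Book");
        sketch.saveBook("2", "Second Book");
        sketch.saveAuthor("42", "Jane", "Paris", List.of("1", "2"));

        System.out.println(sketch.bookTitlesOf("42"));
        System.out.println("lookups: " + sketch.lookups()); // 1 author read + 2 book reads
    }
}
```

Reading the embedded city needs no extra lookup, whereas each associated entity does, which is one way the excessive key lookups mentioned above can arise.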

I won't go into details because I don't know the product well but if I had to summarize:

it's only for GemFire thus not NoSQL (or even data grid) agnostic

it's not open source ( end of the story ;) )

When we worked on Hibernate OGM and Infinispan we played with the idea of supporting JDBC as well. We might do it down the road but I see JPA as a more interesting level of abstraction (associated entities vs embeddable objects, etc). You can see Hibernate OGM as a denormalization engine that keeps duplicated pieces of data consistent for you. That's a huge win to best tailor and optimize your data access patterns. We will offer some declarative ways to denormalize your data. This is something you cannot achieve that naturally at the relational layer.

Since a lot of developers use Hibernate and JPA, NoSQL support seems like a natural addition to the Hibernate framework. Hibernate OGM could further drive the adoption of NoSQL by unifying the interface to NoSQL implementations, but the question will be how well object mapping and JP-QL translate to the various NoSQL implementations.

For those of us that have products that are built with Hibernate the advent of OGM is ideal. In particular the product we have has a number of entities whose population is an order of magnitude higher than others. We have done load testing that has made us concerned about how our product will perform with the available SQL databases.

MongoDB seems to be a solution to our problem (initial testing has yielded great results). Being able to add support for MongoDB to our product, while still using Hibernate for all entities, is the best solution for us.

And DataNucleus has standardised persistence (JDO/JPA) to MongoDB, HBase, Cassandra, ODF, Excel, db40, NeoDatis, BigTable, XML, JSON, ... as well as the majority of RDBMS. It's had these for some time.

Good that the Hibernate team have recognised that RDBMS is not what all projects need, and a datastore-agnostic API is actually the sensible route ...

Some projects of Spring Data are closer to Hibernate OGM than others (Spring MongoDB, Spring Data-JPA, Spring Data key/value) but the philosophy is different:

* they encapsulate APIs in templates and abstractions to ease your life. This is a giant helper framework in a way. (I'm not saying that in a pejorative way).

* Hibernate OGM is a full blown JPA engine (same semantic, same API, same query language, same everything)

In Spring Data, the two approaches I like most are:

* the query API abstraction: very simple

* the idea of storing one entity into several back ends and being the denormalization engine. Not sure it's useful in practice yet but it's definitely an idea worth exploring.

And frankly, this is not the time to compete. Now is the time to offer solutions for people to try NoSQL approaches more easily.

I just wanted to quickly state why the Spring Data team has explicitly decided not to use JPA as the central API to provide access to NOSQL stores.

1. Meta model - JPA exposes some annotations deeply related to relational concepts that surely create confusion when applied to NOSQL stores: @Table, @Column, @Join* etc. Of course one might argue that one doesn't have to use these annotations, but what you're actually doing then is creating a profile of JPA that you suddenly have to tell your users about. JPA on GAE is a neat example. Just read up on all the limitations you face there. I'd argue that this has got nothing to do with JPA anymore.

2. Store specifics - on the other hand you will have to introduce additional annotations to feature the store specifics. People choose NOSQL stores *because* of their specifics and we think a Java programming model for those stores has to hand those specifics to the users. Even for things common for a variety of stores (e.g. indexing) the annotations effectively have to vary as various stores offer different options to tweak things here. What about all the fancy features the NOSQL stores ship with: geospatial queries in MongoDB, graph traversals for Neo4J? Bottom line for us: NOSQL stores only make sense if you can leverage their specifics.

3. Transactions - as some stores do not provide transactions at all but have simply different approaches to consistency (e.g. MongoDB) we think users would be very surprised using an EntityManager creating a transaction and actually not creating one. EntityManager.persist(…) is required to throw a TransactionRequiredException if there's no transaction running. So of course you can simply not throw that exception for Mongo but would effectively create a JPA profile with that.

So our approach is essentially the template one Spring users have known for ages already. We combine that with very store specific mapping metadata that has a very small core and allows non-Java NOSQL developers to easily approach the API as well (e.g. MongoDB developers are used to defining indexes in JSON style and can do so with Spring Data). On top of that we provide the repository approach that can leverage the different flavors of meta-data to derive store specific queries from method names (JPA, Mongo, Neo4J etc). So we're not trying to abstract away things to a very narrow common denominator but rather provide a consistent programming model for each store leveraging the specialties of each of them.
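As a rough illustration of deriving store specific queries from method names, the sketch below parses a finder name and renders it once as a JPA-style query string and once as a JSON-style document filter. It is not Spring Data's implementation: the class, the supported grammar (only findBy with properties joined by And) and the placeholder syntax in both renderings are simplifying assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class DerivedQuerySketch {

    private static final String PREFIX = "findBy";

    /** Extracts property names from a name such as findByLastnameAndAge. */
    public static List<String> propertiesOf(String methodName) {
        if (!methodName.startsWith(PREFIX) || methodName.length() == PREFIX.length()) {
            throw new IllegalArgumentException("Unsupported method name: " + methodName);
        }
        String criteria = methodName.substring(PREFIX.length());
        List<String> properties = new ArrayList<>();
        // Split on "And" only when it is followed by an upper-case letter,
        // i.e. when it separates two capitalised property names.
        for (String part : criteria.split("And(?=[A-Z])")) {
            properties.add(Character.toLowerCase(part.charAt(0)) + part.substring(1));
        }
        return properties;
    }

    /** Renders the criteria as a JPA-style query string with positional parameters. */
    public static String toJpql(String entity, String methodName) {
        List<String> properties = propertiesOf(methodName);
        StringBuilder query = new StringBuilder("select e from " + entity + " e where ");
        for (int i = 0; i < properties.size(); i++) {
            if (i > 0) {
                query.append(" and ");
            }
            query.append("e.").append(properties.get(i)).append(" = ?").append(i + 1);
        }
        return query.toString();
    }

    /** Renders the same criteria as a JSON-style filter for a document store. */
    public static String toDocumentFilter(String methodName) {
        List<String> properties = propertiesOf(methodName);
        StringBuilder filter = new StringBuilder("{ ");
        for (int i = 0; i < properties.size(); i++) {
            if (i > 0) {
                filter.append(", ");
            }
            filter.append('"').append(properties.get(i)).append("\" : ?").append(i);
        }
        return filter.append(" }").toString();
    }

    public static void main(String[] args) {
        System.out.println(toJpql("Person", "findByLastnameAndAge"));
        System.out.println(toDocumentFilter("findByLastnameAndAge"));
    }
}
```

The same method name yields a different query per store, which is the point being made: one consistent programming model, with the store specifics left intact underneath.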

As a bonus we have our feet in polyglot persistence, which essentially means that we store parts of the domain model in a relational store and other parts in a NOSQL one. I agree with Emmanuel that we have to see how this in particular is going to fly.

Generally I (and that's my very personal opinion) think that choice has never been a bad thing for developers. Java and NOSQL have had quite a brief relationship so far so that there's lots of new ground to explore and a variety of approaches to try.


And DataNucleus has standardised persistence (JDO/JPA) to MongoDB, HBase, Cassandra, ODF, Excel, db40, NeoDatis, BigTable, XML, JSON, ... as well as the majority of RDBMS. It's had these for some time.

Wow. That is a lot of support for a lot of different persistent stores. However, I have never run into a shop that has used DataNucleus in production. How do you think DataNucleus can gain market share? Is it a priority?

I think this is great. I have used DataNucleus support for BigTable, and it felt like there was a lot of room for improvement. This is where choice is good. It will be good to see more people involved in this API space. The nice thing about Hibernate getting involved is that they are the 800-pound gorilla in object to persistent store mapping.

On the one hand, I would like to see some viable alternatives to Hibernate (DataNucleus, EclipseLink, etc.). On the other hand, the probability of needing to work with Hibernate on a project is very high, as it is so prevalent. At least now there is a popular common interface (JPA), so you can switch back and forth more readily between Hibernate, EclipseLink, DataNucleus et al.

I even like that there is Spring Data, as it is good to think outside of the JPA box completely. That said, I am more likely to use a solution that uses the JPA interface where possible, as I just need a few fewer things to learn in my life. I do understand Ollie's points, though.

RE:

As a bonus we have our feet in polyglot persistence, which essentially means that we store parts of the domain model in a relational store and other parts in a NOSQL one. I agree with Emmanuel that we have to see how this in particular is going to fly.

Well that sounds rather cool.

RE:

Generally I (and that's my very personal opinion) think that choice has never been a bad thing for developers. Java and NOSQL have had quite a brief relationship so far so that there's lots of new ground to explore and a variety of approaches to try.