ModeShape 5’s persistence changes

Starting with ModeShape 3 in early 2012, all repository nodes were internally represented using JSON documents and stored as BSON values in Infinispan. Although we relied upon some Infinispan features, for the most part ModeShape was merely storing its data using a very basic key-value API.

As ModeShape evolved through the 3.x and 4.x versions, we started having some data persistence issues that were largely outside of our control. ModeShape could be deployed inside JBoss AS (eventually known as Wildfly), so we chose our version of Infinispan based upon the version that was shipped with JBoss AS. Unfortunately, when we found bug in Infinispan, those bugs would be fixed in releases that were not yet included in JBoss AS, meaning we couldn’t get the fixes for quite some time. Using Infinispan also made the repository configuration and internals quite complex. Plus, changes in Infinispan’s persistence stores sometimes meant that persisted data could not be read by newer versions of Infinispan.

But most importantly, in certain situations we saw data corruption render a repository’s content largely unusable. This is a complicated issue that we previously outlined in detail this forum post and this issue.

Therefore, our primary goal with ModeShape 5 was to make sure that the repository data is stored in a more durable and strongly consistent manner that avoided the aforementioned corruption issues. This meant that we had to take a more conservative approach to persistence and give up claims of high scalability and performance (which are fine with eventual consistency, but not with strong consistency which is a must-have for ModeShape).

Already having the design of storing BSON documents in a key-value store helped us a lot, since it meant we only had to come up with transactional, strongly consistent, key-value store alternatives.

ModeShape 5’s initial release comes with three such stores out of the box.

1. RDBMS store

This was the obvious choice, since relational databases provide strong consistency guarantees with good transactional support, at least with READ_COMMITTED isolation level that ModeShape requires. Enterprise users still trust and use relational databases a great deal. Using a relational database store meant users can still cluster multiple repositories together, as long as all those repositories use the same shared database.

ModeShape 5 comes out-of-the-box with support for H2, Oracle, MySql and PostgreSQL. Repository data is persisted in the form of BLOBs using the same internal BSON format we’ve used since ModeShape 3. We’ve also designed the store in such a way so that in the future, we can add specialized storage types that take advantage of the capabilities of different databases (for example PostgresSQL’s JSONB in 9.5 and above).

Configuration is again much simpler than ModeShape 3 and 4 with an equivalent store, as can be seen from the documentation.

2. File system store

This store uses an embedded H2 database to persist information on the local disk. Internally we use its very nice MVStore API, the lower-level key-value engine used within H2’s normal relational and SQL engine. It provides good transactional support and stores/streams binary objects (like our BSON documents) with optional features like compression and encryption.

For users which don’t want to store the repository data in a RDBMS and who also aren’t interested in clustering, this should be the default go-to store in ModeShape 5.

Configuring such a store is trivial and doesn’t require any additional configuration files (see our documentation for examples)

3. Transient in-memory key-value store

This is the default store when nothing is explicitly configured, and data is only persisted in-memory and is lost as soon as the process stops. Therefore, this is not suitable for production but is a very simple and natural option for testing and exploration. Internally, it uses H2’s MVStore API without persistence.

Other key-value stores

We can add support for other key-values stores in the future, provided they:

are strongly consistent;

support ACID transactions; and

run on Java 8 or above

We’re also happy to hear any suggestions or to evaluate any contributions from our community members.

Performance

Our preliminary tests indicate that all the above stores perform at least as well as their previous Infinispan counterparts in local, non-clustered modes. In fact, they should perform better in write-intensive cases while probably performing slightly slower for read-intensive cases, since Infinispan always had an in-memory cache layer on top of every store.

Which should you use?

We recommend using the file store for non-clustered cases. It’s simple, fast, and doesn’t require an external process. A second option to consider is the JDBC store with an embedded H2 database.

When clustering however, the only suitable option is the relational store with a shared JDBC store. As outlined above and as mentioned in the documentation, strict serializability required by the JCR API comes at a cost: all cluster members must coordinate their operations and use a shared persistent store. To help provide this coordination and to avoid write-contention on the same nodes, ModeShape employs global cluster locking (via JGroups) to ensure nodes can only be modified by one cluster member at a time. We believe that this is only way in which we can ensure the JCR consistency requirements when running in a cluster.