Sunday, February 27, 2011

It's been a long time since I presented something about the greatest graph database out there, and with good reason. A lot has happened, the most important being that I officially joined the crew behind Neo4j. After helping out with the short strings patch, I went back to more familiar territory: providing support for external transaction managers, this time with integration through Spring. So, in this first of two parts I will show you how to get Neo to work in Spring, in standalone mode, with your external transaction manager of choice. The second part will throw a JDBC DataSource into the mix and provide some guidelines on how to perform crash recovery. So, off we go.

As always, some ground rules

This post focuses on the integration of the three frameworks, so you should already be familiar with them. Also, you will not see a full-blown application here; the main artifacts are JUnit test cases that demonstrate use of the API. With that said, you may want to take a look at this post, which introduces the pluggable transaction manager solution for Neo that has shipped with the official distribution since version 1.3.M01. We will cover some of that here as well, however. All the setup that follows is Maven based, so classpaths and dependencies are taken care of automatically.

What is the problem that we are trying to solve again?

Say you want to use Neo4j from within the Spring framework, either through Spring Data Graph or as a raw component. Since Neo4j requires explicit transaction boundaries for write operations, there has to be a way to start, commit and roll back transactions through Spring. If this is your use case, fear not, because that is already taken care of, with very clear-cut instructions here. This is the simplest use case and it is all you need to get things working with full ACID guarantees, provided that the only participating resource is Neo4j. However, if you need distributed transactions in an XA setting, then you have to take the steps outlined below.
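For contrast, here is what "explicit transaction boundaries" means in the plain embedded case. In the Neo4j 1.x API the idiom is `Transaction tx = graphDb.beginTx(); try { ...; tx.success(); } finally { tx.finish(); }`. The sketch below reproduces that contract with small hypothetical stand-in classes (`SketchGraphDb`, `SketchTx`, `ExplicitBoundaries`) instead of the real Neo4j types, so it runs without any dependency: writes outside a transaction are refused, and `finish()` commits only if `success()` was called.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an embedded graph database (not the Neo4j API).
class SketchGraphDb {
    final List<String> committed = new ArrayList<>();
    private SketchTx current;

    SketchTx beginTx() {
        current = new SketchTx(this);
        return current;
    }

    // Writes are refused outside a transaction, mirroring Neo4j's behavior.
    void createNode(String name) {
        if (current == null) {
            throw new IllegalStateException("Not in transaction");
        }
        current.pending.add(name);
    }

    void endTx() {
        current = null;
    }
}

// Hypothetical stand-in for the 1.x Transaction contract: success() + finish().
class SketchTx {
    private final SketchGraphDb db;
    final List<String> pending = new ArrayList<>();
    private boolean success;

    SketchTx(SketchGraphDb db) {
        this.db = db;
    }

    void success() {
        success = true;
    }

    // Commits the pending writes only if success() was called, then ends the tx.
    void finish() {
        if (success) {
            db.committed.addAll(pending);
        }
        pending.clear();
        db.endTx();
    }
}

class ExplicitBoundaries {
    // The begin/success/finish idiom: finish() runs even if the work throws.
    static void createNodes(SketchGraphDb graphDb, boolean fail, String... names) {
        SketchTx tx = graphDb.beginTx();
        try {
            for (String name : names) {
                graphDb.createNode(name);
            }
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
            tx.success();
        } finally {
            tx.finish();
        }
    }
}
```

Because `finish()` sits in the `finally` block, a failing unit of work leaves nothing behind: only the writes of transactions that reached `success()` end up committed.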

First things first

To better demonstrate the setup and provide you with a starting configuration for your projects, there is a GitHub repository available with an already configured environment. So point your git at Neo4j-Spring-Integration and clone it. Usage is demonstrated through JUnit test cases that are ready to run, with some additional legwork needed if you want Spring's own transaction manager implementation, as described below. Note also that the latest Neo4j SNAPSHOT is required; the latest milestone as of this writing (1.3.M03) lacks some needed functionality. So, let's go through the various components in order of significance.

Enter the players: The txModule

In the txModule project you can find an implementation of a @Service module that is needed to enable support for Spring-managed transactions in Neo. Since Neo is a fully transactional, XA-compliant data source, it must work under the supervision of a transaction manager even in standalone mode. In fact, even when you use "just" Neo4j and do some indexing, full 2PC happens under the hood between a Neo XAResource and a Lucene XAResource. But that is another story. So, if we want the kernel to participate in 2PC with other XA resources (a JDBC connection, for instance), we have to cut out the internal transaction manager and delegate all calls to the externally provided one. The aforementioned module does exactly that when enabled with a configuration parameter passed to the EmbeddedGraphDatabase constructor. So, mvn install it and you should be ready for the next step. Note that as far as Neo4j is concerned, you are done. The rest is configuring Spring with an external TM and getting it to work, including exposing it to all the XA resources of your project.
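To make the delegation idea concrete, here is a minimal sketch under a much simplified model of the kernel. A hypothetical `SimpleTm` interface stands in for the real JTA `TransactionManager`, and a hypothetical `TmProviders` registry picks an implementation by the value of the `tx_manager_impl` entry in the configuration map. None of these types are Neo4j's actual classes, and the name `native` for the default provider is an assumption. The point is only that, once the external provider is selected, every begin/commit/rollback the kernel issues ends up in the external TM.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical minimal TM contract (a stand-in, not javax.transaction.TransactionManager).
interface SimpleTm {
    void begin();
    void commit();
    void rollback();
}

// Records calls; stands in for either the kernel's native TM or an external one.
class LoggingTm implements SimpleTm {
    final String name;
    final List<String> log = new ArrayList<>();
    LoggingTm(String name) { this.name = name; }
    public void begin() { log.add(name + ":begin"); }
    public void commit() { log.add(name + ":commit"); }
    public void rollback() { log.add(name + ":rollback"); }
}

// Forwards every call to the externally provided TM; the kernel only sees SimpleTm.
class DelegatingTm implements SimpleTm {
    private final SimpleTm external;
    DelegatingTm(SimpleTm external) { this.external = external; }
    public void begin() { external.begin(); }
    public void commit() { external.commit(); }
    public void rollback() { external.rollback(); }
}

// Hypothetical provider registry keyed by the tx_manager_impl config value.
class TmProviders {
    static final String CONFIG_KEY = "tx_manager_impl";
    static final String DEFAULT_PROVIDER = "native"; // hypothetical default name

    private final Map<String, Function<SimpleTm, SimpleTm>> providers = new HashMap<>();

    void register(String name, Function<SimpleTm, SimpleTm> factory) {
        providers.put(name, factory);
    }

    // 'external' is the TM handed over from outside (e.g. by Spring); providers may ignore it.
    SimpleTm select(Map<String, String> config, SimpleTm external) {
        String name = config.getOrDefault(CONFIG_KEY, DEFAULT_PROVIDER);
        Function<SimpleTm, SimpleTm> factory = providers.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown TM provider: " + name);
        }
        return factory.apply(external);
    }
}
```

Registering `spring-jta` with `DelegatingTm::new` and setting `tx_manager_impl` to `spring-jta` routes all calls to the externally supplied manager; leaving the key out falls back to the default provider.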

Choosing the Transaction Manager

The Spring Framework does not provide its own transaction manager, preferring instead to delegate to a backing implementation, usually a JNDI-available instance in an application server. This is not mandatory, however, and instead we will opt for something more elaborate: a JVM-local instance of either Atomikos, JOTM or SpringSource's transaction manager, some of the most widely used standalone transaction manager implementations. JOTM and Atomikos are easy to get hold of and are already in the pom of the demo. Spring's TM does not provide a Maven repository, so you will have to download and install it yourself. That is a pretty easy task, but I want to keep crossovers from the actual project contents to a minimum, so for the exact installation steps I will refer you to the top-level README file.

The file sampleCode/src/test/resources/base-tx-test-context.xml contains the basic configuration that binds everything together. The EmbeddedGraphDatabase bean is configured here, and the parameter tx_manager_impl is passed to the constructor with a value of spring-jta. Take a look at org.neo4j.jta.spring.SpringProvider. See what I did there? When Neo starts up and finds the @Service for the transaction manager provider with this name, it loads it, bypassing the native implementation. The provider gets Spring's JtaTransactionManager injected and delegates all calls to it. Voila! Transactions are now handled externally and the graph kernel is unaware of the situation. This in turn enables transaction demarcation through the Spring API, both programmatic and declarative via the @Transactional annotation. Of course, all facilities of Spring will work, including Spring Data Graph, leading to practically complete integration of Neo4j with Spring.

Finally, the code

The demo project's test cases show how to actually work with your TM of choice. Most of the work is performed in BaseTMIntegrationTest.java, which holds all the tests. It expects subclasses to provide a valid configuration as a classpath XML file via getConfigName(), and then extracts the jtaTransactionManager and dataSource beans to gain programmatic control over transaction demarcation and DataSource management for JDBC calls. You will note that the test cases that work with the graph database do not call graphDb.beginTx() as usual. Instead, the relevant calls are tm.getUserTransaction().begin() followed by commit() or rollback(), as required.
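The bean-managed style the tests use boils down to the pattern below. This is a sketch under assumptions: `SketchUserTransaction` is a hypothetical stand-in for the begin/commit/rollback subset of `javax.transaction.UserTransaction` (the real methods throw checked exceptions, omitted here), and `runInTx` is a hypothetical helper, not part of the demo project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the begin/commit/rollback subset of
// javax.transaction.UserTransaction (checked exceptions omitted).
interface SketchUserTransaction {
    void begin();
    void commit();
    void rollback();
}

// Records the demarcation calls it receives.
class RecordingUserTransaction implements SketchUserTransaction {
    final List<String> calls = new ArrayList<>();
    public void begin() { calls.add("begin"); }
    public void commit() { calls.add("commit"); }
    public void rollback() { calls.add("rollback"); }
}

class ProgrammaticDemarcation {
    // Bean-managed demarcation: begin, run the unit of work, then commit,
    // or roll back if the work throws. Returns true if the work committed.
    static boolean runInTx(SketchUserTransaction utx, Runnable work) {
        utx.begin();
        try {
            work.run();
        } catch (RuntimeException e) {
            utx.rollback();
            return false;
        }
        utx.commit();
        return true;
    }
}
```

In the real tests the same shape applies, except that the calls go to the UserTransaction obtained from the Spring-configured TM, so every XA resource touched inside `work` joins the same global transaction.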

Let's take a closer look at the implementation of the testIndexDependencies() test case:

Note that all work is done through interfaces, so the actual implementation can be pushed down to the concrete test case classes. First we get the UserTransaction object from the TM implementation and ask for a transaction to begin(). Now we use our Neo4j instance as if a transaction had been started with graphDb.beginTx(), only now, if another XA resource were to run in the same thread (and thus in the same transaction context), it would register as an XAResource, making this a full 2PC transaction. After some sanity checks we call tx.commit(), a method on the UserTransaction instance, committing the global transaction. Behind the scenes this triggers a 2PC commit, since both the neostore and the Lucene index XAResources are enlisted. The next chunk of code does a read outside of a transaction, ensuring that the "reads-can-happen-outside-of-a-transaction" feature of Neo4j still holds. A new transaction then makes an addition to the index only. At the end, a read-back sanity check is performed, this time from within a transaction.
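The 2PC commit mentioned above follows the classic protocol: every enlisted resource is first asked to prepare, and only if all of them vote yes does the coordinator tell them to commit; otherwise all of them roll back. The sketch below models this with a hypothetical `Participant` class in place of real `XAResource`s (no Xids, transaction log or recovery), using the neostore and the Lucene index as the two participants.

```java
import java.util.List;

// Hypothetical 2PC participant standing in for an XAResource.
class Participant {
    final String name;
    private final boolean canCommit;
    String state = "active";

    Participant(String name, boolean canCommit) {
        this.name = name;
        this.canCommit = canCommit;
    }

    // Phase 1 vote: true means "prepared, ready to commit".
    boolean prepare() {
        state = canCommit ? "prepared" : "failed";
        return canCommit;
    }

    void commit() { state = "committed"; }
    void rollback() { state = "rolledback"; }
}

class TwoPhaseCoordinator {
    // Phase 1: ask each enlisted resource to prepare, stopping at the first "no".
    // Phase 2: commit everyone if all voted yes, otherwise roll everyone back.
    static boolean commit(List<Participant> enlisted) {
        boolean allPrepared = true;
        for (Participant p : enlisted) {
            if (!p.prepare()) {
                allPrepared = false;
                break;
            }
        }
        for (Participant p : enlisted) {
            if (allPrepared) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allPrepared;
    }
}
```

A real transaction manager additionally writes its commit decision to a log between the two phases, which is what makes crash recovery possible; that part is deliberately left out here.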

If you have done EJB development, you will recognize this as a widely used idiom. It is called bean-managed transactions, and it offers finer-grained control than annotation-based transaction demarcation (container-managed transactions). This is entirely on purpose and a natural consequence of using the TransactionManager interface.
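For comparison with the bean-managed style, container-managed (annotation-based) demarcation is typically implemented by wrapping the bean in an interceptor that opens a transaction before each call and commits or rolls back afterwards. The sketch below shows that idea with a JDK dynamic proxy (`java.lang.reflect.Proxy`). `GraphService`, `TxLog` and `DeclarativeTx` are hypothetical names, and this is a much simplified illustration of the idea, not Spring's or an EJB container's actual implementation.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical business interface whose methods should run transactionally.
interface GraphService {
    void addFriend(String a, String b);
}

// Collects demarcation events; stands in for a real transaction manager.
class TxLog {
    final List<String> events = new ArrayList<>();
}

class DeclarativeTx {
    // Wraps target so each interface method call runs between begin and commit,
    // with rollback if the method throws.
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target, TxLog log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.events.add("begin");
            try {
                Object result = method.invoke(target, args);
                log.events.add("commit");
                return result;
            } catch (InvocationTargetException e) {
                log.events.add("rollback");
                throw e.getCause(); // rethrow the business exception unwrapped
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

The business code never mentions transactions here; the trade-off is that the boundaries are fixed at method granularity, which is exactly the control that the bean-managed style in the tests gives back.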

The concrete test cases do two boring things. One is to provide the actual TM implementation via a proper Spring config file. The other is to do any implementation-specific testing and take any actions required for things to work; the notable example here is NativeTMIntegrationTest.recover(), which calls the recover() method, a required step before normal work can start with Spring's TM.
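The split between the base class and the concrete test cases is a template-method arrangement. The sketch below assumes a simplified shape: `getConfigName()` comes from the post, while `prepareTm()`, `runSharedTests()`, the class names and the XML file names are hypothetical, and no JUnit or Spring is involved. The subclass for Spring's native TM overrides the hook to run recovery before the shared tests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape of a base integration test: subclasses name their Spring
// config and may run TM-specific preparation before the shared tests.
abstract class SketchBaseTmTest {
    final List<String> steps = new ArrayList<>();

    // Classpath XML file with the TM-specific Spring configuration.
    protected abstract String getConfigName();

    // Hook for implementation-specific preparation; does nothing by default.
    protected void prepareTm() {
    }

    // Template method: load the config, prepare the TM, run the shared tests.
    final List<String> runSharedTests() {
        steps.add("load " + getConfigName());
        prepareTm();
        steps.add("run shared tests");
        return steps;
    }
}

class SketchAtomikosTest extends SketchBaseTmTest {
    @Override
    protected String getConfigName() {
        return "atomikos-tx-test-context.xml"; // hypothetical file name
    }
}

class SketchNativeTest extends SketchBaseTmTest {
    @Override
    protected String getConfigName() {
        return "native-tx-test-context.xml"; // hypothetical file name
    }

    // Spring's native TM needs recover() before normal work can start.
    @Override
    protected void prepareTm() {
        steps.add("recover");
    }
}
```

Keeping the shared tests in the base class means adding another TM is mostly a matter of a new config file and a few lines of subclass.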

That was it

There is not much to write, because it is that easy. With the txModule in place and the Spring abstractions on top, it is very easy to plug in your TM of choice and have Neo4j participate in full 2PC. In the sample project there is already some groundwork showing how to get JDBC calls to work with our setup as well, although they require a running MySQL database. Feel free to experiment with this part too, but wait a bit for an official walkthrough. In the next post we will see how to get the @Ignore tests to work and what is required for recovery in the case of a system crash. All the above would have been far more difficult to create without the help of Michael Hunger, a fellow coder at Neo Technology who knows the ins and outs of both Neo4j and Spring.