Tech-sand is my moniker for technology that keeps us from delivering value...

Wednesday, August 29, 2012

Storing Documents in MarkLogic via XCC

Integrating Java applications with MarkLogic involves the MarkLogic XCC (XML Contentbase Connector). XCC is a set of client APIs available for Java and .NET. XCC connects through an XDBC server configured on the MarkLogic server to reach the XML databases hosted there.

After the XDBC server is created in MarkLogic, you can test connectivity via a simple Hello World call, seen below. Note that I use the admin/admin credentials that I set up when I installed MarkLogic. In practice, MarkLogic has a very granular security scheme that should be used to control access and privileges, and connections should be made over SSL. The Hello World example is from the XCC Developer's Guide.
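
The connection details for a test like this are usually packed into a single xcc:// URI. The sketch below only assembles and inspects such a URI with the standard library, so it runs without MarkLogic installed. The host, the port 8003, and the class and method names are assumptions made for illustration; with the XCC jar on the classpath, a URI of this shape is what ContentSourceFactory.newContentSource(URI) accepts.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class XccConnectionUri {

    /** Assembles xcc://user:password@host:port/database from its parts. */
    static URI build(String user, String password, String host, int port, String database) {
        try {
            // The multi-argument constructor quotes characters that are illegal in each part.
            return new URI("xcc", user + ":" + password, host, port, "/" + database, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad connection details", e);
        }
    }

    public static void main(String[] args) {
        // Port 8003 is a hypothetical XDBC server port; use the one you configured.
        URI uri = build("admin", "admin", "localhost", 8003, "Documents");
        System.out.println(uri);
        System.out.println("host=" + uri.getHost() + " port=" + uri.getPort());
    }
}
```

Keeping the URI in one place makes it easier to swap the admin account for a less privileged user later.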

Once connectivity has been tested, you are ready to start storing documents. Another example from the XCC Guide (customized for my use) can be seen below. In this example, I again connect to the XDBC server that I created and get a session object. It is important to state here that connection pooling is done automatically for you by XCC. The API also supports JTA.

In this example I also use the recommended approach to retrying operations against the XDBC server. The code that does the heavy lifting to store the documents is seen below. In this code, the XCC API uses the session object to submit a request to the XDBC server with a new ad-hoc query that calls xdmp:document-insert, a built-in MarkLogic XQuery function. In its simplest form, document-insert takes a unique document URI and the XML document content. In this example the XML content is provided by a Groovy string (GString) in the XmlContent.groovy class. I use Groovy string constants because GStrings spare me from writing all that nasty java.lang.String concatenation.
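
To make the two ideas in that paragraph concrete, here is a minimal sketch in plain Java: one helper that builds the ad-hoc xdmp:document-insert query text, and one that retries an operation a fixed number of times. It is not the listing from the XCC guide. The class and method names are hypothetical, and a lambda that fails twice stands in for the real call that submits the query through a session.

```java
import java.util.function.Supplier;

final class RetryingInsert {

    /** Builds an ad-hoc query that stores inline XML under the given document URI. */
    static String documentInsertQuery(String uri, String xml) {
        // Escape & and " so the URI stays a valid XQuery string literal.
        String safeUri = uri.replace("&", "&amp;").replace("\"", "\"\"");
        // The XML is pasted in as an element constructor; literal { or } in it would need doubling.
        return "xdmp:document-insert(\"" + safeUri + "\", " + xml + ")";
    }

    /** Runs the operation, retrying up to maxTries times with a fixed pause between attempts. */
    static <T> T withRetries(int maxTries, long pauseMillis, Supplier<T> operation) {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxTries; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxTries) {
                    pause(pauseMillis);
                }
            }
        }
        throw last; // every attempt failed, so surface the final error
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        String query = documentInsertQuery("/docs/hello.xml", "<greeting>Hello</greeting>");
        int[] calls = {0};
        // Stand-in for submitting the query: fails twice, then succeeds.
        String submitted = withRetries(5, 10, () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated transient failure");
            }
            return query;
        });
        System.out.println(submitted + " sent after " + calls[0] + " attempts");
    }
}
```

A real loader would probably retry only the failures the server reports as retryable rather than every runtime exception, and would give up immediately on errors such as a malformed query.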

To verify that the docs were stored, I will go out to the MarkLogic Query Console (http://localhost:8000/qconsole/). In the console, I can run XQuery queries to verify that I stored the documents in the database. Note: When MarkLogic installs, it creates several databases, and when you create an XDBC server, you must choose a database to connect to. I chose the "Documents" database, but I could have created a new one for this purpose. Below is a screenshot of the Query Console. I clicked on the "Explore" button to view a list of all the documents in this database.

If I wanted to view the contents of a document, I could run an XQuery as seen below, or I could also simply click on the document in the list.

The value in the XQuery doc function is the unique URI for the document in the MarkLogic database.

This was a simple example of storing documents in MarkLogic. In reality, considerable thought should be exercised to create the proper structures (Directories and Collections, etc.) that would be used to house and organize documents. Organizing documents into directories and collections makes them easier to handle en masse if that requirement exists. Another important point to make is that these docs were already XML. Going forward, I will be serializing Java objects into XML via XStream.

Before I can store Java objects as serialized XML, I need to map important attributes of my model objects to the MarkLogic container model. To do this I wrote a custom Java Annotation, seen below.
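
A minimal sketch of what such an annotation could look like, and how it can be read back through reflection, follows. The name StoredDocument and its directory and collections attributes are guesses chosen for illustration, not the attributes of the actual Document annotation, and SampleEmployee is a placeholder model class.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

final class AnnotationDemo {

    /** Derives a document URI from the annotation on the object's class. */
    static String uriFor(Object model, String id) {
        StoredDocument meta = model.getClass().getAnnotation(StoredDocument.class);
        if (meta == null) {
            throw new IllegalArgumentException(model.getClass().getName() + " is not annotated");
        }
        return meta.directory() + id + ".xml";
    }

    public static void main(String[] args) {
        SampleEmployee employee = new SampleEmployee("1001");
        StoredDocument meta = SampleEmployee.class.getAnnotation(StoredDocument.class);
        System.out.println(uriFor(employee, employee.id));
        System.out.println(Arrays.toString(meta.collections()));
    }
}

/** Hypothetical stand-in for a container-mapping annotation. */
@Retention(RetentionPolicy.RUNTIME) // keep it visible to reflection at run time
@Target(ElementType.TYPE)           // only types may carry it
@interface StoredDocument {
    String directory();
    String[] collections() default {};
}

/** Placeholder model class carrying the mapping metadata. */
@StoredDocument(directory = "/employees/", collections = {"employees", "hr"})
final class SampleEmployee {
    final String id;

    SampleEmployee(String id) {
        this.id = id;
    }
}
```

The RUNTIME retention policy is the important detail: without it the annotation is discarded before the program runs and getAnnotation returns null.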

The Employee.java model class seen below uses the Document annotation to define the MarkLogic container specific semantics that would be used when the document representing the Java object is stored in MarkLogic.

The MarkLogicDao class does the heavy lifting and interfaces with the MarkLogic database. Its methods use the documentMap to get access to the metadata attached to the Employee objects via the Document annotation.
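
As a rough illustration of that lookup, the sketch below keeps a map from model class to storage metadata and uses it to build an xdmp:document-insert call that also assigns collections. It is not the MarkLogicDao itself: DaoSketch, Meta, and the nested Employee class are hypothetical, the XML is assumed to be serialized already, and escaping of the URI and collection names is left out for brevity.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class DaoSketch {

    /** Per-class storage metadata: target directory and collections. */
    static final class Meta {
        final String directory;
        final List<String> collections;

        Meta(String directory, List<String> collections) {
            this.directory = directory;
            this.collections = collections;
        }
    }

    /** Placeholder model class. */
    static final class Employee { }

    // Hypothetical equivalent of the documentMap: model class -> metadata.
    private final Map<Class<?>, Meta> documentMap = new LinkedHashMap<>();

    void register(Class<?> type, Meta meta) {
        documentMap.put(type, meta);
    }

    /** Builds the insert query for an object that has already been serialized to XML. */
    String insertQuery(Class<?> type, String id, String xml) {
        Meta meta = documentMap.get(type);
        if (meta == null) {
            throw new IllegalArgumentException("No metadata registered for " + type.getName());
        }
        // Render the collections as an XQuery sequence of strings, e.g. ("a", "b").
        String collections = meta.collections.stream()
                .map(c -> "\"" + c + "\"")
                .collect(Collectors.joining(", ", "(", ")"));
        return "xdmp:document-insert(\"" + meta.directory + id + ".xml\", " + xml
                + ", xdmp:default-permissions(), " + collections + ")";
    }

    public static void main(String[] args) {
        DaoSketch dao = new DaoSketch();
        dao.register(Employee.class, new Meta("/employees/", Arrays.asList("employees", "hr")));
        System.out.println(dao.insertQuery(Employee.class, "1001",
                "<employee><name>Pat</name></employee>"));
    }
}
```

Building the query as text keeps the sketch short; passing the URI and content as external variables on the request would avoid the escaping problem altogether.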

The purpose of the MarkLogicDao is to abstract the access layer and prototype the XDMP calls to the MarkLogic XCC API. The screen shot below shows the documents loaded by their unique URIs and their collections.

By clicking on the (properties) link, you can access the document properties metadata. This metadata is helpful when you want to process documents and keep track of which ones have been processed, or record other in-process statuses. Below are the properties for one of the employee docs.
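
One way to use properties for that kind of tracking is to set a status element on each document. The sketch below builds a call to the built-in xdmp:document-set-property function as query text. The status element name, the PROCESSED value, and the class and method names are assumptions for illustration; only the NEW value appears in the properties shown in this post.

```java
final class ProcessingStatus {

    /** Hypothetical processing states; only NEW appears in the original properties. */
    enum Status { NEW, PROCESSED }

    /** Builds a query that sets a status property on the document at the given URI. */
    static String setStatusQuery(String uri, Status status) {
        // URI escaping is omitted for brevity; the URI is assumed to contain no quotes or ampersands.
        return "xdmp:document-set-property(\"" + uri + "\", <status>" + status.name() + "</status>)";
    }

    public static void main(String[] args) {
        System.out.println(setStatusQuery("/employees/1001.xml", Status.NEW));
        System.out.println(setStatusQuery("/employees/1001.xml", Status.PROCESSED));
    }
}
```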

NEW 2012-09-05T15:24:28-04:00

Going forward I will discuss the power behind XPath and XQuery embedded in MarkLogic.

5 comments:

Hi Jimmy, I wonder if you've had a chance to look at the new Java API in MarkLogic 6? I wrote a tutorial on it here: http://developer.marklogic.com/learn/java

It also integrates JAXB. I'd love to hear your thoughts on how using the new Java API would compare with the approach you took using XCC (before the new API was available). I look forward to reading more of your thoughts and insights about MarkLogic. I'm subscribed. :-)
