Disclaimer

The opinions shared here are those of the contributor and not those of their employers or of Big Men On Content as a whole.

XML databases are not new. People have been working with the concept (if not the name) for a decade. One of the primary vendors in the space, MarkLogic, just announced they raised $12.5M USD in Series D funding. This is a notable event – not so much for the company but more for the technology. It shows that one of the most important communities – the VC market – believes XML databases have growth potential worth investing in, even in this very uncertain economy.

This ought to be enough to get you interested in exploring the technology if you haven’t already, but there is more. I recently spent some time getting up close and personal with EMC Documentum’s xDB (formerly known as xHive). I have to say I have seldom enjoyed getting my geek on more. It is an extremely powerful technology that you need to get familiar with, because it will play an increasingly important role for content management technology in EMC and beyond.

Can’t You Just Use Documentum?

I’ve written before about how I struggle with the tendency to dismiss content storage systems other than Documentum Content Server (C/S) with its gloriously rich API – that I already understand and have largely committed to memory. I suspect I am not alone in this. We have a phrase in the South (and elsewhere, I think) – “Dance with the one that brought you” – and the content server has definitely taken us places. It is perfect for so many content management challenges. It is not perfect for everything – gasp!

As an architect, I need options, and I owe it to the customer to align the product choice with the requirements. Just like App XTender, xDB expands the range of CM problems I can solve with the EMC catalog without performing unnatural acts with C/S. This is a much better position.

Those responsible for CMS architectures deal with two opposing forces: reducing S/W cost (cheaper tools) vs. reducing technology footprint (fewer tools). With lower cost, smaller footprint options like xDB, I can resist pressure to use the content server for simple problems where it is not a good fit. I can also avoid an even bigger problem – the introduction of yet another vendor (and their lawyers, unique support requirements, etc.) into an already crowded IT procurement landscape. (BTW – if you think lawyers don’t come with open source, you’ve never worked for a company bigger than a roadside fruit stand.)

xDB is more than just another add-on – more than just another satellite orbiting in the content server system. In addition to being an architectural component that you can leverage yourself, xDB is cropping up in more and more places as an embedded component of the other products in the catalog. The better you understand it, the better you understand the architecture of the entire suite.

So What is It Really?

xDB is a native XML Database. In simplest terms – a database that is designed to handle XML. Handle how? Store, Search, Transform, & Deliver. Notice that I used the term designed. Not optimized or enabled. From the ground up. There are certain things that make handling XML a true hybrid between file oriented CMS’s for unstructured content and structured data applications. As a result – a product designed around XML concepts provides more efficient and scalable performance.

XML is content, but it has structure. DB people scoff because if they can’t shred it and load it into a row/column pattern (normalizing, eliminating nulls in keys, etc.) then it is unstructured data. I have argued for years that all content has structure – some of it is just harder to process than others. The problem arises when you arbitrarily apply relational theory to non-rectilinear data in XML: table structures quickly become very difficult to optimize, performance suffers, and storage requirements explode. Dogs and cats living together ….

The solution? Leave it as a hierarchical structure. XML is by definition self-describing data. Build the database around that structure, not the other way around. The implementation is far from being that simple. This basic concept however – leverage XML’s self-describing and hierarchical nature to manage it – is the very foundation of an XML database. It does require you to change how you think about many kinds of data and get out of the JDBC shaped box you may be in.
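To make the contrast concrete, here is a minimal sketch in Python using only the standard library. It is not xDB code and uses no xDB API; the sample document, the `shred` helper and the `warnings_in_chapter` helper are all invented for illustration. It flattens a small nested document into rows the way a relational loader might, then asks the same kind of question of the hierarchy directly with a path expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration only.
SAMPLE = """
<manual id="m1">
  <title>Pump Maintenance</title>
  <chapter n="1">
    <title>Safety</title>
    <section><title>Lockout</title><warning>Disconnect power.</warning></section>
  </chapter>
  <chapter n="2">
    <title>Service</title>
    <section><title>Seals</title>
      <section><title>Replacing a seal</title><warning>Wear gloves.</warning></section>
    </section>
  </chapter>
</manual>
"""

def shred(root):
    """Flatten every section into (depth, title, warning) rows, relational style."""
    rows = []

    def walk(node, depth):
        for sec in node.findall("section"):
            # A section without a direct <warning> child yields None (a NULL column).
            rows.append((depth, sec.findtext("title"), sec.findtext("warning")))
            walk(sec, depth + 1)

    for chapter in root.findall("chapter"):
        walk(chapter, 1)
    return rows

def warnings_in_chapter(root, n):
    """Query the hierarchy directly: all warnings at any depth under chapter n."""
    return [w.text for w in root.findall(f"./chapter[@n='{n}']//warning")]

root = ET.fromstring(SAMPLE.strip())
rows = shred(root)
chapter2_warnings = warnings_in_chapter(root, "2")
```

The shredded rows need a depth column and a nullable warning column just to survive one level of nesting, and they have already lost which chapter each section came from. The path query needs neither, because the structure is still there to navigate. A native XML database applies the same idea at scale, with indexes and a full query language instead of an in-memory tree.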

No Barriers

Documentum has a tradition of being hard to get into at a technical level. Craig Randall did a very good job summarizing the EMC Developer Edition and highlighting the XML Technology Developer Community. At long last even that wall is falling! The folks that manage xDB realize that the best way to get their message out is to give it away. They are making sure that you – the developers and architects – have a chance to play with it and come up with new ways to use it.

I will point out that when I first prepared for the xDB training, someone directed me to the W3C specs for XProc and XQuery. Some people learn that way – I do not. I pass out from boredom before I get through the TOC. I want to see it work. What does it do? What good is it to me? What do others do with it? For those so inclined though – here is the best collection of specs on the topics I have seen.

What You Do With It – TODAY

xDB is already playing an extremely important part in the architecture for Documentum. It is an embedded component of many 3rd party products that I don’t think I can mention here, but more to the point – it is becoming an integral part of the Documentum landscape. Here are just a few examples. (I could list more, but the post is too long already.)

XML Store – This is an additional license option for content server today. Once it is enabled you can create a content storage area of type ‘XML Store’. This area then is an instance of an xDB. The principal advantage here is that extremely large XML documents can be stored without chunking and without a loss of performance. A very limited set of APIs is exposed through DFC today, but more capability is coming through DFS in later releases.

Dynamic Delivery Services – Built on top of xDB, DDS is (in my words) an application server for assembling, transforming and consuming content from the web. SCS on steroids – ‘bundled’ with a set of GWT UI components to expedite application construction. Internet consumption of published content has always been a bit of a challenge. SCS would publish metadata to a target database, but there was very little you could do with it after that in a high consumption volume environment. With DDS, dynamic transformation and assembly is more scalable and powerful with the introduction of XProc. This is by far the best framework for high volume consumption offered from the Documentum stack, and it is xDB that makes it possible – providing not only the storage and metadata management but also node replication.
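If the pipeline idea behind XProc is new to you, the sketch below may help. It is plain Python, not XProc syntax and not the DDS API; the step names, the sample feed and the `run_pipeline` helper are assumptions made up for this illustration. The point is only the shape: each step takes XML in and hands XML to the next step, so assembly and transformation can be composed at request time.

```python
import xml.etree.ElementTree as ET
from functools import reduce

def select_published(doc):
    """Step 1 (hypothetical): keep only items flagged as published."""
    out = ET.Element("feed")
    for item in doc.findall("item"):
        if item.get("status") == "published":
            out.append(item)
    return out

def to_html_list(doc):
    """Step 2 (hypothetical): transform the remaining items into an HTML list."""
    ul = ET.Element("ul")
    for item in doc.findall("item"):
        li = ET.SubElement(ul, "li")
        li.text = item.findtext("title")
    return ul

def run_pipeline(doc, steps):
    """Feed the output of each step into the next, in order."""
    return reduce(lambda current, step: step(current), steps, doc)

# Invented sample content standing in for published metadata.
source = ET.fromstring(
    "<feed>"
    "<item status='published'><title>Release notes</title></item>"
    "<item status='draft'><title>Roadmap</title></item>"
    "</feed>"
)

result = run_pipeline(source, [select_published, to_html_list])
html = ET.tostring(result, encoding="unicode")
```

A real XProc pipeline is declared in XML and runs inside a processor, but the composition is the same: swap a step, add a step, and the rest of the pipeline does not need to know.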

What’s in Your Future

Enterprise Search Server – This is a much talked about new component. Full text indexing used to be a feature of a CMS, but now it has evolved to be one of the pillars of the architecture along with the database and the content store. When Documentum dumped Verity for FAST several years ago, a few things became immediately apparent. 1) You can add a tremendous amount of flexibility and capability by leveraging a separately managed and scalable search component. 2) FAST is harder than we thought to make run well. ESS is a combination of xDB and Lucene and is engineered as an option first, but I believe everyone will move to it, if for no other reason than that the platform O/S support will be broader than the now MS-owned FAST. The CPU and memory profile of the xDB/Lucene combination promises to greatly reduce the hardware requirements.

What I Want to See

CMIS and xDB – When you get under the covers and look at the DDS implementation in particular, it screams CMIS to me. The unofficial position I heard was that once CMIS is out of draft then it will be pursued. I suspect (my own opinion here) it is really a question of available time. The CMIS implementation in work on content server is in many respects the more difficult problem to solve, and there are only so many resources to go around, so it doesn’t make sense to pursue both until it’s out of draft.

More xDB and Content Enabled Vertical Applications – I have no doubt this will come in time. The SMB market can’t always justify a full blown content server infrastructure, but xDB affords tremendous capability. There are several ISVs with offerings, but the best example I can point to is the Documentum Technical Document Management System for S1000D (formerly XHive AMDS).

In working with the product for the last few weeks, the most significant takeaway is this: if you think of xDB simply as a backend component to publishing applications, you are missing the boat. Some of the competitors have designed/marketed themselves into a corner here. xDB plus other components (e.g. DDS) is certainly capable of supporting this use case, but xDB is a great option for many data persistence scenarios. Applications with data properly modeled as XML can leverage xDB and potentially eliminate the need for a database altogether. It is designed to be embedded.

This post was not intended to get into the technical details but to point out that we serve the best interests of both our customers and our careers by understanding this better. So whether you are a developer looking to expand your toolkit or a long time Documentum architect wanting to truly understand an emerging supporting technology – go to the XML Technology Developer Community and try it out.

Our company provides support for Geographic Information in XDB using the OGC/ISO Specification GML. This enables XDB to store GIS data, CAD data and sensor data. We are in the process of extending this to support GML coverages as well. We have implemented a novel R-tree spatial index within XDB, support the OGC WFS web service interfaces and can automatically translate data with respect to a variety of coordinate reference systems. The product is called INscape and was previously called Cartalinea.