XML and BLU

December 24, 2013

As you probably noticed, DB2 10.5 features a new technology called BLU Acceleration which provides significant gains in performance, reduced storage consumption, and simplicity for analytical database workloads. To this end, BLU combines column-organized storage with novel compression techniques, hardware-specific optimizations, data skipping, new techniques for join processing and aggregations, and other innovations.

Since the release of BLU in June 2013, I was asked several times whether BLU can be used with XML or how BLU relates to XML in general.

The short answer is that a BLU table, i.e. a column-organized table, cannot contain columns of type BLOB, CLOB, or XML at this point in time. If you use BLU in a DB2 10.5 database, your XML documents can be stored in the same database but need to reside in traditional tables which are not column-organized.

Fundamentally, XML and BLU follow two very different approaches to managing data.

The concept of XML documents is to keep all attributes for a given business object (e.g. an order, a product, a contract) physically together as one entity so that you can easily retrieve or insert all the relevant information for an object in one shot. This is particularly useful for complex objects whose attributes would otherwise be spread over many relational tables. XML is also useful for objects whose set of attributes can evolve over time, since XML also offers schema flexibility. The main focus of using XML is on the business object as the unit of interest.

In contrast, BLU is designed to store all attributes separately from each other, purposefully breaking up the objects, even more so than traditional storage formats for relational tables. The focus of BLU is on accessing individual attributes across all or many objects as efficiently as possible. The column-organized storage purposefully exploits the fact that many analytical questions never need to see whole objects but only a (small) subset of the attributes. The unit of interest is the individual attribute, not the entire objects.

As such, XML and BLU serve different purposes and access patterns, both valid in their own right.

Analytical queries that need to scan, join, filter, aggregate, etc. on individual attributes across a large number of objects will perform much better with BLU than with XML.

Applications that often need to read or write entire objects, esp. complex objects, perform much better with XML than with BLU. Similarly, queries that typically access most or all columns of a relational table perform better on traditional row-organized tables than on column-organized tables.

Now what if you have subset of attributes in your XML documents that you want to run analytical queries on, ideally across all or many of the documents? In this case you can consider extracting some of these attributes for redundant storage in a BLU table. This can be done with triggers, stored procedures, or application logic.