September 28, 2010

Let me start this topic with a couple of disclaimers. First, using XML as a storage format is not always the right choice – there are still valid cases where it can make sense to convert XML to relational format. Second, there are more than five reasons for storing XML in a database. Here I am just listing five reasons –in no particular order- that seem to be quite common.

1. When business records are represented as XML when they are processed or transmitted by applications.

When applications have already decided to use XML as the data format of choice, e.g. in a service-oriented architecture (SOA), and are transmitting and/or processing business data in XML format anyway, then it is often a good idea to also store the business data in the same XML format. If the applications use XML anyway, it is typically beneficial to avoid the overhead of disassembling XML just for storage, and to avoid reconstructing the XML when applications need to read data.

2. When the data format (schema) changes over time.

Let’s face it: schema changes in a relational database can be a pain in the neck. Just imagine a 1-to-1 relationship that evolves to a 1-to-many relationship. For example, assume that customer records are no longer restricted to one telephone number but can contain multiple phone numbers. Following traditional relational database design principles, you would now have to normalize your existing tables, i.e. introduce a new table for phone numbers with a 1-to-many join relationship from the customer table. This is a massive change. Any SQL statements that retrieve phone numbers or have search conditions on phone numbers need to be changed. However, if the customer records are stored as XML documents in an XML column, then the table definition does not need to be changed to accommodate the evolution from 1-to-1 to a 1-to-many relationship. An XML element “phone” would simply be allowed to occur multiple times per document. An XML index on these phone elements would also remain unchanged. With XML, the impact of schema evolution is (not zero but) much lower than with a relational database schema. This enables applications to react to changing business requirements faster and at a lower cost.

3. When the data format (schema) is complex and highly variable.

In today’s world, business record are often very complex, typically more complex than 50 years ago. Think of financial trades, electronic medical records, insurance contracts and claims, or rich content such as office documents. The complexity and variability and the number of optional data fields in such business records can be staggering. Designing a relational schema for such complex objects can be very challenging and can lead to dozens or even hundreds of tables that are difficult to understand from an application perspective. In such cases, storing each business record as a separate XML document can often simplify both database design and application development.

4. When you need to optimize object-centric data access.

Continuing the previous thought, what happens when complex business objects (trades, medical records, insurance claims, etc.) are mapped to and stored in a complex and normalized relational database schema? Storing a single business object may require SQL inserts into many tables, and reading a single business object may require complex multi-way joins and/or a sequence of many SQL queries to read all pieces of an object. In such cases, experience has shown that inserting or retrieving a single XML document is significantly more efficient. For such object-centric data access, XML can outperform a relational solution.

5. When you need the best of both worlds.

The world is not black and white. In particular, many application scenarios are not either 100% XML or 100% relational. Often, some data is better represented in one format while other is better represented in another. This makes a hybrid database design very appealing, since it allows you to manage both XML documents and relational data in the same database and even in the same table. You can than query across XML documents and relational data at the same time, even in a single query. For example, you can keep the fixed and structured parts of a record in relational columns and the variable or semi-structured parts of the same record in an XML column.

September 14, 2010

It’s been a little bit over one year now since we published the DB2 pureXML Cookbook. Many copies have been sold and we have received a lot of positive feedback from many readers (thanks!).

When you write a book with more than 700 coded examples of SQL/XML, XQuery, updates, XML Schemas, indexes, stored procedures, triggers, and application code samples in Java, C, COBOL, etc., it’s inevitable that some errors creep in and remain undetected before the book goes into print.

Luckily we have some very observant readers that are using this book in their DB2 pureXML projects and are paying a lot of attention to detail. They have discovered several mistakes in the book. Fortunately, most of these errata are quite small and syntax-related glitches rather than major disasters.

The errata that we currently know about are listed below, and we will continue to list any further errata on the web page athttp://tinyurl.com/pureXML.

By the way, IBM Press has a promotion where this and other books are currently available at a 35% discount.

Errata:

Page 216/217:
On the bottom of page 216, in Figure 8.41, 3rd line: “AS id” should be “AS idstatus” to match the query output on the top of page 217.

Page 283:
If we want the query in Figure 10.24 to return the exact same result as the query in Figure 10.23, two attribute names would have to be changed in Figure 10.24. That is, it should be

XMLATTRIBUTES(po.poid as “oid”),

instead of:

XMLATTRIBUTES(po.poid as “id”),

and

XMLATTRIBUTES(promoprice as “promoprice”),

instead of:

XMLATTRIBUTES(promoprice as “lowprice”),

Page 370:
The two queries in Figure 13.12 can be improved. The first query should cast to Varchar(500) instead of Varchar(50), to be on the safe side in case some titles are long. The second query in Figure 13.12 should cast the string-length to INTEGER instead of Varchar(500).

Page 388:
In Table 13.6 and 13.7 the paths for the attribute “@isbn” are wrong. The @isbn attribute is an attribute of the book element, not of the title element. So the correct path is /book/@isbn .

Page 637:
In Figure 21.15, the parameter index in the 2nd SQLBindParameter call should be 2, not 1. So, the bind call should be