In my recent column on XML storage, I referenced a Microsoft-provided example of an inventory database. A retailer (I think an online one) wanted to manage books and DVDs and so on, and search across attributes that were common to the different entity kinds, such as title.

Obviously, there are relational alternatives. Items have unique SKU numbers, and they have one of a limited number of kinds; a set of integrity constraints could mandate that an item be listed in the appropriate table for its kind and no other, and then common attributes could be searched on via views that amount to unions (or via derived tables kept synchronized by their own integrity constraints).
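To make that concrete, here is a minimal sketch of the relational alternative, using SQLite from Python. The table and column names (`books`, `dvds`, `item_titles`) are my own invention, not anything from Microsoft's example; the point is only that a union view exposes the common attributes for cross-kind search.

```python
import sqlite3

# Hypothetical per-kind schema: each item lives in exactly one kind table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (sku TEXT PRIMARY KEY, title TEXT NOT NULL, isbn TEXT);
CREATE TABLE dvds  (sku TEXT PRIMARY KEY, title TEXT NOT NULL, region INTEGER);

-- A view that unions the common attributes lets callers search
-- across kinds without knowing about the per-kind tables.
CREATE VIEW item_titles AS
  SELECT sku, title, 'book' AS kind FROM books
  UNION ALL
  SELECT sku, title, 'dvd'  AS kind FROM dvds;
""")
conn.execute("INSERT INTO books VALUES ('B1', 'Dune', '978-0441172719')")
conn.execute("INSERT INTO dvds  VALUES ('D1', 'Dune', 1)")

# One query over the view finds matching items of every kind.
rows = conn.execute(
    "SELECT sku, kind FROM item_titles WHERE title = ? ORDER BY sku",
    ("Dune",),
).fetchall()
print(rows)
```

Adding a new kind does mean adding a table and extending the view, which is exactly the administration cost Microsoft's response (below) objects to.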

I pushed back at Microsoft — which is, you may recall, not just an XML advocate but also one of the largest RDBMS vendors — with this kind of reasoning, and they responded with the following, which I’ve decided (with permission) to post verbatim.

“If all you ever do is manage books and DVDs, then managing them relationally works well, especially if their properties do not change. However, you many want to add CDs and MP3 on memory cards and many other items that all have different properties. Then you quickly run into an administration overhead and may not be able to keep up with your schema evolution (and you need an additional DBA for managing the complex relational schema). Even if you use a relational approach that stores common properties in joint tables, the recomposition costs of the information for one item may become too expensive to bear.”

Comments

Ugh. It really appalls me that MS has such an impressive stable of outstanding researchers in logic, language, DBs, and other subjects, and yet continues to promulgate nonsense like this.

Then you quickly run into an administration overhead and may not be able to keep up with your schema evolution (and you need an additional DBA for managing the complex relational schema).

I like how “design” is taken for “administration overhead”; yes, one actually does have to think about what one is designing.

Even if you use a relational approach that stores common properties in joint tables, the recomposition costs of the information for one item may become too expensive to bear.

Perhaps, though the word ‘may’ is telling; in any case, that falls into the realm of caching strategy – whether implemented in the DBMS, the app server, the web server, or a custom layer. The “custom recomposition” they’re advocating takes a toll as well, on code correctness and consistency, and ultimately on performance in other areas (when you are summarizing items, for example, and pull back all the data on each one because you store each item as a whole).

Either you let your fingers get ahead of your brain, or I don’t understand your point on custom recomposition at all. How does the use of XQuery equate to “custom recomposition”?

And the last part of your comment also makes no sense to me, because it seems to assume that one has to pull back all the data on an item to get one specific node (i.e., field) of data. And that’s emphatically not the case when XML storage is native, which is one of the big reasons I’ve written a column in Computerworld about native XML storage.
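For what it's worth, even outside a native XML store the query shape supports this point: you ask for one node, not the whole item. The sketch below uses Python's `ElementTree` and an inventory document I made up; note that `ElementTree` still parses the whole document in memory, so this only illustrates the query shape – the claim that a native store can touch just the matching nodes is about its indexed storage, which a toy example can't demonstrate.

```python
import xml.etree.ElementTree as ET

# A tiny hypothetical inventory document. A native XML store would hold
# many such items in indexed form; here we just show that the query
# names one node (the sku) rather than the whole item.
doc = ET.fromstring("""
<inventory>
  <book sku="B1"><title>Dune</title><isbn>978-0441172719</isbn></book>
  <dvd  sku="D1"><title>Dune</title><region>1</region></dvd>
</inventory>
""")

# Extract just the sku of items whose title matches, without handing
# the application the rest of each item's data.
skus = [item.get("sku") for item in doc.findall(".//*[title='Dune']")]
print(skus)
```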

Eric on
February 3rd, 2006 9:58 am

Either you let your fingers get ahead of your brain

Yes, sorry, it looks like I rushed.

How does the use of XQuery equate to “custom recomposition”?

I should have called it “custom DEcomposition”, since typically the XPath/XQuery will be evaluated relative to the root/document element; rather than composing complex predicates from simple ones (relations), you’re extracting some meaningful sub-predicate implied by a complex one (an XML document’s root node). In both coding and DB design, this spells trouble to me; it’s always easier to compose molecules from atoms than to extract atoms from molecules, and it means you’ll be forced to make design choices up front (when you create your XML “data model”) that are more likely to conflict with the processes you later develop.
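Eric's point about up-front design choices can be illustrated with a small sketch (the document shape and names here are my own, not from either party's example): once the designer commits to a hierarchy, queries along it are easy, while cross-cutting questions mean decomposing the whole tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose designer chose to nest items under
# category; that up-front hierarchy is the "complex predicate" at issue.
doc = ET.fromstring("""
<inventory>
  <category name="fiction">
    <item sku="B1"><title>Dune</title><year>1965</year></item>
  </category>
  <category name="film">
    <item sku="D1"><title>Dune</title><year>1984</year></item>
  </category>
</inventory>
""")

# A query along the chosen hierarchy is straightforward...
fiction = [i.get("sku") for i in doc.findall("category[@name='fiction']/item")]

# ...but a cross-cutting question (items by year, regardless of where
# the designer nested them) means walking every item in the tree.
by_year = [i.get("sku") for i in doc.iter("item") if i.findtext("year") == "1965"]
print(fiction, by_year)
```

In the relational version, both questions are equally simple predicates over the same tables; the hierarchy never had to be chosen.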