XQuery/Typeswitch Transformations

You have an XML document that you want to transform into a different format of XML. You want to control and customize the transformation process, and you want a modular way to store the transformation rules so that you or others can easily modify and maintain them.

You may have heard the conventional wisdom that "XQuery is best for querying or selecting XML, and XSLT is best for transforming it." In reality, both methods are capable of transforming XML. Despite XSLT's somewhat longer history and larger install base, the "XQuery typeswitch" method of transforming XML provides numerous advantages. These are covered in more detail in XQuery Benefits.

We will use XQuery's typeswitch expression to transform an XML document from one form into another. The basic approach is simple and straightforward: for each XML node in the input document, we specify what should be created in the output document. The typeswitch expression does the core work of deciding what happens to each node in the source document. We will write an XQuery function that takes a node, tests it with a typeswitch expression, and dispatches it to the appropriate handler function. The handler transforms the node into the new format and uses the passthru function to send the node's children back to the main function. This recursive routine effectively crawls through a node and all of its descendants, transforming them into the target format. Once this structure is in place, the transform is easy to modify, even when the tags in the input document are very deeply nested. (This recursive-descent technique will be familiar to XSLT users as the xsl:apply-templates pattern, but no knowledge of XSLT is required for this article.)

The most effective way to use the typeswitch expression to transform XML is to create a series of XQuery functions. In this way, we can cleanly separate the major actions of the transformation into modular functions. (In fact, the library of functions can be saved into an XQuery library module, which can then be reused by other XQueries.) The "magic" of this typeswitch-style transformation is that once you understand the basic pattern and structure of the functions, you can adapt them to your own data. You'll find that the structure is so modular and straightforward that it's even possible to teach others the basics of the pattern in a short period of time and empower them to maintain and update the transformation rules themselves.

The first function in our module is where the typeswitch expression is located. This function is conventionally called the "dispatch" function:
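Below is a minimal, runnable sketch of such a dispatch function. So that it runs on its own, it also declares a local:passthru() helper and one handler per element. Only the <bill> to <Bill> renaming is described in this article; the other output element names (Title, Section, Text, Deleted) are hypothetical placeholders.

```xquery
(: Dispatch: test each node and hand it to the matching handler. :)
declare function local:dispatch($node as node()) as item()* {
  typeswitch ($node)
    case text() return $node
    case comment() return $node
    case element(bill) return local:bill($node)
    case element(btitle) return local:btitle($node)
    case element(section-id) return local:section-id($node)
    case element(bill-text) return local:bill-text($node)
    case element(strike) return local:strike($node)
    default return local:passthru($node)
};

(: Recursion helper: send each child node back through the dispatcher. :)
declare function local:passthru($node as node()) as item()* {
  for $child in $node/node()
  return local:dispatch($child)
};

(: Handlers, one per source element. Output names other than Bill are hypothetical. :)
declare function local:bill($node as element(bill)) as element() {
  <Bill>{ local:passthru($node) }</Bill>
};

declare function local:btitle($node as element(btitle)) as element() {
  <Title>{ local:passthru($node) }</Title>
};

declare function local:section-id($node as element(section-id)) as element() {
  <Section>{ local:passthru($node) }</Section>
};

declare function local:bill-text($node as element(bill-text)) as element() {
  <Text>{ local:passthru($node) }</Text>
};

declare function local:strike($node as element(strike)) as element() {
  <Deleted>{ local:passthru($node) }</Deleted>
};

(: Example run :)
local:dispatch(<bill><btitle>This is the Bill title</btitle></bill>)
```

Run as a main module, the final expression returns <Bill><Title>This is the Bill title</Title></Bill>. Keeping each handler in its own function is what makes the rules easy to find and change later.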

Notice that the typeswitch expression tests the input node against a list of criteria: is the node a text node, a comment node, a bill element, a btitle element, a section-id element, and so on? If it is a text node (e.g. "This is the Bill title"), we simply return the text unmodified. (The text() test comes first because text nodes are likely to be the most plentiful node type in a text-rich document; since the cases are tested in order, putting the most common one first may save a little work per node.) If instead the node is a bill element, we pass it to the aptly named local:bill() function for bill-specific handling. The local:bill() function turns the <bill> element into a <Bill> element and then passes the contents of the bill element to the local:passthru() function. If the node matches none of the preceding cases, the typeswitch expression falls back to its required final default clause (think: "fallback"), which handles every remaining node. In our example, the default clause sends such nodes to the local:passthru() function.

Typeswitch is not limited to matching text() and element() nodes; it can also match the other node kinds, such as comment(), processing-instruction(), and document-node(), but typically not attribute(), since attributes are not child nodes and so never reach the dispatcher through $node/node(). Attributes are conventionally dealt with inside the handler function of the attribute's parent element rather than in the core typeswitch function.
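The helper and handler described above can be sketched as follows. local:passthru() sends each child node back to local:dispatch(), and local:bill() wraps the result in a <Bill> element while also handling an attribute of <bill> itself, following the convention just mentioned. The id attribute on the input and the number attribute on the output are hypothetical, chosen only to illustrate the pattern, and the typeswitch is trimmed to the cases this example needs.

```xquery
declare function local:dispatch($node as node()) as item()* {
  typeswitch ($node)
    case text() return $node
    case comment() return $node
    case element(bill) return local:bill($node)
    default return local:passthru($node)
};

(: Hand each child node back to the dispatcher; the child axis never includes attributes. :)
declare function local:passthru($node as node()) as item()* {
  for $child in $node/node()
  return local:dispatch($child)
};

(: Rename <bill> to <Bill>. The attribute is handled here, in its parent
   element's handler; @id and the output attribute "number" are hypothetical. :)
declare function local:bill($node as element(bill)) as element() {
  <Bill>{
    if ($node/@id) then attribute number { $node/@id } else (),
    local:passthru($node)
  }</Bill>
};

local:dispatch(<bill id="HR-1">This is the Bill text</bill>)
```

This returns <Bill number="HR-1">This is the Bill text</Bill>; an input <bill> without an id simply yields a <Bill> without the attribute.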

(*Note: local:passthru() is such a simple function that it may appear extraneous. Why not simply replace each call to local:passthru($node) with for $child in $node/node() return local:dispatch($child)? Its primary benefit is that it simplifies the code, sparing you from repeating that loop at every point of recursion. A secondary benefit is that it introduces the possibility of filtering a node before it is sent to the typeswitch routine.)
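As an illustration of that secondary benefit, here is a sketch of a local:passthru() that filters child nodes before they reach the typeswitch. Dropping whitespace-only text nodes is an arbitrary, hypothetical choice of filter; the boundary-space declaration is there only so that the example input actually contains such nodes, and the handlers are inlined to keep the sketch short.

```xquery
(: Keep whitespace between tags in the example input below. :)
declare boundary-space preserve;

declare function local:dispatch($node as node()) as item()* {
  typeswitch ($node)
    case text() return $node
    case element(bill) return <Bill>{ local:passthru($node) }</Bill>
    case element(btitle) return <Title>{ local:passthru($node) }</Title>
    default return local:passthru($node)
};

(: Filtering passthru: whitespace-only text nodes never reach local:dispatch(). :)
declare function local:passthru($node as node()) as item()* {
  for $child in $node/node()
  where not($child instance of text() and normalize-space($child) = '')
  return local:dispatch($child)
};

local:dispatch(<bill>
  <btitle>This is the Bill title</btitle>
</bill>)
```

The result is <Bill><Title>This is the Bill title</Title></Bill>, with the indentation text nodes removed. Because every recursion goes through local:passthru(), the filter applies throughout the document without touching any handler.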

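The local:passthru() function described so far drops all attributes from your nodes: each handler builds a new element, and passthru() hands it only the source element's child nodes, which never include attributes. If your input XML has attributes that you would like to retain, use an alternative passthru() function that copies them as well.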
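One way to write such an alternative is sketched below, under two assumptions of our own. First, local:passthru() returns the node's attributes ahead of its transformed children, so that they attach to whatever element the calling handler constructs. Second, unmatched elements are copied under their original name (instead of being unwrapped by the default clause), so that their attributes always have an element to attach to. The status and lang attributes in the example are hypothetical.

```xquery
declare function local:dispatch($node as node()) as item()* {
  typeswitch ($node)
    case text() return $node
    case comment() return $node
    case element(bill) return local:bill($node)
    (: assumption: any other element is kept under its own name :)
    case element() return element { node-name($node) } { local:passthru($node) }
    default return local:passthru($node)
};

(: Attribute-preserving passthru: copy @* first, then dispatch each child.
   Attributes must precede other content, so callers should place this call
   first inside the element they construct. :)
declare function local:passthru($node as node()) as item()* {
  $node/@*,
  for $child in $node/node()
  return local:dispatch($child)
};

declare function local:bill($node as element(bill)) as element() {
  <Bill>{ local:passthru($node) }</Bill>
};

local:dispatch(<bill status="draft"><btitle lang="en">This is the Bill title</btitle></bill>)
```

This returns <Bill status="draft"><btitle lang="en">This is the Bill title</btitle></Bill>: the renamed element keeps the status attribute, and the unmatched btitle element survives with its lang attribute.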

We can now write a query that takes the source XML and uses the local:dispatch() function to transform the input into the target format:

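```xquery
(: Assumes local:dispatch() and its handler functions are declared
   in the prolog of the same query. :)
let $input :=
  <bill>
    <!-- This is an XML comment -->
    <btitle>This is the Bill title</btitle>
    <section-id>1</section-id>
    <bill-text>This is the text with <strike>many</strike> examples.</bill-text>
  </bill>
return
  local:dispatch($input)
```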

The same transformation can also be written as a single, entirely self-contained function: it begins with a FLWOR expression over its input nodes, recurses through child nodes by calling itself on $node/node(), and uses computed element constructors (element Bill { ... } rather than <Bill>...</Bill>) to build the output.
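A sketch of a self-contained function of this kind follows. The name local:transform() and the output element names other than Bill are hypothetical; the input is the one used in the query above.

```xquery
(: One self-contained function: a FLWOR over the input nodes, a typeswitch
   per node, and computed element constructors that recurse via $node/node(). :)
declare function local:transform($nodes as node()*) as item()* {
  for $node in $nodes
  return
    typeswitch ($node)
      case text() return $node
      case comment() return $node
      case element(bill) return element Bill { local:transform($node/node()) }
      case element(btitle) return element Title { local:transform($node/node()) }
      case element(section-id) return element Section { local:transform($node/node()) }
      case element(bill-text) return element Text { local:transform($node/node()) }
      case element(strike) return element Deleted { local:transform($node/node()) }
      (: fallback: drop the element's own tags but keep transforming its children :)
      default return local:transform($node/node())
};

local:transform(
  <bill><!-- This is an XML comment --><btitle>This is the Bill title</btitle><section-id>1</section-id><bill-text>This is the text with <strike>many</strike> examples.</bill-text></bill>
)
```

This returns <Bill><!-- This is an XML comment --><Title>This is the Bill title</Title><Section>1</Section><Text>This is the text with <Deleted>many</Deleted> examples.</Text></Bill>. The single-function form is compact, but the separate handler functions shown earlier scale better as the number of rules grows.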

This is the heart of the XQuery Typeswitch approach to XML document transformation. On the basis of this simple pattern, entire libraries have been written to transform source formats like TEI, DocBook, and Office OpenXML documents into other formats like XHTML, XSL-FO, and each other.

While we can create typeswitch modules by hand, building them up element by element, we can also use XQuery to generate a skeleton typeswitch module; see this article's companion article, XQuery/Generating_Skeleton_Typeswitch_Transformation_Modules. In addition to the "skeleton generator", that article provides examples of more complex transformation patterns with XQuery typeswitch: renaming an element, ignoring an element, transforming an element differently depending on its context, and reordering elements. It also compares XQuery's and XSLT's approaches to the same example transformation in detail, which makes it useful for readers coming from the world of XSLT.