SQL Server I/O: Creating XML Output

My grandpa used to say, "Life is short. Eat dessert first."
I'll take that let's-do-the-easy-stuff-first approach in this
week's installment of my XML series on SQL Server Input and Output.

It's fairly simple to get XML data output from a query in SQL Server. In
fact, every major database vendor now supports XML output of some sort. The hard
part is inserting XML data into a database from an XML document. So
let's eat dessert first; we'll cover inserts in next week's article.

You can make any database that supports ANSI SQL create XML
documents. Here's a simple script that would create an XML document from
the pubs database in SQL Server:
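
As a minimal sketch of this kind of hard-coded approach (a stand-in, not the
original listing), assuming the pubs sample database and its authors table
with the au_id, au_fname, and au_lname columns. The Authors, Author,
FirstName, and LastName names are chosen here purely for illustration:

```sql
-- Sketch only: assumes the pubs sample database and its authors table.
-- Column values are concatenated as-is; characters such as & and < are
-- not escaped here, which a real script would need to handle.
USE pubs;

-- Document header and opening root element (root name is hypothetical).
SELECT '<?xml version="1.0"?>' + CHAR(13) + '<Authors>';

-- One <Author> element per row, with a child element per column.
SELECT '<Author id="' + au_id + '">' + CHAR(13)
     + '<FirstName>' + au_fname + '</FirstName>' + CHAR(13)
     + '<LastName>' + au_lname + '</LastName>' + CHAR(13)
     + '</Author>'
FROM authors
ORDER BY au_lname;

-- Closing root element.
SELECT '</Authors>';
```

This returns three result sets (header, rows, footer); the client would need
to write them out in that order to form one document.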

This script isn't completely correct, as some XML parsing engines will
bark at the XML declaration at the top of the output. The proper way to render
the line is actually:

<?xml version="1.0" encoding="UTF-8"?>

If you use this method, you might need to escape the double quotes with
whatever syntax your platform requires. You might also run into trouble
with the CHAR(13) part I have here. Strictly speaking, CHAR(13) is a carriage
return and CHAR(10) is the line feed, so use whatever your platform needs to
create a line break. To be technical about it, you don't even need the
line breaks anyway; parsers generally ignore whitespace between elements.

You'll notice that I formatted this output in a very
"element-centric" fashion. That means that the columns are broken out
as an element for each heading. That's my preference in this situation;
someone else might require something different. In XML, an element can contain
several child elements with the same name, but an attribute name can appear only
once on a given element. For instance, this is legal:
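
One hedged way to check the repeat rule from T-SQL itself is to cast a string
to the xml data type. This sketch assumes SQL Server 2005 or later, where that
type exists; the Author and Phone names and the phone values are made up for
illustration:

```sql
-- Assumes SQL Server 2005 or later (the xml data type is required).

-- Repeated child elements are well-formed, so this cast succeeds.
SELECT CAST('<Author><Phone>555-0100</Phone><Phone>555-0101</Phone></Author>' AS xml);

-- The same attribute name twice on one element is not well-formed,
-- so this cast raises an XML parsing error instead of returning a value.
SELECT CAST('<Author Phone="555-0100" Phone="555-0101"/>' AS xml);
```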

In this case, I reserved attributes for metadata and used elements for the
column headings. It's completely acceptable to make the same output
attribute-centric instead, creating an element that essentially contains a row
of data, like this:

<Author id="123-45-6789" FirstName="Buck" LastName="Woody"/>

The advantage of this approach is that the document is smaller, and the XML
parsing engine doesn't have to move down a level in the tree to reach each
child item (called a "node"), keeping track of where it is in the structure.
Attributes can
also be in any order, and don't need to be "nested" like elements
are. Again, the situation will dictate your design choices.

Let's get back to the process for creating the XML. Why did Microsoft
bother creating an engine in SQL Server to handle XML creation, if you can just
hard-code some strings to do the same thing? For one thing, hard-coding requires
touching the code each time the database structure changes. Microsoft includes a
few ways to get data out of a database and into XML format. You can select data
with an extension to T-SQL, or use a Web interface to talk with SQL Server to
create the documents.

Most T-SQL SELECT statements can create an XML document with a simple
modifier, called FOR XML. Here's a simple script to get
the same type of XML out of the pubs database:
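
A minimal sketch of such a query (a stand-in, not the original listing),
assuming the authors table in pubs. The comment shows only the shape of the
result, with the values elided:

```sql
-- Sketch only: assumes the pubs sample database.
USE pubs;

-- FOR XML AUTO returns one element per row, named after the table,
-- with each selected column as an attribute.
SELECT au_id, au_fname, au_lname
FROM authors
FOR XML AUTO;

-- Shape of each returned element (values elided):
--   <authors au_id="..." au_fname="..." au_lname="..."/>
```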

You might notice a couple of things right away about this output. First, it
seems to violate the syntax rules I mentioned in the last two articles, since
there are no separate beginning and closing tags. There's just one tag, and
it's terminated at the end. That's actually OK for "empty"
elements, that is, elements with no content between the tags; here the data is
carried in the attributes.

Also, the output is not element-centric but attribute-centric. The tag
repeats once per row, with the attributes contained entirely inside each tag.
The qualifier on the query that created the XML is FOR XML AUTO. (There are
other qualifiers to create the data as well, such as FOR XML RAW, which
creates a generic "row" tag, and FOR XML EXPLICIT, which allows you
to specify the "shape" of the XML data. We'll look at those two
qualifiers later.)

The FOR XML AUTO qualifier has options to help us specify the XML
output even further. The order of the columns in the SELECT list determines the
nesting of the XML document: the table whose column is listed first becomes the
outermost element. This nesting refers to which element (or node) is the
"parent" and which are the "children."
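
To illustrate the nesting, here is a hedged sketch that joins authors to titles
through the titleauthor table, assuming the standard pubs schema. Because a
column from authors comes first in the SELECT list, authors is the parent:

```sql
-- Sketch only: assumes the pubs sample database.
USE pubs;

-- authors is listed first, so each <authors> element is the parent
-- and its <titles> elements are nested inside it as children.
SELECT authors.au_lname, titles.title
FROM authors
JOIN titleauthor ON authors.au_id = titleauthor.au_id
JOIN titles ON titleauthor.title_id = titles.title_id
ORDER BY authors.au_lname
FOR XML AUTO;

-- Shape of the result (values elided):
--   <authors au_lname="...">
--     <titles title="..."/>
--   </authors>
```

Listing titles.title ahead of authors.au_lname in the SELECT list would flip
the relationship, making titles the parent element.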

If what we're after is element-centric output, the FOR XML AUTO
qualifier has an option, ELEMENTS, to do that:
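
A minimal sketch using the ELEMENTS option (a stand-in, not the original
listing), again assuming the authors table in pubs. The comment shows only the
shape of the result, with the values elided:

```sql
-- Sketch only: assumes the pubs sample database.
USE pubs;

-- ELEMENTS turns each selected column into a child element
-- instead of an attribute.
SELECT au_id, au_fname, au_lname
FROM authors
FOR XML AUTO, ELEMENTS;

-- Shape of each returned row (values elided):
--   <authors>
--     <au_id>...</au_id>
--     <au_fname>...</au_fname>
--     <au_lname>...</au_lname>
--   </authors>
```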

I added the linefeeds and indents for readability. SQL Server returns it as a
single line.

As I mentioned, you can use almost any T-SQL query to create the XML
document, but there are some pretty serious limitations. For one, you can't
use aggregate functions in the query when you use FOR XML AUTO. This
makes sense, because it's difficult to think of a result set like that as a
tree. And that's the crux: it's important to remember that what you're really
doing is mapping relational data to a hierarchical structure. Keep that
concept in mind when you build your queries.

Another limitation is that you can't use a GROUP BY clause in
the SELECT. You also can't use FOR XML in a subselect (which only
makes sense), in a cursor, or in a view definition. You can, however, select
from a view. In other words, this won't work: