Exploring XML - 23 May 2001

Why does Query Analyzer always truncate the output from a FOR XML query, regardless of whether you send the results to a file or the screen?

When SQL Server 2000 executes a FOR XML query, SQL Server returns a resultset that contains one text column and one or more rows. The column has a unique and unusual name: XML_F52E2B61-18A1-11d1-B105-00805F49916B (which I refer to here as the XML column). This special name lets the SQL Server OLE DB provider recognize the column as a stream of XML data; you can see this behavior easily in Query Analyzer by running a FOR XML query. SQL Server divides the XML result in the XML column into rows, where each row can contain up to 2033 characters. Constructing the XML stream from this XML resultset for each query would be very inconvenient. Thankfully, Microsoft extended both ADO and the SQL Server OLE DB provider to provide a stream of XML. The ADO and SQL Server OLE DB extensions let you return an XML query's results in an IStream or ADO Stream object instead of a standard ADO Recordset object or OLE DB rowset. ADO and OLE DB construct the stream by joining the rows of the XML column. (For details about the ADO and OLE DB XML extensions, see SQL Server Books Online—BOL.)

Now, let's look at how SQL Server's use of the XML column to return an XML result affects Query Analyzer's behavior. Surprisingly, Query Analyzer doesn't use the ADO or OLE DB extensions when executing a FOR XML query; Query Analyzer uses ODBC. And because Microsoft hasn't updated the ODBC driver to be XML-aware, the ODBC driver treats the data like any other text column. If you retrieve the results to a file or screen, the concatenation of multiple rows in the resultset causes extra line breaks. In addition to extra line breaks, data truncation can occur.

Truncation can occur when Query Analyzer's Maximum Characters per Column setting contains a value smaller than 2033. To fix the truncation problem, you can change this setting from its default value of 256 characters in Query Analyzer, Tools, Options, Results. This change solves the data-truncation problem, but line breaks still occur in the concatenation of data from two rows. The only way to avoid generating the line breaks is to use ADO, OLE DB, or an HTTP query and the SQL Server Internet Server API (ISAPI) DLL to obtain an XML result from SQL Server.

Another useful but undocumented feature can help you read the XML result that Query Analyzer displays. SQL Server supports using trace tags to alter the behavior of certain functions. If you set trace tag 257 on, SQL Server invokes a "pretty-print" algorithm on the XML output before returning it to make the XML result more readable. You can turn on the trace tag with the following statement:

DBCC traceon(257)

I frequently use the trace tag when coding examples for this column. I think you'll find the trace tag useful for obtaining more readable XML results when you use Query Analyzer for experimenting and prototyping.

Can I import data from XML files into SQL Server 2000?

Microsoft provides three ways to update SQL Server 2000 by using data from XML files: OpenXML, Updategrams, and XML Bulk Load. OpenXML ships with SQL Server 2000. Updategrams and XML Bulk Load became available in mid-February 2001 when Microsoft shipped XML for SQL Server 2000 Web Release 1. Web Release 1 represents the first supported, production-quality update to the original SQL Server 2000 XML functionality; you can download Web Release 1 from the Microsoft Developer Network (MSDN) Web site at http://msdn.microsoft.com/code/sample.asp?url=/msdn-files/027/001/554/msdncompositedoc.xml. To load small amounts of XML data into SQL Server—as much as 1MB (although 100KB is more typical)—OpenXML and Updategrams are good choices. SQL Server Magazine has already published several articles illustrating OpenXML and Updategram functionality (e.g., Michael Otey, "XML Updategrams," January 2001; Michael Otey, "Top XML-Integration Features," April 2001). Let's focus now on how XML Bulk Load works.

Microsoft designed XML Bulk Load, which is similar in functionality to the bulk copy program (bcp) and T-SQL BULK INSERT commands, to load large amounts of XML data into a database. But unlike bcp and BULK INSERT, which accept only tabular data representations (i.e., a set of rows with a fixed set of columns), XML Bulk Load supports loading XML hierarchies into one or more database tables. And unlike OpenXML and Updategrams, XML Bulk Load uses the streaming Microsoft XML Parser (MSXML) to process data rather than parsing the entire XML dataset into memory before processing. Using the streaming interface lets XML Bulk Load process datasets larger than 100MB without running out of memory. Let's look at an example of how to use XML Bulk Load to import data into SQL Server.

If you've used SQL Server 2000's XML View functionality, you've created an annotated schema that provides a mapping between the XML data in your dataset and a set of the database tables in your database. (For more information about XML View, see SQL Server Books Online—BOL.) You can then use the annotated schema in conjunction with an XPath query to retrieve data in XML format from your database. You can use that same annotated schema with XML Bulk Load to import data into SQL Server. (If you haven't used annotated schemas but have used bcp, an annotated schema is similar in function to a bcp format file.)

Constructing an annotated schema is the first step required to perform an XML Bulk Load. Listing 1 shows an annotated schema that loads the data in Web Listing 1 (see the More on the Web box for download information) into the database table in Listing 2. Because this example uses a simple schema, I used my trusty expert interface—Notepad—to create it. But Microsoft has provided a user-friendly tool called the XML View Mapper for constructing more complex annotated schemas. You can obtain the XML View Mapper and its documentation at http://msdn.microsoft.com/code/sample.asp?url=/msdn-files/027/001/443/msdncompositedoc.xml.

Let's examine the annotated schema that Listing 1 shows. The attributes with the sql prefix specify which elements and attributes map to which tables and columns in the database. The sql:relation attribute maps the Company element in the XML document to rows in the database's Companies table. The sql:field attributes map the attributes' values to columns in the Companies table. And, the sql:is-constant attribute specifies that the Company element is a schema root element and not mapped to the database. This schema provides all the information the XML Bulk Load facility needs to load the data into the database. The next step is to use the XML Bulk Load Object Model to execute the Bulk Load.

To illustrate the use of the XML Bulk Load Object Model, let's use a Visual Basic (VB) project. Listing 3 shows a code sample from the project that performs the Bulk Load. (The complete VB project is available for download; for instructions, see the More on the Web box.) The code sets the connection string to point to my database (remember, the annotated schema specifies the tables and columns), to use the SQL Server OLE DB provider, and to set the appropriate parameters. The code also sets up a path for an error log. If XML Bulk Load encounters an error while loading the data to the database, Bulk Load will write an error to the error log. Then, to execute the Bulk Load, the code passes to the Execute command in the SQLXMLBulkLoad object the names of the files that contain the annotated schema and the data. Voila! The data is now in the database.

Although this example is oversimplified, you can see that XML Bulk Load is simple and easy to use. XML Bulk Load also has some other useful features that handle situations you can encounter when you load large datasets into SQL Server. These features include the abilities to control transactional semantics, handle IDENTITY columns for relationships, handle NULLs, and lock semantics, among others. XML Bulk Load even has an option to create the database tables before loading the data, based on the annotated schema. Overall, XML Bulk Load is a handy and powerful tool for importing XML data into your database.

From the Blogs

Don’t let bad data sneak up on you when and where you least expect it. Ferret out bad data with Melissa Data’s newest Profiling Component for SSIS. Learn how to take control of your data using knowledge-base-driven metadata. The truth shall set you free!...More

Now that we’ve outlined the process to let servers in a SQL Server AlwaysOn Availability Group "talk to each other" by means of setting up linked servers, it’s possible to set up some additional or improved checks on Availability Group Health....More

In my previous post, I provided a high-level outline of the core logic (and rationale behind that logic) that would be needed to set up regular synchronization checks on SQL Server Agent Jobs for servers where AlwaysOn Availability Groups have been deployed. In this post, I’ll walk through the steps--and the code--needed to setup those checks....More