Featured Database Articles

Creating an XML table and XML-based indexes with the IBM Database add-ins for Visual Studio 2005

In a previous series of articles, I showed you the integration features between the IBM DB2 Universal Database for Linux, UNIX, and Windows Version 8.x (DB2 UDB) product and the Microsoft Visual Studio.NET 2003 integrated development environment (IDE).

In early June 2006, IBM announced the next release of the DB2 UDB product, DB2 9. Part of this announcement includes the support for Microsoft Visual Studio 2005 and its accompanying ADO.NET 2.0 driver.

As you may recall, Microsoft announced the Visual Studio 2005 product, along with SQL Server 2005, in late 2005. Around the same time, IBM delivered a beta of the now officially announced DB2 integration into Microsoft Visual Studio 2005 and ADO.NET 2.0.

I also recommend that you download the free DB2 Express-C product, which includes that pureXML feature, so you can follow the steps outlined in this article. You can download your own free no-database-size limit copy of DB2 9 (which also includes 64-bit support) at: http://www.ibm.com/db2/express.

As Ive been writing about the DB2 integration into Visual Studio 2005, Ive discussed the following topics in other articles:

Build a DB2 Bound Data Application using Visual Studio 2005Shows you an alternative method where you can even more quickly build a Windows Application using a data grid that binds to your DB2 data source. This approach is a little different from the first article in this list in that you dont specifically bind controls to an application: its even easier than that.

In this article, I want to show you how to create a table using an XML data type and also how to create an XML index on that table in preparation for subsequent queries.

Note: This article requires the use of an XML document, which is provided for download. The contents of this XML document are as follows (but feel free to use your own XML document or alter the data within this one):

Creating a table with the XML data type

There are many ways to create a table in DB2 9. In a previous article on the integration of DB2 data into Visual Studio 2005, I wrote about designers, a unique feature that you can use to create database schema objects. Although I didnt write about it at the time, the Visual Studio 2005 designers for DB2 9 fully support the XML data type  which should surprise no one since XML is treated like a first class data type in DB2 9.

In this section, you will use the table designer to create the CUSTOMERS table, whose definition looks like this:

To create this table using the table designer, perform the following steps:

Note: The steps in this section are not fully detailed because a previous article in this series detailed how to create a table using this approach.

1. Right-click the Tables folder and select Add New Table with Designer.

2. In the Table Definition section, enter CUSTOMERS in the Name field and optionally fill in any of the remaining fields to suit your environment.

3. Click the Add Column icon () in the Columns section to add the columns shown in the previous figure to the CUSTOMERS table. When you are finished, this section of the designer should look like this:

Note: Before proceeding to Step 4, double-check that the XML data type is still available from the Type drop-down list. In case they are not (and you are sure you've created a DB2 database that supports XML), simply drop and re-create the database connection.

4. Specify that the ID column is the primary key for this table by selecting this column in the Columns section, clicking the Primary Key drop-down box in the Column Properties section, and selecting True.

6. Refresh the Server Explorer view by right-clicking the Tables folder and selecting the Refresh option. Ensure that you can see the new CUSTOMERS table.

Creating an index on an XML column

Indexing support is available for data stored in XML columns. The use of indexes over XML data can improve the efficiency of queries issued against the XML documents or fragments stored in a DB2 9 pureXML column.

As with a relational index, an index over XML data indexes the contents of the column. They differ, however, in that a relational index indexes an entire column, while an index over XML data indexes parts of the column. (It can also index the entire column.)

For XML indexes, you indicate which parts of the XML document stored in the column should be indexed. You specify the parts of the XML document you want to index by specifying an XML pattern (which is essentially a limited XPath expression).

Youll also note the AS clause in the previous DDL, which specifies the data type to which indexed values are converted before they are stored. Values are converted to the index XML data type that corresponds to the specified index SQL data type, which helps the index manager perform fast and efficient searches on the XML data.

More specifically, the AS clause is required because the DB2 9 engine was built for flexibility, and one of those flexible options is not to require an associated XML Schema Definition document to store your XML documents. Without an XML Schema Definition document, there would be no way for DB2 9 to know the data type to use for the index for a specified XMLPATTERN expression. For example, you use AS SQL VARCHAR(x) for nodes to index values of a known maximum length, and AS SQL DATE and AS SQL TIMESTAMP for date-based nodes.

Note: The AS VARCHAR HASHED option is used in nodes for which you don't know the length, or for node whose lengths change frequently. This index will hash out the string values of your nodes. This may seem optimal, but these indexes won't support range predicate queries, just equality ones.

If you're indexing numeric data, you use the AS SQL DOUBLE clause. For simplicity, the DB2 9 technology offers this single numeric data type for indexing XML numeric-based data. The reason for this is simple: Instead of weighing down DBAs with the complexity of choosing between multiple numeric-based data types, it was deemed a better option to cast all numeric-based data into a DOUBLE data type. The consequence is that the DBA could lose precision with this data type, which would create the side effect of the inclusion of some elements that wouldn't otherwise be in the index if it were able to accommodate a more precise numeric value. The point here is that you'll always get the data you're looking for (and it'll be easier to define the structure to get it), though the index may contain more entries than needed for the defined index but your queries will filter out these results anyway and get the data fast: the whole point of an index.

The data definition language (DDL) to create an XML index in DB2 9 is shown below:

If you recall, the XML Schema Definition (XSD) document that we created and registered earlier in this series, defined the CanadianAddress data type as follows:

With this in mind, it may be beneficial to create an index on the POSTALCODE column or some attribute that you intend to search on. You can also fully index the XML document, including all attributes, elements, and nodes. For large documents, this isnt likely the best idea, but for small XML documents, it provides a very rich set of search capabilities.

To create an index on the contents of the MAILING_LABEL column, perform the following steps:

1. Right-click the Customers table and select Open Definition.

2. Click the XML Index icon ().

3. Click the Add Index button ().

4. Change the name of the index to FULLXMLINDEX.

5. In the Index Properties window, set the Column field to MAILING_LABEL and DataType to VARCHAR(255). When you have done this, the designer should look like this:

b. Select the Use document from file radio button, and enter the path to the XML file that was provided as a download in this article. Then click OK.

c. Expand the structure thats loaded into the XML Tree table such that the IDE looks like this:

You can see that the data in the provided XML file is shown in the tree view of the XML document. This will help you create the XML pattern that corresponds to the data you want to index.

For example, you could select the <PostalCode> element to index it. Notice that the XML pattern expression is built in the XML pattern field dynamically as you navigate the XML structure. Once you select the <PostalCode> element, this field should look like this:

If you wanted to index the actual text within an element, you would select the text node of the element. For example, if you select the data within the <PostalCode> element, the XML pattern would look like this:

Note the addition of the text() function in this expression.

For this simple example, select the entire document, as shown below:

d. Press Ctrl+S to build the FULLXMLINDEX index.

Note: If you selected other elements, attributes, or nodes, then you could create multiple XML indexes in a single step.

Of course, you may expect your applications to access the CUSTOMERS table via the ID relational column as well. In DB2 9, you can mix and match different access paths without consideration in your application.

To support these different access patterns, you can define a relational-based index on the CUSTOMERS table using the index section of the Table designer. (Note the Relational Index label in the Type field within the Indexes frame.)

Wrapping it Up...

In this article, I showed you how to create a table that leverages the power of the pureXML feature in DB2 9. After creating that table, in anticipation of a wide array of query patterns, we defined an XML index using an XML pattern expression on the pureXML column. At this point in this Visual Studio and DB2 9 pureXML series weve created an XML Schema document, registered it in DB2, and created a table with an XML column where we can store the sample XML document, and indexed it  all within the Visual Studio IDE. It gets better. From here, we can insert data into this table (and optionally validated it), create stored procedures and even Web Services that interact with the XML data  but youll have to wait for subsequent installments of this series to learn how to do that.

Paul C. Zikopoulos, BA, MBA, is an award-winning writer and speaker with the IBM Database Competitive Technology team. He has more than ten years of experience with DB2 and has written over sixty magazine articles and several books about it. Paul has co-authored the books: Information on Demand: Introduction to DB2 9 New Features, IBM DB2 9: New Features, DB2 Version 8: The Official Guide, DB2: The Complete Reference, DB2 Fundamentals Certification for Dummies, DB2 for Dummies, and A DBA's Guide to Databases on Linux. Paul is a DB2 Certified Advanced Technical Expert (DRDA and Cluster/EEE) and a DB2 Certified Solutions Expert (Business Intelligence and Database Administration). In his spare time, he enjoys all sorts of sporting activities, running with his dog Chachi, and trying to figure out the world according to Chloë - his new daughter. You can reach him at: mailto:paulz_ibm@msn.com.

Trademarks

IBM, DB2, DB2 Universal Database, and pureXML are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both.

Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, and service names may be trademarks or service marks of others.

Copyright International Business Machines Corporation, 2006. All rights reserved.

Disclaimer

The opinions, solutions, and advice in this article are from the authors experiences and are not intended to represent official communication from IBM or an endorsement of any products listed within. Neither the author nor IBM is liable for any of the contents in this article. The accuracy of the information in this article is based on the authors knowledge at the time of writing.