Get off to a fast start with DB2 9 pureXML, Part 3: Query DB2 XML data with SQL

The IBM DB2 9 release features significant new
support for storing, managing, and querying XML data, which is called
pureXML. In this article, learn how to query data stored in XML columns
using SQL and SQL/XML. The next article in the series will illustrate how to
query XML data using XQuery, a new language supported by DB2.

Note: Originally written in 2006, this article has ben updated to
include product changes in DB2 9.5 and 9.7.

Although DB2's hybrid architecture represents a significant departure from
previous releases, exploiting its new XML capabilities doesn't have to be
a painful process. If you're already familiar with SQL, you can
immediately apply your skills to working with XML data stored natively in
DB2. See how in this article.

The XML features in DB2 9 include new storage management, indexing, and
query language support. In this article, learn how to query data in DB2
XML columns using SQL or SQL with XML extensions (SQL/XML). The next
article in the series will discuss DB2's new support for XQuery, an
emerging industry standard, and explore when it can be most useful.

You may be surprised to learn that DB2 also supports bilingual queries --
that is, queries that combine expressions from both SQL and XQuery. Which
language (or combination of languages) you should use depends on your
application needs, as well as your skills. Combining elements of two query
languages into one query isn't as tough as you may think. And doing so can
offer you powerful capabilities for searching and integrating traditional
SQL and XML data.

Sample database

The queries in this article will access the sample tables created in "Getting off to a fast start with DB2 9, Part 2" (developerWorks,
March 2006). As a quick review, the sample "items" and "clients" tables
are defined as follows:

Sample XML data included in the "items.comments" column is shown in Listing 2, while sample XML data included in the
"clients.contactinfo" column is shown in Listing
3. Subsequent query examples will reference specific elements in one
or both of these XML documents.

Query environment

All the queries in this article are designed to be issued interactively,
which you can do through the DB2 command line processor or the DB2 Command
Editor of the DB2 Control Center. The screen images and instructions in
this article focus on the latter. (IBM Data Studio and IBM Optim
Development Studio also ship with an Eclipse-based Developer Workbench
that can help programmers graphically construct queries. However, this
article does not discuss application development issues or the Development
Studio.)

To use the DB2 Command Editor, launch the Control Center and select
Tools > Command Editor. A window similar to Figure 1 will appear. Type your queries in the upper
pane, click on the green arrow in the upper left corner to run them, and
view your output in the lower pane or in the separate "Query results" tab.

Figure 1. The DB2 Command
Editor, which can be launched from the DB2 Control Center

SQL-only queries

Even if your knowledge of SQL is limited, you'll still be able to query XML
data with little effort. For example, the following query selects the full
contents of the "clients" table, including the XML information stored in
the "contactinfo" column:

Listing 4. Simple SELECT statement

select * from clients

Of course, you can write more selective SQL queries that incorporate
relational projection and restriction operations. The following query
retrieves the IDs, names, and contact information for all customers with a
"Gold" status. Note that "contactinfo" contains XML data, while the other
two columns do not:

Listing 5. Simple SELECT statement with projection and restriction

select id, name, contactinfo
from clients
where status = 'Gold'

And, as you might expect, you can create views based upon such queries, as
seen here with "goldview":

Listing 6. Creating a view that contains an XML column

Unfortunately, there's a lot you can't do with just SQL. Plain SQL
statements enable you to retrieve full XML documents (as you've just
seen), but you can't specify XML-based query predicates and you can't
retrieve partial XML documents or specific element values from an XML
document. In other words, you can't project, restrict, join, aggregate, or
order by fragments of XML documents using plain SQL. For example, you
can't retrieve just the email addresses of your Gold customers or the
names of clients who live in zip code "95116." To express these types of
queries, you need to use SQL with XML extensions (SQL/XML), XQuery, or a
combination of both.

The next section explores several fundamental features of SQL/XML. And in
a subsequent article, learn how to write XQuery as well as how to combine
XQuery with SQL.

SQL/XML queries

As the name implies, SQL/XML is designed to bridge between the SQL and XML
worlds. It evolved as part of the SQL standard effort and now includes
specifications for embedding XQuery or XPath expressions within SQL
statements. XPath is a language for navigating XML documents to find
elements or attributes. XQuery includes support for XPath.

It's important to note that XQuery (and XPath) expressions are
case-sensitive. For example, XQuery that references the XML element "zip"
will not apply to XML elements named "ZIP" or "Zip." Case sensitivity is
sometimes difficult for SQL programmers to remember, as SQL query syntax
permits them to use "zip," "ZIP," and "Zip" to refer to the same column
name.

DB2 9 features more than 15 SQL/XML functions that enable you to search
for specific data within XML documents, convert relational data into XML,
convert XML data into relational data, and perform other useful tasks.
This article does not cover the full breadth of SQL/XML. However, it
reviews several common query challenges and how key SQL/XML functions can
address these challenges.

"Restricting"
results based on XML element values

SQL programmers often write queries that restrict the rows returned from
the DBMS based on some condition. For example, the SQL query in Listing 5 restricts the rows retrieved from the
"clients" table to include only those customers with a "Gold" status. In
this case, the customer's status is captured in an SQL VARCHAR column. But
what if you want to restrict your search based on some condition that
applies to data in an XML column? The XMLExists
function of SQL/XML provides one means to do this.

XMLExists enables you to navigate to an element
within your XML document and test for a specific condition. When specified
as part of the WHERE clause,
XMLExists restricts the returned results to
only those rows that contain an XML document with the specific XML element
value (in other words, where the specified value evaluates to "true").

Let's look at a sample query problem raised earlier. Imagine that you need
to locate the names of all clients who live in a specific zip code. As you
may recall, the "clients" table stores customers addresses (including zip
codes) in an XML column. (See Listing 3.) Using
XMLExists, you can search the XML column for
the target zip code and restrict the returned result set accordingly. The
following SQL/XML query returns the names of clients who live in zip code
95116:

Listing 7. Restricting results based on an XML element value

select name from clients
where xmlexists('$c/Client/Address[zip="95116"]'
passing clients.contactinfo as "c")

The first line is an SQL clause specifying that you only want to retrieve
information in the "name" column of the "clients" table. The
WHERE clause invokes the
XMLExists function, specifying an XPath
expression that prompts DB2 to navigate to the "zip" element and check for
a value of 95116. The "$c/Client/Address" clause indicates the path in the
XML document hierarchy where DB2 can locate the "zip" element. Using data
accessible from node "$c" (which we'll explain shortly), DB2 will navigate
through the "Client" element to its "Address" sub-element to inspect zip
code ("zip" values). The final line resolves the value of "$c": it's the
"contactinfo" column of the "clients" table. Thus, DB2 inspects the XML
data contained in the "contactinfo" column, navigates from the root
"Client" element to "Address" and then to "zip," and determines if the
customer lives in the target zip code. If so, the
XMLExists function evaluates to "true," and DB2
returns the name of the client associated with that row.

A common mistake involves formulating the
XMLExists query predicate, as shown in Listing 8.

Listing 8. Incorrect syntax for restricting results based on an XML element value

select name from clients
where xmlexists('$c/Client/Address/zip="95116" '
passing clients.contactinfo as "c")

While this query will execute successfully, it will not restrict the
results to clients living in zip code 95116. (This is due to the semantics
specified in the standard; it's not unique to DB2.) To restrict results to
clients living in zip code 95116, you need to use the syntax shown earlier
in Listing 7.

You may be curious how to include a query that restricts XML data in an
application. While this article does not discuss application development
topics in detail, it includes a simple Java
example that uses a parameter marker within an SQL/XML statement
to restrict output to information about customers who live in a given zip
code.

"Projecting" XML
element values

Now let's consider a slightly different situation, in which you want to
project XML values into your returned result set. In other words, we want
to retrieve one or more element values from our XML documents. There are
multiple ways to do this. Let's first use the
XMLQuery function to retrieve a value for one
element, and then use the XMLTable function to
retrieve values for multiple elements and map these into columns of an SQL
result set.

Let's consider how to solve a problem posed earlier: how to create a
report listing the email addresses of the Gold customers. The following
query in Listing 9 invokes the
XMLQuery function to accomplish this task:

Listing 9. Retrieving email information for qualifying customers

The first line specifies that you want to return values for the "email"
sub-element of the root "Client" element. The second and third lines
indicate where DB2 can find this information -- in the "contactinfo"
column of the "clients" table. The fourth line further qualifies your
query to indicate that you're only interested in email addresses of Gold
customers. This query will return a set of XML elements and values. For
example, if you had 500 Gold customers, each with one email address, your
output would be a one-column result set with 500 rows, as shown in Listing 10:

Listing 10. Sample output for previous query

If you have multiple email addresses for individual Gold customers, you
may want to instruct DB2 to return only the primary address (that is, the
first email address found in the customer's "contactinfo" document). You
can modify the XPath expression in the first line of your query to do so:

Listing 11. Retrieving the first email address for each qualifying customer

Finally, if you lack email addresses for some Gold customers, you may want
to write a query to exclude nulls from the result set. To do so, modify
the previous query by adding another predicate to the
WHERE clause to test for missing email
information. You're already familiar with the SQL/XML function that
enables you to do that -- it's XMLExists. Listing 12 shows how you can rewrite the
previous query to filter out any rows for Gold customers whose contact
information (stored as XML) lacks an email address:

Listing 12. Retrieving the first email address for each qualifying customer for whom we have at least
one email address

Now let's consider a slightly different situation, in which you need to
retrieve multiple XML element values. XMLTable
generates tabular output from data stored in XML columns and is quite
useful for providing programmers with a "relational" view of XML data.
Like XMLExists and
XMLQuery, the
XMLTable function causes DB2 to navigate
through the XML document hierarchy to locate the data of interest.
However, XMLTable also includes clauses to map
the target XML data into result set columns of SQL data types.

Consider the following query (Listing 13), which
projects columns from both relational data and XML data stored in the
"items" table. (See Listing 2 to review the
"items" table.) The comment IDs, customer IDs, and messages are stored in
XML documents in the "comments" column. The item names are stored in an
SQL VARCHAR column.

The first line specifies the columns to be included in your result set.
Columns surrounded by quotation marks and prefixed with the "t" variable
are based on XML element values, as the subsequent lines of the query
indicate. The second line invokes the XMLTable
function to specify the DB2 XML column containing the target data
("i.comments") and the path within the column's XML documents where the
elements of interest are located (within the "Comment" sub-element of the
root "Comments" element). The "columns" clause, spanning lines 3 to 5,
identifies the specific XML elements that will be mapped to output columns
in the SQL result set, specified on line 1. Part of this mapping involves
specifying the data types to which the XML element values will be
converted. In this example, all XML data is converted to traditional SQL
data types.

Figure 2 shows sample results from running this query.
As you can see, the output is a simple SQL result set. Note that the
column names have been folded into upper case -- a normal occurrence with
SQL.

Figure 2. Sample output
from query using the XMLTable function

If desired, you can use XMLTable to create
result sets that include XML columns as well. For example, the following
statement produces a result set similar to the previous one, except that
"Message" data is contained in an XML column rather than an SQL VARCHAR
column.

Creating
relational views of XML data

As you might imagine, SQL/XML functions can be used to define views. This
is particularly useful if you'd like to present your SQL application
programmers with a relational model of your native XML data.

Creating a relational view over data in an XML column isn't much more
complicated than projecting XML element values. You simply write an
SQL/XML SELECT statement that invokes the
XMLTable function and use this as a basis for
your view definition. The following example in Listing 15 creates a view based on information in XML and non-XML
columns of the "items" table. (It's similar to the query in Listing 13.)

Although it's easy to create relational views over XML column data, you
should consider their use carefully if you are not on V9.7. Prior to V9.7,
DB2 didn't use XML column indexes when queries were issued against such
views. Thus, if you indexed the ResponseRequested element and issued an
SQL query that restricted the results of the "mustrespond" column to a
certain value, DB2 would read all the XML documents and search for the
appropriate "ResponseRequested" value. Unless you have a small amount of
data, this would slow runtime performance. So, until you upgrade to V9.7,
when DB2 will use XML indexes on SQL predicates, be careful here.

Joining XML and
relational data

By now, you may be wondering about joining XML data with non-XML data
(relational data based on traditional SQL types, for example). DB2 enables
you to do this with a single SQL/XML statement. While there are different
ways to formulate such joins, depending on your database schema and
workload requirements, we'll cover one example here. And you may be
surprised to learn that you already know enough about SQL/XML to get the
job done.

Recall that the XML column in the "items" table contains a "CustomerID"
element. This can serve as a join key for the integer-based "id" column in
the "clients" table. So, if you want a report of the names and status of
clients who've commented on one or more of your products, you'd have to
join XML element values from one table with SQL integer values from
another. And one way to accomplish this is to use the
XMLExists function, as shown in Listing 16:

Listing 16. Joining XML and non-XML data

The first line identifies the SQL columns to be included in the query
result set and the source tables referenced in the query. The second line
includes your join clause. Here, XMLExists
determines if the "CustomerID" value in one target source is equal to a
value derived from another target source. The third line specifies these
sources: the first is the "comments" XML column in the "items" table, and
the second is the integer "id" column in the "clients" table. Thus, if
customers have commented on any item and information about this customer
is available in the "clients" table, the
XMLExists expression will evaluate to "true"
and the client's name and status information will be included in the
report.

Using "FLWOR"
expressions in SQL/XML

Although we've only discussed a few functions, SQL/XML provides many
powerful capabilities for querying XML data and integrating that data with
relational data. Indeed, you've already seen some examples of how to do
that, but we'll discuss a few more here.

Both the XMLExists and
XMLQuery functions enable you to incorporate
XQuery into SQL. Our previous examples show how to use these functions
with simple XPath expressions to navigate to a portion of an XML document
of interest. Now let's consider a simple example in which you include
XQuery in your SQL queries.

XQueries may contain some or all of the following clauses:
"for," "let,"
"where," "order by",
and "return." Collectively, they form
FLWOR (pronounced flower) expressions. SQL
programmers may find it convenient to incorporate XQueries into their
SELECT lists to extract (or project) fragments
of XML documents into their result sets. And while that's not the only way
the XMLQuery function can be used, it's the
scenario this article covers. (A later article in this series discusses
XQuery in greater depth.)

Let's imagine that you want to retrieve the names and primary email
addresses of your "Gold" customers. In some respects, this task is similar
to one we undertook earlier (see Listing 11),
when we explored how to project XML element values. Here, you pass XQuery
(with "for" and
"return" clauses) as input to the
XMLQuery function:

The first line specifies that customer names and output from the
XMLQuery function will be included in the
result set. The second line indicates that the first "email" sub-element
of the "Client" element is to be returned. The third line identifies the
source of our XML data -- the "contactinfo" column. Line 4 tells us that
this column is in the "clients" table. Finally, the fifth line indicates
that only "Gold" customers are of interest to us.

Because this example was so simple, you could write the same query here.
Instead, you could write the same query in a more compact manner, much as
you did previously:

Listing 18. Rewriting the previous query in a more compact manner

However, the return clause of XQuery enables
you to transform XML output as needed. For example, you can extract email
element values and publish these as HTML. The following query will produce
a result set in which the first email address of each Gold customer is
returned as an HTML paragraph.

Listing 19. Retrieving and transforming XML into HTML

The first line indicates that you're interested in the text representation
of the first email address of qualifying customers. The second line
specifies that this information is to be surrounded by HTML paragraph tags
before return. In particular, the curly brackets ( { } ) instruct DB2 to
evaluate the enclosed expression (in the case, "$e") rather than treat it
as a literal string. If you omit the curly brackets, DB2 would return a
result set containing "<p>$e</p>" for every qualifying
customer record.

Publishing
relational data as XML

Up until now, we've concentrated on ways to query, extract, or transform
data contained within a DB2 XML column. And, as you've seen, these
capabilities are all available through SQL/XML.

SQL/XML provides other handy features as well. Among these is the ability
to convert or publish relational data as XML. This article only covers
three SQL/XML functions in this regard:
XMLElement, XMLAgg,
and XMLForest.

XMLElement lets you convert data stored in
traditional SQL columns into XML fragments. That is, you can construct XML
elements (with or without XML attributes) from your base SQL data. The
following example nests its use of the
XMLElement function to create a series of item
elements, each of which contain sub-elements for the ID, brand name, and
stock keeping unit ("sku") values obtained from the "items" table:

Listing 21. Sample output from previous query

You can combine XMLElement with other SQL/XML
publishing functions to construct and group XML values together, nesting
them in hierarchies as desired. The example in Listing 22 uses XMLElement to create
customerList elements whose contents are grouped by values in the "status"
column. For each "customerList" record, the
XMLAgg function returns a sequence of customer
elements, each of which include sub-elements based on our "name" and
"status" columns. Furthermore, you see that customer element values are
ordered by customer name.

Listing 22. Aggregating and grouping data

select xmlelement(name "customerList",
xmlagg (xmlelement (name "customer",
xmlforest (name as "fullName", status as "status") )
order by name ) )
from clients
group by status

Let's assume our "clients" table contains three distinct "status" values:
"Gold," "Silver," and "Standard." Running the previous query will cause
DB2 to return three customerList elements, each of which may contain
multiple customer sub-elements that further contain name and status
information. Thus, the output will appear similar to:

Update and delete operations

Although the focus of this article is on searching and retrieving data
stored in XML columns using SQL, it's worth spending a few moments
considering two other common tasks: updating and deleting data in XML
columns.

DB2 9 enabled users to update and delete XML data using SQL and SQL/XML
statements. Indeed, because the initial draft of the XQuery standard did
not address these issues, DB2 users had to rely on SQL for these tasks.
However, the W3C was working on an XQuery Update Facility, which was
implemented in DB2 9.5. The addition of the XQuery Update Facility
(initially called TRANSFORM) greatly simplified updating attributes and
elements in an XML document, as well as established a standard for doing
so. The XQuery Update Facility is now in Candidate Recommendation Status.

Updating XML data

While DB2 9 enabled you to update an XML column with an SQL
UPDATE statement or through the use of a
system-supplied stored procedure
(DB2XMLFUNCTIONS.XMLUPDATE), with DB2 9.5, the
new XQuery Update Facility can be used. This allows updating, inserting,
deleting, and creating a new element or attribute within an existing XML
document without having to recreate an entire document. The Update
facility can also be used to modify multiple nodes in the same
transaction.

For example, if you want to issue an UPDATE
statement to change the e-mail address of a particular client's contact
information, you simply have to supply the new e-mail address.

Consider the following statement:

Listing 24. Sample UPDATE statement

The "copy $new",
"modify do replace of $new," and
"return $new" are required clauses of the
XQuery Update facilty. You can learn more regarding the exact syntax and
options in the Resources section below. We have
included both the site for the XQuery specification as well as a
developerWorks article giving more details on the XQuery Update Facility.

Deleting XML
data

Deleting rows that contain XML columns is a straightforward process. The
SQL DELETE statement enables you to identify
(or restrict) the rows you want to delete through a
WHERE clause. This clause may include simple
predicates to identify non-XML column values or SQL/XML functions to
identify XML element values contained within XML columns.

For example, here's how you can delete all customer information for
customer ID 3227:

Listing 25. Deleting data for a specific client

delete from clients
where id = 3227

Do you remember how to restrict SQL SELECT
statements to return only rows for customers living in zip code 95116? If
so, you can easily apply that knowledge to deleting rows that track those
customers. Here's how to do so using XMLExists:

Listing 26. Deleting data for clients within a specific zip code

delete from clients
where xmlexists('$c/Client/Address[zip="95116"]'
passing clients.contactinfo as "c");

Indexing

Finally, it's worth noting that you can create specialized XML indexes to
speed access to data stored in XML columns. Because this is an
introductory article and the sample data is small, this article does not
cover that topic here. However, in production environments, defining
appropriate indexes can be critical to achieving optimal performance. The
Resources section of this article can help
you learn more about new DB2 indexing technology.

Summary

This article covered a lot of ground, highlighting several key aspects of
SQL/XML and how you can use it to query data in XML columns. There's
certainly more you can do with SQL and SQL/XML functions than we've
discussed here. This article includes a simple
Java example that illustrates how you can use parameter markers
with SQL/XML to query data in XML columns. We'll discuss application
development issues in greater detail in a future article. However, the
next article will explore some interesting aspects of XQuery, a new query
language supported by DB2 9.

Acknowledgments

Thanks to George Lapis, Matthias Nicola, Sriram Padmanabhan, Gary
Robinson, Hardeep Singh, and Bert Van der Linden for their help with this
article.

Learn about DB2
Express-C, the no-charge version of DB2 Express Edition for the
community.

Get products and technologies

Build your next
development project with IBM trial software, available for download directly from
developerWorks.

Now you can use
DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the
community that offers the same core data features as DB2 Express Edtion
and provides a solid base to build and deploy applications.

The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.