In this section of the DB2 pureXML Cookbook, you'll learn how to decompose and shred XML documents one at a time and in bulk. You'll find examples of documents being shredded, learn about XML syntax parameters and discover how to shred documents from programs using the stored procedure XDBDECOMPXML.

This chapter is an excerpt from the new book DB2 pureXML Cookbook: Master the Power of the IBM Hybrid Data Server, by Matthias Nicola and Pav Kumar-Chatterjee. Published by IBM Press. isbn:0138150478 Copyright 2010 by International Business Machines Corporation. More information is available on the publisher's site as well as Safari Books Online.

11.3.4 Decomposing One XML Document at a Time After you have registered and enabled the annotated XML Schema you can decompose XML documents with the DECOMPOSE XML DOCUMENT command or with a built-in stored procedure. The DECOMPOSE XML DOCUMENT command is convenient to use in the DB2 Command Line Processor (CLP) while the stored procedure can be called from an application program or the CLP. The CLP command takes two parameters as input: the filename of the XML document that is to be shredded and the SQL identifier of the annotated schema, as in the following example:

The keyword VALIDATE is optional and indicates whether XML documents should be validated against the schema as part of the shredding process. While shredding, DB2 traverses both the XML document and the annotated schema and detects fundamental schema violations even if the VALIDATE keyword is not specified. For example, the shredding process fails with an error if a mandatory element is missing, even if this element is not being shredded and the VALIDATE keyword is omitted. Similarly, extraneous elements or data type violations also cause the decomposition to fail. The reason is that the shredding process walks through the annotated XML Schema and the instance document in lockstep and therefore detects many schema violations "for free" even if the XML parser does not perform validation.

To decompose XML documents from an application program, use the stored procedure XDBDECOMPXML. The parameters of this stored procedure are shown in Figure 11.12 and described in Table 11.6.

Figure 11.12 Syntax and parameters of the stored procedure XDBDECOMPXML

Table 11.6 Description of the Parameters of the Stored Procedure XDBDECOMPXML

Parameter

Description

rschema

The relational schema part of the two-part SQL identifier of the annotated XML Schema. For example, if the SQL identifier of the XML Schema is db2admin.cust2xsd, then you should pass the string 'db2admin' to this parameter. In DB2 for z/OS this value must be either 'SYSXSR' or NULL.

xmlschemaname

The second part of the two-part SQL identifier of the annotated XML Schema. If the SQL identifier of the XML Schema is db2admin.cust2xsd, then you pass the string 'cust2xsd' to this parameter. This value cannot be NULL.

xmldoc

In DB2 for Linux, UNIX, and Windows, this parameter is of type BLOB(1M) and takes the XML document to be decomposed. In DB2 for z/OS this parameter is of type CLOB AS LOCATOR. This parameter cannot be NULL.

documented

A string that the caller can use to identify the input XML document. The value provided will be substituted for any use of $DECOMP_DOCUMENTID specified in the db2-xdb:expression or db2-xdb:condition annotations.

validation

Possible values are: 0 (no validation) and 1 (validation is performed). This parameter does not exist in DB2 for z/OS.

reserved

Parameters reserved for future use. The values passed for these arguments must be NULL. These parameters do not exist in DB2 for z/OS.

A Java code snippet that calls the stored procedure using parameter markers is shown in Figure 11.13

Figure 11.13 Java code that invokes the stored procedure XDBDECOMPXML

While the input parameter for XML documents is of type CLOB AS LOCATOR in DB2 for z/OS, it is of type BLOB(1M) in DB2 for Linux, UNIX, and Windows. If you expect your XML documents to be larger than 1MB, use one of the stored procedures listed in Table 11.7. These stored procedures are all identical except for their name and the size of the input parameter xmldoc. When you call a stored procedure, DB2 allocates memory according to the declared size of the input parameters. For example, if all of your input documents are at most 10MB in size, the stored procedure XDBDECOMPXML10MB is a good choice to conserve memory.

For platform compatibility, DB2 for z/OS supports the procedure XDBDECOMPXML100MB with the same parameters as DB2 for Linux, UNIX, and Windows, including the parameter for validation.

11.3.5 Decomposing XML Documents in Bulk DB2 9.7 for Linux, UNIX, and Windows introduces a new stored procedure called XDB_DECOMP_XML_FROM_QUERY. It uses an annotated schema to decompose one or multiple XML documents selected from a column of type XML, BLOB, or VARCHAR FOR BIT DATA. The main difference to the procedure XDBDECOMPXML is that XDB_DECOMP_XML_FROM_QUERY takes an SQL query as a parameter and executes it to obtain the input documents from a DB2 table. For a large number of documents, a LOAD operation followed by a "bulk decomp" can be more efficient than shredding these documents with a separate stored procedure call for each document. Figure 11.14 shows the parameters of this stored procedure. The parameters commit_ Count and allow_access are similar to the corresponding parameters of DB2's IMPORT utility. The parameters total_docs, num_docs_decomposed, and result_report are output parameters that provide information about the outcome of the bulk shredding process. All parameters are explained in Table 11.8.

Figure 11.14 The stored procedure XDB_DECOMP_XML_FROM_QUERY

Table 11.8 Parameters for XDB_DECOMP_XML_FROM_QUERY

Parameter

Description

rschema

Same as for XDBDECOMPXML.

xmlschema

Same as xmlschemaname for XDBDECOMPXML

query

A query string of type CLOB(1GB), which cannot be NULL. The query must be an SQL or SQL/XML SELECT statement and must return two columns. The first column must contain a unique document identifier for each XML document in the second column of the result set. The second column contains the XML documents to be shredded and must be of type XML, BLOB, VARCHAR FOR BIT DATA, or LONG VARCHAR FOR BIT DATA

An integer value equal to or greater than 0. A value of 0 means the stored procedure does not perform any commits. A value of n means that a commit is performed after every n successful document decompositions.

allow_access

A value of 1 or 0. If the value is 0, then the stored procedure acquires an exclusive lock on all tables that are referenced in the annotated XML Schema. If the value is 1, then the stored procedure acquires a shared lock.

reserved, reserved2

These parameters are reserved for future use and must be NULL.

continue_on_error

Can be 1 or 0. A value of 0 means the procedure stops upon the first document that cannot be decomposed; for example, if the document does not match the XML Schema.

total_docs

An output parameter that indicates the total number of documents that the procedure tried to decompose.

num_docs_decomposed

An output parameter that indicates the number of documents that were successfully decomposed.

result_report

An output parameter of type BLOB(2GB). It contains an XML document that provides diagnostic information for each document that was not successfully decomposed. This report is not generated if all documents shredded successfully. The reason this is a BLOB field (rather than CLOB) is to avoid codepage conversion and potential truncation/data loss if the application code page is materially different from the database codepage.

Figure 11.15 shows an invocation of the XDB_DECOMP_XML_FROM_QUERY stored procedure in the CLP. This stored procedure call reads all XML documents from the info column of the customer table and shreds them with the annotated XML Schema db2admin.cust2xsd. The procedure commits every 25 documents and does not stop if a document cannot be shredded.

Figure 11.15 Calling the procedure SYSPROC.XDB_DECOMP_XML_FROM_QUERY

If you frequently perform bulk shredding in the CLP, use the command DECOMPOSE XML DOCUMENTS instead of the stored procedure. It is more convenient for command-line use and performs the same job as the stored procedure XDB_DECOMP_XML_FROM_QUERY. Figure 11.16 shows the syntax of the command. The various clauses and keywords of the command have the same meaning as the corresponding stored procedure parameters. For example, query is the SELECT statement that provides the input documents, and xml-schema-name is the two-part SQL identifier of the annotated XML Schema.

Figure 11.16 Syntax for the DECOMPOSE XML DOCUMENTS command

Figure 11.17 illustrates the execution of the DECOMPOSE XML DOCUMENTS command in the DB2 Command Line Processor.

Figure 11.17 Example of the DECOMPOSE XML DOCUMENTS command

If you don't specify a message-file then the error report is written to standard output. Figure 11.18 shows a sample error report. For each document that failed to shred, the error report shows the document identifier (xdb:documented). This identifier is obtained from the first column that is produced by the SQL statement in the DECOMPOSE XML DOCUMENTS command. The error report also contains the DB2 error message for each document that failed. Figure 11.18 reveals that document 1002 contains an unexpected XML attribute called status, and that document 1005 contains an element or attribute value abc that is invalid because the XML Schema expected to find a value of type xs:integer. If you need more detailed information on why a document is not valid for a given XML Schema, use the stored procedure XSR_GET_PARSING_ DIAGNOSTICS, which we discuss in section 17.6, Diagnosing Validation and Parsing Errors.

Figure 11.18 Sample error report from bulk decomp

11.4 SUMMARY

When you consider shredding XML documents into relational tables, remember that XML and relational data are based on fundamentally different data models. Relational tables are flat and unordered collections of rows with strictly typed columns, and each row in a table must have the same structure. One-to-many relationships are expressed by using multiple tables and join relationships between them. In contrast, XML documents tend to have a hierarchical and nested structure that can represent multiple one-to-many relationships in a single document. XML allows elements to be repeated any number of times, and XML Schemas can define hundreds or thousands of optional elements and attributes that may or may not exist in any given document. Due to these differences, shredding XML data to relational tables can be difficult, inefficient, and sometimes prohibitively complex.

If the structure of your XML data is of limited complexity such that it can easily be mapped to relational tables, and if your XML format is unlikely to change over time, then XML shredding can sometimes be useful to feed existing relational applications and reporting software.

DB2 offers two methods for shredding XML data. The first method uses SQL INSERT statements with the XMLTABLE function. One such INSERT statement is required for each target table and multiple statements can be combined in a stored procedure to avoid repetitive parsing of the same XML document. The shredding statements can include XQuery and SQL functions, joins to other tables, or references to DB2 sequences. These features allow for customization and a high degree of flexibility in the shredding process, but require manual coding. The second approach for shredding XML data uses annotations in an XML Schema to define the mapping from XML to relational tables and columns. IBM Data Studio Developer provides a visual interface to create this mapping conveniently with little or no manual coding.

0 comments

Register

Login

Forgot your password?

Your password has been sent to:

By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States. Privacy