How to XQuery Non-JDBC Sources from JDBC

By Yuli Vasiliev

Learn to use XQuery from within JDBC code to access data in non-JDBC sources.

Published March 2011

It has been possible to use XQuery, the query and programming language for manipulating XML data, from JDBC code since the release of Oracle Database 10g Release 2, when XQuery first appeared in Oracle XML DB. However, using the XQuery and XML DB features that first became available with Oracle Database 11g Release 2 -- like extension-expression pragmas and indexing unstructured XML content – can now significantly improve the efficiency of your JDBC code making XQuery queries.

This article provides some examples of using XQuery from within JDBC, accessing data in non-JDBC sources. It also covers the XQuery API for Java (XQJ; JSR 225), an alternative to JDBC that provides a standard Java API for interacting with an XQuery engine.

Before You Start

To follow the article examples, you’ll need access to an Oracle 11g Release 2 database. This release of Oracle Database comes with the JDBC drivers compatible with JDK 1.5. Moreover, the JDBC Thin driver that will be used in the examples, as well as the JDBC OCI driver, support JDK 1.6 (although it’s not required here). So, what you’ll need to do is to add the following jar files to the CLASSPATH environment variable:

ORACLE_HOME/jdbc/lib/ojdbc5.jar ORACLE_HOME/jlib/orai18n.jar

The next step is to check out whether the Java compiler and the Java interpreter are available in your system so that you’re sure that the code you’ll be working with can be compiled and run. It’s important to note that Oracle Database comes with both Java compiler and Java interpreter. So, make sure you have ORACLE_HOME/jdk/bin added to the path environment variable. Then, from an operating system prompt, you can issue the commands Javac and Java, with a version option to make sure you’re using the right software.

Note that in Oracle Database 11g Release 1 and later, you can configure fine-grained access control to external network resources using the access control list (ACL) feature. By default, Oracle Database denies access to an external network resource. Since the examples used here illustrate using XQuery queries accessing network resources through their URLs, you’ll have to create an ACL allowing connection to those resources by the database user you’re going to use. For example, if you’re going to use the HR demonstration schema, connecting to the resources located on the localhost, then you might issue the following PL/SQL block utilizing the CREATE_ACL and ASSIGN_ACL procedures of the DBMS_NETWORK_ACL_ADMIN supplied PL/SQL package:

Once the above code has been completed, the connect privilege for localhost is granted to HR – meaning you can now use this schema to access the localhost resources through the UTL_TCP, UTL_HTTP, UTL_SMTP, and UTL_MAIL PL/SQL packages. It's important to emphasize that XQuery uses UTL_HTTP behind the scenes when it comes to querying network resources through the HTTP protocol.

Diagrammatically, this might look like the figure below:

XQuery-ing Web Documents

The need to query a non-JDBC source like a Web document, using its URL, is not the only case when you might want to turn to XQuery, of course. You can consult the “Using XQuery with Oracle XML DB” chapter in the Oracle XML DB Developer's Guide to learn about all the exciting possibilities that are open to you with Oracle XQuery. Just to summarize here, you can use XQuery to efficiently query, construct, and transform any data that can be expressed in XML, including relational data. For example, you might issue an XQuery statement against a relational source to transform relational data into XML.

The examples in this article, however, will illustrate how you might take advantage of the new Oracle XQuery features when querying Web documents such as XML and HTML, which by no means represent JDBC-accessible data sources.

At this point you might be wondering: If such documents are not JDBC-accessible, how can I access them from within JDBC code? To exclude possible misunderstandings here, it’s important to understand that although the documents you’re XQuery-ing may not be JDBC-accessible, the XQuery engine being used resides within an Oracle Database, which indeed represents a JDBC-accessible source.

In other words, your JDBC code knows nothing about the source it's querying -- just passing the XQuery statement to the XQuery engine that evaluates that statement and access the source specified in it. The XQuery engine, in turn, is known to your JDBC code, since it’s an integrated part of the Oracle Database, which is JDBC-accessible indeed. So, in this context, the database can be thought of as an intermediary between your JDBC code and sources it accesses through XQuery.

Schematically, this architecture might look like this:

Whatever data source you query with Oracle XQuery from JDBC, you first have to obtain a JDBC connection to the Oracle database you want to use to process the XQuery statements being issued.

This is best understood by example. Suppose you want to query the following XML document accessible through http://localhost/bonuses.xml:

Multi-target XQuery Queries

The preceding example illustrated how you might use the XMLTable SQL function to decompose the results of an XQuery query into regular SQL rows, so that you can float them down with the next() method of a ResultSet instance. However, Oracle XQuery can be much more flexible than that.

For example, having got a virtual table generated by XMLTable, you can use it in a join query, linking its rows to the ones derived from regular relational tables. Here is an example:

As you can see in the above example, the virtual table generated by XMLTable participates in a join, along with the employees and departments relational tables from the HR demonstration schema. The join conditions are highlighted in bold. Examining the select list of this join query, however, you may notice that it doesn’t include any column of the employees table. That’s right: this table, in this particular example, acts as the intermediary between the rows generated by XMLTable and the departments’ rows. The output of this program should be the following:

Using Oracle XQuery Extension-Expression Pragmas

The examples discussed so far should have helped you to get a grasp on how XQuery can be used in JDBC. What they’ve not illustrated, however, are those new Oracle XQuery features, which were first introduced in Oracle Database 11g Release 2 and which might help you increase the efficiency of your JDBC code.

One of those features is extension-expression pragmas, which, among other things, provides the ability to treat an invalid XPath expression operating on XML schema-based data, as if its targeted node does not exist, not raising an invalid XPath exception and, therefore, making it unnecessary to catch this exception in JDBC code. The example discussed in this section will illustrate how you might take advantage of this new feature.

From the preceding example, you learned how you might intermix the rows generated as a result of an XQuery evaluation with the rows selected from relational tables, within a single SQL statement. In the following example, you’ll look at how you might intermix the rows generated by two different XQuery queries – each evaluated by a separate XMLTable. The target of the first XQuery query is the bonus.xml document used in the preceding examples, and the target of the second one is the XML schema-based XMLType table purchaseorder from the OE demonstration schema. So, the first thing you need to do is to grant the connect privilege for localhost to OE:

Unsurprisingly, because the /PurchaseOrder/Date node does not conform to the XML schema associated with the purchaseorder XML document and, therefore, may not exist in this document, the above code will end up with the following error:

To fail gracefully, you might of course, create an exception block in the program to catch such errors. However Oracle XQuery provides another solution to this problem, without JDBC code having to generate any exception. In particular, you can put pragma #ora:invalid_path empty # right before a questionable XPath expression. So, the updated query string would look like this:

The important thing to note here is that the Oracle XQuery extension-expression pragmas in general, and #ora:invalid_path empty # discussed in this example in particular, are not bound to a certain function – say, XMLTable – and can be used with the others functions performing XQuery expression evaluation. For example, the SELECT statement discussed here could be rewritten in a way that the second appearance of XMLTable (whose XQuery expression argument targets the purchaseorder document) is omitted in favor of a series of the XMLCast(XMLQuery(…)) combinations. However, you still would be able to use the pragma. This is how the implementation of the query’s ORDER BY clause based on XMLCast(XMLQuery(…)) might look like:

ORDER BY XMLCast(XMLQuery('(#ora:invalid_path empty #){$p/PurchaseOrder/Date}'
PASSING OBJECT_VALUE AS "p" RETURNING CONTENT) AS VARCHAR2(30))

Indexing

XMLIndex, first introduced in Oracle Database 11g Release 1 and enhanced in Release 2, now supports structured as well as unstructured components. Using XMLIndex with an unstructured component can be especially powerful for indexing Web documents, because in this case the domain of information that has little structure is very large. On the other hand, a structured component can be useful when you encounter document-centric data containing structured islands, organizing such islands in a relational format. For example, a Web document representing an article may contain metadata such as title, author, pubDate, which can be considered relational islands inside that document and, therefore, can be profitably organized in a relational format.

To see XMLIndex in action, let’s walk through a simple example. The first thing we will do is create a new database schema and grant all the required privileges to it:

As you can see, the otn_xml table created here contains an XMLType column, which you can index with XMLIndex. Before doing that, though, it would be interesting to look at the execution statistics of a query issued against unindexed XML data stored in the table, so that you’ll be able to compare it with the statistics generated for this same query when the data has been indexed. Thus, you first have to populate the otn_xml table with data. This can be done with the following JDBC code:

Since the above JDBC program accesses an Internet resource (the OTN - New Articles RSS document), you must be connected to the Internet during program execution. The program extracts the crucial information about the latest OTN articles from the RSS document and put it into the otn_xml table created as discussed earlier. You can safely run the program any number of times – it won’t save the information about the same article twice, not will it raise a primary key violation error. This is because the condition specified in the WHERE clause of the subquery in the INSERT statement allows only those rows that are not already in the otn_xml table.

Now that the table is populated, you can go ahead and query it. To play with queries, turn back to your SQL prompt tool and connect as usr/usr. Assuming you’re using SQL*Plus, set the AUTOTRACE variable on explain to view the execution statistics of the statements being issued, as follows:

SET AUTOTRACE ON EXPLAIN

Then, issue a query that targets a single record in the otn_xml table, searching for a certain article title being extracted from the XML data stored in the item XMLType column. Such a query might look like this:

Note that the above execution plan shows almost no improvement as compared with the one generated for this same query accessing the unindexed data, shown previously. The fact is that, by default, an XMLIndexis created with an unstructured component; structured component must be included explicitly. You can add a structured component to an existing XMLIndex by registering a new parameter representing that structured component and then altering the index, adding the registered parameter. In our example, though, let’s recreate the index so that it includes only a structured component:

TITLE -------------------------------------------------------------------------------- DESCRIPTION -------------------------------------------------------------------------------- Agile Enterprise Architecture Can enterprise architecture and agile development play in the same sandbox? Voices from the community weigh in in the Architect Community column in the Nov-Dec 2010 issue of Oracle Magazine.

Examining the above execution plan, you can probably guess that the XMLIndex you’ve got doesn’t help here. This is fairly predictable since that index doesn’t include an unstructured component at the moment, which can be particularly effective in such searches. So, let’s add an unstructured component to the index:

ALTER INDEX otn_idx PARAMETERS('PATH TABLE path_tab');

Then, rerun the query to check to see if there is any improvement. Here is the execution plan you should see now:

XQJ is a Java API that enables XQuery queries. XQJ allows you to execute XQuery queries, bind data to them, and then process their results. XQJ is similar to JDBC in many ways, but is used to process XQuery queries only. Oracle XML Developer's Kit (Oracle XDK) supports XQJ, providing you the packages to build XQJ programs. These packages can be found in the following Oracle XDK files: xdk\lib\xqjapi.jar, xdk\lib\xqjori.jar. So, these files must be included to the classpath when compiling a XQJ program. During execution, though, some more JARs may be necessary.

Consider the following example in which the XQJ API is used to connect to the OTN - New Articles RSS document (this same document was used in the preceding example) and then extract those article titles that include the ‘Oracle Database’ substring.

Compilation of the above program with the xdk\lib\xqjapi.jar, xdk\lib\xqjori.jar files included to the classpath still may end up with an error, since neither of these files contains the javax.xml.stream.XMLStreamReader class required when the compilation process comes to the XQPreparedExpression.bindDocument() method (at least as of the XDK coming with Oracle Database 11.2.0.1.0). Although the bindDocument() method used in this particular example employs an java.io.InputStream value as the second parameter, the absence of javax.xml.stream.XMLStreamReader makes the compiler generate an error. If the JDK version your database comes with is earlier than 6.0, you won’t find this class in the JDK jars either. The fact is that the XMLStreamReader class belongs to Streaming API for XML (StAX), which ships as part of Java Standard Edition 6 runtime, not part of earlier versions. So, you must have a copy of JDK 6 or at least JRE 6 on your system. You can download it from the Java SE Downloads page at http://www.oracle.com/technetwork/java/javase/downloads/index.html. You’ll find javax.xml.stream.XMLStreamReader in jre\lib\rt.jar.

At runtime, the above program must also have access to the following Oracle XDK files: xquery.jar and xmlparserv2.jar. Also, orai18n-collation.jar from the ORACLE_HOME\jlib directory must be included.

The output generated by the above program might look like this (depends on the actual content of the RSS page, of course):

It’s important to emphasize that the above example illustrated how you might access a Web document through its URL. In contrast, accessing an XML document stored in the local file system or Oracle XML DB Repository does not require you to declare that document as external and then use the XQPreparedExpression.bindDocument() method to bind it to the query. Instead, you can just use the fn:doc XQuery function within the query itself.

Thus, rewriting the preceding program to access the OTN - New Articles RSS document downloaded to the /home/myfiles directory on the local file system and saved as otntecharticle.xml, you might produce the following code:

Conclusion

As you learned in this article, the advantage of using XQuery in JDBC is that you get the ability to access a wide range of different sources, including those considered normally non-JDBC sources. Thus, having got just a URL, you can issue an XQuery query accessing a corresponding HTML, XML, RSS, or any other document on the Web.

Also, you learned that SQL\JDBC is not the only way to enable XQuery queries in Java. Alternatively, you might use XQJ, which makes it possible to issue XQuery queries directly, without an SQL wrapper.

Yuli Vasiliev is a software developer, freelance author, and consultant currently specializing in open source development, Java technologies, business intelligence (BI), databases, and service-oriented architecture (SOA). He is the author of Oracle Business Intelligence: The Condensed Guide to Analysis and Reporting (Packt, 2010) as well as a series of other books on the Oracle technology.