Parsing an XML Document with XPath

The getter methods in the org.w3c.dom package API are commonly used to parse an XML document. J2SE 5.0 also provides the javax.xml.xpath package, which parses an XML document with the XML Path Language (XPath). The JDOM org.jdom.xpath.XPath class likewise has methods to select XML document nodes with an XPath expression, which consists of a location path that identifies a node or a list of nodes in an XML document.

Parsing an XML document with an XPath expression is more efficient than the getter methods, because with XPath expressions, an Element node may be selected without iterating over a node list. Node lists retrieved with the getter methods have to be iterated over to retrieve the value of element nodes. For example, the second article node in the journal node in the example XML document in this tutorial (listed in the Overview section below) may be retrieved with the XPath expression:

In the code snippet, xPath is a javax.xml.xpath.XPath object, and inputSource is an InputSource object for an XML document. With the org.w3c.dom package getter methods, the second article node in the journal node is retrieved with the code snippet:
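As a minimal sketch of this comparison, the following program retrieves the second article node both ways. It assumes a simplified, namespace-free stand-in for catalog.xml held in a string: the element and attribute names follow the tutorial, but the second article's values are illustrative, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SecondArticleLookup {
    // Hypothetical, namespace-free stand-in for catalog.xml.
    static final String CATALOG_XML =
        "<catalog>"
        + "<journal title='Java Technology' publisher='IBM developerWorks'>"
        + "<article level='Intermediate' date='January-2004'>"
        + "<title>Design service-oriented architecture frameworks with J2EE technology</title>"
        + "</article>"
        + "<article level='Advanced' date='October-2003'>"
        + "<title>Advance DAO Programming</title>"
        + "</article>"
        + "</journal>"
        + "</catalog>";

    // XPath: one expression addresses the second article directly.
    static Element viaXPath() {
        XPath xPath = XPathFactory.newInstance().newXPath();
        InputSource inputSource = new InputSource(new StringReader(CATALOG_XML));
        try {
            return (Element) xPath.evaluate(
                "/catalog/journal/article[2]", inputSource, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // DOM getters: fetch node lists, then index into them.
    static Element viaDomGetters() {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(CATALOG_XML)));
            Element journal = (Element) document.getElementsByTagName("journal").item(0);
            NodeList articles = journal.getElementsByTagName("article");
            return (Element) articles.item(1); // zero-based, so index 1 is the second article
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaXPath().getAttribute("date"));
        System.out.println(viaDomGetters().getAttribute("date"));
    }
}
```

For the sample document, both calls print October-2003. The XPath version needs a single expression, while the DOM version has to fetch a node list and index into it.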

Also, with an XPath expression, an Attribute node may be selected directly, whereas with the getter methods an Element node has to be retrieved before its Attribute node can be read. For example, the value of the level attribute for the article node with the date January-2004 is retrieved with an XPath expression:
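A minimal sketch of such an attribute lookup is shown here. The document string is a hypothetical, namespace-free stand-in for catalog.xml, the level value Intermediate is an assumed sample value, and the class and method names are illustrative only.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class LevelAttributeLookup {
    // Hypothetical, namespace-free stand-in for catalog.xml.
    static final String CATALOG_XML =
        "<catalog>"
        + "<journal title='Java Technology' publisher='IBM developerWorks'>"
        + "<article level='Intermediate' date='January-2004'>"
        + "<title>Design service-oriented architecture frameworks with J2EE technology</title>"
        + "</article>"
        + "</journal>"
        + "</catalog>";

    // The predicate [@date='January-2004'] picks the article;
    // the trailing /@level step selects its attribute node directly.
    static String levelOfJanuaryArticle() {
        XPath xPath = XPathFactory.newInstance().newXPath();
        InputSource inputSource = new InputSource(new StringReader(CATALOG_XML));
        try {
            return xPath.evaluate(
                "/catalog/journal/article[@date='January-2004']/@level", inputSource);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(levelOfJanuaryArticle());
    }
}
```

The two-argument evaluate method returns the string value of the selected node, so no intermediate Element object has to be handled by the caller.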

In this tutorial, an example XML document is parsed with J2SE 5.0's XPath class and JDOM's XPath class. XML document nodes are selected with XPath expressions. Depending on the XPath expression evaluated, the nodes selected are either org.w3c.dom.Element nodes or org.w3c.dom.Attr nodes. The example XML document, catalog.xml, is listed below:

The example XML document has a namespace declaration, xmlns:journal="http://www.w3.org/2001/XMLSchema-instance", for elements that use the journal namespace prefix.

This article is structured into the following sections:

Preliminary Setup

Parsing with the JDK 5.0 XPath Class

Parsing with the JDOM XPath Class

Preliminary Setup

To use J2SE 5.0's XPath support, the javax.xml.xpath package needs to be in the CLASSPATH. Install the new version of the J2SE 5.0 SDK. To parse an XML document with the JDK 5.0 XPath class, add the <JDK5.0>\jre\lib\rt.jar file to the CLASSPATH variable, if it's not already in the CLASSPATH. <JDK5.0> is the directory in which JDK 5.0 is installed.

The org.apache.xpath.NodeSet class is required in the CLASSPATH. Install Xalan-Java; extract xalan-j-current-bin.jar to a directory. Add <Xalan>/bin/xalan.jar to the CLASSPATH, where <Xalan> is the directory in which Xalan-Java is installed.

To parse an XML document with the JDOM XPath class, the JDOM API classes need to be in the CLASSPATH. Install JDOM; extract the jdom-b9.zip file to an installation directory. Add <JDOM>/jdom-b9/build/jdom.jar, <JDOM>/jdom-b9/lib/saxpath.jar, <JDOM>/jdom-b9/lib/jaxen-core.jar, <JDOM>/jdom-b9/lib/jaxen-jdom.jar, and <JDOM>/jdom-b9/lib/xerces.jar to the CLASSPATH variable, where <JDOM> is the directory in which JDOM is installed.

Parsing with the JDK 5.0 XPath Class

The javax.xml.xpath package in J2SE 5.0 has classes and interfaces to parse an XML document with XPath. Some of the classes and interfaces in the package are listed in the following table:

Class/Interface: Description

XPath (interface): Provides access to the XPath evaluation environment. Provides the evaluate methods to evaluate XPath expressions in an XML document.

XPathExpression (interface): Provides the evaluate methods to evaluate compiled XPath expressions in an XML document.

XPathFactory (class): Used to create an XPath object.

In this section, the example XML document is evaluated with the javax.xml.xpath.XPath class. First, import the javax.xml.xpath package.

import javax.xml.xpath.*;

The evaluate methods in the XPath and XPathExpression interfaces are used to parse an XML document with XPath expressions. The XPathFactory class is used to create an XPath object. Create an XPathFactory object with the static newInstance method of the XPathFactory class.

XPathFactory factory=XPathFactory.newInstance();

Create an XPath object from the XPathFactory object with the newXPath method.

XPath xPath=factory.newXPath();

Create and compile an XPath expression with the compile method of the XPath object. As an example, select the title of the article with its date attribute set to January-2004. An attribute in an XPath expression is specified with an @ symbol. For further reference on XPath expressions, see the XPath specification for examples on creating an XPath expression.

Create an InputSource for the example XML document. An InputSource is an input source for an XML entity. The evaluate method of the XPathExpression interface evaluates either an InputSource or a node/node list of the types org.w3c.dom.Node, org.w3c.dom.NodeList, or org.w3c.dom.Document.

Evaluate the compiled XPath expression against the InputSource of the example XML document.

String title =
xPathExpression.evaluate(inputSource);
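The steps of this section, from creating the factory to evaluating the compiled expression, can be sketched end to end as follows. The sketch assumes a hypothetical, namespace-free stand-in for catalog.xml read from a string rather than from a file, and the class and method names are illustrative.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class CompiledTitleLookup {
    // Hypothetical, namespace-free stand-in for catalog.xml.
    static final String CATALOG_XML =
        "<catalog>"
        + "<journal title='Java Technology' publisher='IBM developerWorks'>"
        + "<article level='Intermediate' date='January-2004'>"
        + "<title>Design service-oriented architecture frameworks with J2EE technology</title>"
        + "</article>"
        + "</journal>"
        + "</catalog>";

    static String titleOfJanuaryArticle() {
        XPathFactory factory = XPathFactory.newInstance();
        XPath xPath = factory.newXPath();
        try {
            // Compile once; the compiled expression can be reused on other inputs.
            XPathExpression xPathExpression =
                xPath.compile("/catalog/journal/article[@date='January-2004']/title");
            InputSource inputSource = new InputSource(new StringReader(CATALOG_XML));
            // Returns the string value of the first matching title element.
            return xPathExpression.evaluate(inputSource);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath compilation or evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titleOfJanuaryArticle());
    }
}
```

Compiling is mainly worthwhile when the same expression is evaluated against several documents, since the parsing of the expression is then done only once.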

The result of the XPath expression evaluation is the title: Design service-oriented architecture frameworks with J2EE technology. The XPath object may also evaluate an XPath expression in an XML document directly, without first compiling the expression. Create an InputSource.

inputSource =
new InputSource(new FileInputStream(xmlDocument));

As an example, evaluate the value of the publisher node in the journal element.
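A hedged sketch of this direct evaluation, reading from a file through a FileInputStream, might look like the following. It assumes publisher is an attribute of the journal element, writes a hypothetical, namespace-free stand-in for catalog.xml to a temporary file so that the example is self-contained, and uses illustrative class and method names.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PublisherLookup {
    // Hypothetical, namespace-free stand-in for catalog.xml.
    static final String CATALOG_XML =
        "<catalog>"
        + "<journal title='Java Technology' publisher='IBM developerWorks'>"
        + "<article level='Intermediate' date='January-2004'>"
        + "<title>Design service-oriented architecture frameworks with J2EE technology</title>"
        + "</article>"
        + "</journal>"
        + "</catalog>";

    // Writes the sample document to a temporary file (illustration only).
    static File writeSampleCatalog() {
        try {
            Path path = Files.createTempFile("catalog", ".xml");
            Files.write(path, CATALOG_XML.getBytes(StandardCharsets.UTF_8));
            path.toFile().deleteOnExit();
            return path.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Evaluates the expression on the XPath object itself, with no compile step.
    static String publisherOf(File xmlDocument) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try (InputStream in = new FileInputStream(xmlDocument)) {
            InputSource inputSource = new InputSource(in);
            return xPath.evaluate("/catalog/journal/@publisher", inputSource);
        } catch (IOException | XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate publisher", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(publisherOf(writeSampleCatalog()));
    }
}
```

The try-with-resources statement closes the file stream once evaluate has returned, by which point the document has been fully read.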

The result of the XPath object evaluation is the attribute value: IBM developerWorks. The evaluate method of the XPath interface may also be used to evaluate a node set. For example, select the node or set of nodes that correspond to the article element nodes in the XML document. Create the XPath expression that represents a node set.

String expression="/catalog/journal/article";

Select the node set of article element nodes in the example XML document with the evaluate method of the XPath object.

XPathConstants.NODESET specifies the return type of the evaluate method as a NodeSet. The return type may also be set to NODE, STRING, BOOLEAN, or NUMBER. The NodeSet class implements the NodeList interface. To parse the nodes in the node set, cast the NodeSet object to NodeList.

NodeList nodeList=(NodeList)nodes;
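A minimal sketch of the node-set evaluation follows. It casts the result of evaluate directly to org.w3c.dom.NodeList, which is the type the JAXP XPath API documents for XPathConstants.NODESET, so the sketch itself compiles without referring to the Xalan NodeSet class. The sample document is a hypothetical, namespace-free stand-in for catalog.xml, and the class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ArticleNodeSet {
    // Hypothetical, namespace-free stand-in for catalog.xml.
    static final String CATALOG_XML =
        "<catalog>"
        + "<journal title='Java Technology' publisher='IBM developerWorks'>"
        + "<article level='Intermediate' date='January-2004'>"
        + "<title>Design service-oriented architecture frameworks with J2EE technology</title>"
        + "</article>"
        + "<article level='Advanced' date='October-2003'>"
        + "<title>Advance DAO Programming</title>"
        + "</article>"
        + "</journal>"
        + "</catalog>";

    // Collects the date attribute of every article element, in document order.
    static List<String> articleDates() {
        XPath xPath = XPathFactory.newInstance().newXPath();
        InputSource inputSource = new InputSource(new StringReader(CATALOG_XML));
        String expression = "/catalog/journal/article";
        try {
            NodeList nodeList = (NodeList) xPath.evaluate(
                expression, inputSource, XPathConstants.NODESET);
            List<String> dates = new ArrayList<>();
            for (int i = 0; i < nodeList.getLength(); i++) {
                Element article = (Element) nodeList.item(i);
                dates.add(article.getAttribute("date"));
            }
            return dates;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(articleDates());
    }
}
```

The expression selects every matching article in one call; the loop only reads values from nodes that have already been selected.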

Thus, nodes in an XML document get selected and evaluated without iterating over the getter methods of the org.w3c.dom API. The example program XPathEvaluator.java is used to parse an XML document with the JDK 5.0 XPath class.

Parsing with the JDOM XPath Class

The JDOM API XPath class supports XPath expressions to select nodes from an XML document. Some of the methods in the JDOM XPath class are listed in the following table:

XPath Class Method: Description

selectSingleNode: Used to select a single node that matches an XPath expression.

selectNodes: Used to select a list of nodes that match an XPath expression.

addNamespace: Used to add a namespace to match an XPath expression with namespace prefixes.

In this section, the procedure to select nodes from the example XML document catalog.xml with the JDOM XPath class is discussed. The nodes selected by the select methods are modified, and the modified document is output to an XML document. First, import the JDOM org.jdom.xpath package classes.

xmlDocument is the java.io.File representation of the XML document catalog.xml. The static method selectSingleNode(java.lang.Object context, String XPathExpression) selects a single node specified by an XPath expression. If more than one node matches the XPath expression, the first node that matches the XPath expression gets selected. Select the level attribute node of the article element whose date attribute is set to January-2004, in the journal whose title is set to Java Technology, with an XPath expression.

The title node with value Design service-oriented architecture frameworks with J2EE technology gets selected. Modify the title node.

titleNode.setText(
"Service Oriented Architecture Frameworks");

The static method selectNodes(java.lang.Object context, String XPathExpression) selects all of the nodes specified by an XPath expression. Select all of the article nodes for the journal with a title set to Java Technology.
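The two static select methods can be sketched together as follows. This is an illustration under several assumptions: the JDOM and Jaxen JAR files from the Preliminary Setup section are on the CLASSPATH, the sample document is a hypothetical, namespace-free stand-in for catalog.xml built from a string, and the class and method names are invented for the example.

```java
import java.io.StringReader;
import java.util.List;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import org.jdom.xpath.XPath;

class JDomSelectSketch {
    // Hypothetical, namespace-free stand-in for catalog.xml.
    static final String CATALOG_XML =
        "<catalog>"
        + "<journal title='Java Technology' publisher='IBM developerWorks'>"
        + "<article level='Intermediate' date='January-2004'>"
        + "<title>Design service-oriented architecture frameworks with J2EE technology</title>"
        + "</article>"
        + "<article level='Advanced' date='October-2003'>"
        + "<title>Advance DAO Programming</title>"
        + "</article>"
        + "</journal>"
        + "</catalog>";

    static final String JAVA_JOURNAL = "/catalog/journal[@title='Java Technology']";

    // Selects and modifies a title, lists the article dates, and returns the new XML.
    static String modifiedCatalog() {
        try {
            Document document = new SAXBuilder().build(new StringReader(CATALOG_XML));

            // selectSingleNode returns the first node that matches the expression.
            Element titleNode = (Element) XPath.selectSingleNode(
                document, JAVA_JOURNAL + "/article[@date='January-2004']/title");
            titleNode.setText("Service Oriented Architecture Frameworks");

            // selectNodes returns every matching node as a java.util.List.
            List<?> articles = XPath.selectNodes(document, JAVA_JOURNAL + "/article");
            for (Object node : articles) {
                System.out.println(((Element) node).getAttributeValue("date"));
            }

            // Serialize the modified document back to XML text.
            return new XMLOutputter().outputString(document);
        } catch (Exception e) {
            throw new IllegalStateException("JDOM processing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(modifiedCatalog());
    }
}
```

The selection itself is done with XPath, while the change to the title is made through the JDOM Element API, because the select methods only locate nodes.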

The Java program JDomParser.java is used to select nodes from the catalog.xml XML document. In this section, the procedure to select nodes from an XML document with the JDOM XPath class select methods was explained. The selected nodes are modified, and the modified document is output to an XML document with the XMLOutputter class. catalog-modified.xml is the output XML document.

Conclusion

In this tutorial, an XML document was parsed with XPath. XPath is used only to select nodes. XPath APIs discussed in this tutorial do not have the provision to set values for XML document nodes with XPath. To set values for nodes, the setter methods of the org.w3c.dom package are required.