Delving Deeper Into StAX

June 22, 2004

Introduction

In my previous article, Does StAX Stack Up?, I introduced StAX (Streaming API for XML), the new parsing API in the JAXP (Java API for XML Processing) family. We talked about how StAX fits in with its sister APIs, downloaded the reference implementation, and developed some simple examples of using the StAX cursor-based API for reading and writing documents. In this article, we'll dive deeper into StAX and introduce the features of the more advanced event iterator API.

StAX API Overview

While StAX is an alternative to DOM (Document Object Model), SAX (Simple API for XML), and TrAX (Transformation API for XML), StAX itself offers a choice between two APIs: a cursor API and an event iterator API. Each of these APIs has a reading side and a writing side. This is depicted in the following diagram:

Cursor API Recap

In the previous article, we built the equivalent of "Hello world!" examples using the read and write sides of the cursor API. Conceptually, the cursor-based API moves a virtual cursor over the XML document.

On the reading side, an instance of XMLStreamReader is obtained from XMLInputFactory. The reader exposes hasNext() and next() methods that are used to move through the document in a forward-only manner. Accessor methods such as getText() are used to read the current event, which may be a start element, an end element, or character data; attributes are read from the current start element with methods such as getAttributeValue(). Interrogator methods such as isStartElement() identify what kind of event the cursor is positioned on. Here is a sample code snippet for reference:
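A minimal sketch of the cursor reading side follows. It assumes the StAX implementation bundled with the JDK (Java 7 or later); the class name CursorReaderSketch and the inline sample document are hypothetical, chosen for illustration. It reports each start element (with its attributes), each non-whitespace text node, and each end element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CursorReaderSketch {

    // Moves the cursor through the document and describes each start element
    // (with its attributes), each non-whitespace text node, and each end element.
    static List<String> read(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent text into a single CHARACTERS event
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> lines = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                reader.next(); // advance the cursor by one event
                if (reader.isStartElement()) {
                    StringBuilder line = new StringBuilder("start " + reader.getLocalName());
                    // Attributes are read from the current START_ELEMENT, not as separate events
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        line.append(' ').append(reader.getAttributeLocalName(i))
                            .append('=').append(reader.getAttributeValue(i));
                    }
                    lines.add(line.toString());
                } else if (reader.isCharacters() && !reader.isWhiteSpace()) {
                    lines.add("text " + reader.getText());
                } else if (reader.isEndElement()) {
                    lines.add("end " + reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws XMLStreamException {
        for (String line : read("<greeting lang=\"en\">Hello world!</greeting>")) {
            System.out.println(line);
        }
    }
}
```

Running main prints `start greeting lang=en`, `text Hello world!`, and `end greeting` on separate lines. Note that the cursor itself holds all the state: once next() is called, the previous element's name and attributes are no longer available from the reader.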

On the writing side, the XMLStreamWriter interface exposes methods such as writeStartElement(), writeAttribute(), and writeCharacters() for writing elements, attributes, and data. An instance of XMLStreamWriter is obtained from XMLOutputFactory. Once again, a simple example is provided for reference:
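A matching sketch of the cursor writing side, again assuming the JDK's bundled implementation and a hypothetical class name. The exact form of the XML declaration produced by writeStartDocument() varies between implementations, so only the element markup should be relied on.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class CursorWriterSketch {

    // Writes a one-element document with an attribute and some text.
    static String write() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        try {
            writer.writeStartDocument();            // XML declaration
            writer.writeStartElement("greeting");   // opens <greeting ...
            writer.writeAttribute("lang", "en");    // attribute on the open start tag
            writer.writeCharacters("Hello world!"); // text content, escaped as needed
            writer.writeEndElement();               // </greeting>
            writer.writeEndDocument();              // closes any elements still open
            writer.flush();
        } finally {
            writer.close(); // does not close the underlying StringWriter
        }
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write());
    }
}
```

With the JDK's implementation, the printed document ends with `<greeting lang="en">Hello world!</greeting>`.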

Event Iterator API Overview

The event iterator API also has reading and writing sides. As with the cursor API, readers and writers are obtained from XMLInputFactory and XMLOutputFactory, respectively. However, there is an additional factory, XMLEventFactory, used to create events.
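To show what the extra factory is for, here is a sketch that builds the same greeting document entirely from XMLEventFactory events and hands each one to an XMLEventWriter. It assumes the JDK's bundled implementation; the class name is hypothetical. The empty strings passed to createStartElement() and createEndElement() are the prefix and namespace URI, meaning the element is in no namespace.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

class EventFactorySketch {

    // Manufactures each event with XMLEventFactory, then writes it with add().
    static String write() throws XMLStreamException {
        XMLEventFactory events = XMLEventFactory.newInstance();
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        try {
            writer.add(events.createStartDocument());
            writer.add(events.createStartElement("", "", "greeting")); // prefix, namespace URI, local name
            writer.add(events.createAttribute("lang", "en"));          // attaches to the open start tag
            writer.add(events.createCharacters("Hello world!"));
            writer.add(events.createEndElement("", "", "greeting"));
            writer.add(events.createEndDocument());
            writer.flush();
        } finally {
            writer.close();
        }
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(write());
    }
}
```

Because add() accepts a complete event object, the same method works whether the event came from XMLEventFactory or straight from an XMLEventReader.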

On the reading side, XMLEventReader (which extends java.util.Iterator) exposes a hasNext() method for iterating through the document, and its nextEvent() method returns a handle to the next event as an XMLEvent object. StartElement, Characters, EndElement, and the other event types are subinterfaces of XMLEvent, each with its own accessor and interrogator methods.
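As a sketch of this reading loop, assuming the JDK's bundled implementation and a hypothetical helper named elementNames, the following collects the local name of every start element. Each call to nextEvent() returns a separate event object, and isStartElement() together with asStartElement() gives typed access to it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class EventReaderSketch {

    // Returns the local name of every start element, in document order.
    static List<String> elementNames(String xml) throws XMLStreamException {
        XMLEventReader reader =
            XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent(); // a self-contained event object
                if (event.isStartElement()) {        // interrogator method
                    names.add(event.asStartElement().getName().getLocalPart());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(elementNames("<order><item/><item/></order>"));
    }
}
```

For the sample document, main prints `[order, item, item]`; an empty-element tag such as `<item/>` still produces both a StartElement and an EndElement event.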

Now we start to see some of the differences in the event iterator API. The cursor API uses next() to advance the cursor to the next event, such as a start element or character data. With the event iterator API, we use nextEvent() to get a handle to an object representing the next event. A very handy feature is the peek() method, which returns the next event without consuming it, so a subsequent call to nextEvent() returns that same event.

We can determine the type of an event with a switch over the event type or with interrogator methods such as isStartElement(). This is true of both the cursor and event iterator APIs. However, the event API also has asXxx() methods, such as asStartElement(), that return the event as the proper type without a cast. In the following code, StartElement, Characters, and EndElement events are parsed. Notice how the event obtained via peek() is used to determine whether or not the StartElement is followed by character data.
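A sketch of this pattern follows; the class name, the output format, and the sample document are assumptions for illustration, and the JDK's bundled implementation is assumed. It switches on getEventType(), uses asStartElement(), asCharacters(), and asEndElement() instead of casts, and calls peek() to check whether a start tag is directly followed by non-whitespace text before consuming that text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class PeekSketch {

    // Describes each element; peek() decides whether a start tag is directly
    // followed by non-whitespace text before that text is consumed.
    static List<String> describe(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        List<String> lines = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                switch (event.getEventType()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        StartElement start = event.asStartElement(); // typed view, no cast
                        String name = start.getName().getLocalPart();
                        XMLEvent next = reader.peek(); // look ahead; nothing is consumed
                        if (next != null && next.isCharacters()
                                && !next.asCharacters().isWhiteSpace()) {
                            // Consume the text event we just peeked at
                            Characters text = reader.nextEvent().asCharacters();
                            lines.add(name + ": " + text.getData());
                        } else {
                            lines.add(name + ": (no text)");
                        }
                        break;
                    }
                    case XMLStreamConstants.END_ELEMENT: {
                        EndElement end = event.asEndElement();
                        lines.add("end " + end.getName().getLocalPart());
                        break;
                    }
                    default:
                        break; // document events, whitespace, comments, etc. are ignored
                }
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws XMLStreamException {
        describe("<order><id>42</id><note/></order>").forEach(System.out::println);
    }
}
```

For the sample document, describe() returns `order: (no text)`, `id: 42`, `end id`, `note: (no text)`, `end note`, and `end order`. Without peek(), the loop would have to consume the following event just to find out what it is, and then handle it out of order.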

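The next section of code bears special interest. Here, we get an event reader for the input file that was passed in. Then we simply pass the whole reader to the XMLEventWriter's add() method, which writes every event from the reader to the output stream. Merging XML documents has never been easier than this.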
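A sketch of the merge idea follows, with two assumptions to flag: the documents are passed in as strings rather than files, and a hypothetical wrapper element named `merged` is added so that the result has a single root. Each input reader is wrapped with XMLInputFactory.createFilteredReader() so that its StartDocument and EndDocument events are dropped; otherwise a second XML declaration could end up in the middle of the output. The lambda used as the EventFilter requires Java 8 or later.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

class MergeSketch {

    // Copies the content of each document under a hypothetical <merged> root.
    static String merge(List<String> documents) throws XMLStreamException {
        XMLInputFactory in = XMLInputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        writer.add(events.createStartDocument());
        writer.add(events.createStartElement("", "", "merged"));
        for (String doc : documents) {
            XMLEventReader reader = in.createXMLEventReader(new StringReader(doc));
            // Drop each input's document-level events so only one XML declaration is written
            XMLEventReader body = in.createFilteredReader(reader,
                    e -> !e.isStartDocument() && !e.isEndDocument());
            writer.add(body); // copies every remaining event in a single call
            body.close();
        }
        writer.add(events.createEndElement("", "", "merged"));
        writer.add(events.createEndDocument());
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(merge(Arrays.asList("<a>1</a>", "<b x=\"y\">2</b>")));
    }
}
```

With the JDK's implementation, the output of main ends with `<merged><a>1</a><b x="y">2</b></merged>`.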

When to Use a Cursor or Event Iterator API

The cursor API is less verbose and less powerful than the event iterator API. Because it does not create an event object for each construct it reads, it is presumably more efficient at what it does and creates fewer temporary objects. Both the cursor and event iterator APIs are forward-only. However, the event iterator API provides peek() to examine the next event without consuming it, as was demonstrated in SimpleXmlEventWriter. The event iterator API has many other capabilities that we didn't cover here, such as the ability to filter, buffer, persist, and compare events[i].

Sample Code

Summary

StAX has a cursor API and an event iterator API, and each has a reading side and a writing side. We reviewed code snippets for each of these APIs, and built classes for reading and writing documents using the event API. We discussed why there are two types of StAX API, and why you might use each. Next time, we'll discuss when it is appropriate to use SAX, DOM, TrAX, and StAX, the various APIs in the JAXP family. Until then, the rest is up to you!

About the Author

Jeff Ryan is an enterprise architect for Hartford Financial Services. He has twenty years of experience designing, developing, and delivering automated solutions to business problems. His current focus is on Java, XML, and Service-Oriented Architecture. He may be reached at jeffreyjryan@aol.com.