In this fourth step to XML mastery, Frank Coyle introduces parsing technology with a look at the major parsing models: DOM, SAX, and StAX (a newcomer on the block). With some parsing technology under your belt, you can programmatically extract, modify, and even create XML, and it's actually much less complicated than it sounds.

Now it’s time to move to step 4 in our series and look at options for
working with XML at a programming level. For a company like ZwiftBooks, building
a corporate infrastructure around XML implies being able to move XML data into
and out of programs seamlessly. This means extracting, modifying, and creating
XML by using an XML parser. In this article, we’ll look at how ZwiftBooks
can use XML parsing technology to integrate with an existing warehouse alert
program.

Event Versus Tree Parsing

Tree-based parsers such as the Document Object Model (DOM) build an XML tree
and provide methods to navigate the tree.

Figure 1 illustrates the two major families of parsers for programmatically
working with XML. Both event-based parsers and tree-based parsers take an XML
document as input, but the two types of parsers treat that XML very
differently.

Event-Based APIs

An event-based API reports parsing events to your application as the XML
streams through the parser: start of document, start of element, end of
element, and end of document (to name a few). SAX is a push API: you register
handler callbacks, and the parser calls them as it encounters each event. StAX
is a pull API: your code asks the parser for the next event in a loop. Either
way, writing a SAX or StAX application means writing code that reacts when an
element or attribute of interest is encountered in the XML.
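
To make the callback idea concrete, here is a minimal sketch using the SAX
binding in Python's standard library (xml.sax). The sample document, the
alert element, and its id and level attributes are assumptions invented for
illustration; they are not ZwiftBooks' actual format.

```python
import xml.sax

# Hypothetical sample data: element and attribute names are assumptions.
XML = b"""<?xml version="1.0"?>
<alerts>
  <alert id="book-1" level="low"/>
  <alert id="book-2" level="ok"/>
</alerts>"""


class LowStockHandler(xml.sax.ContentHandler):
    """Records every parsing event and collects ids of low-stock alerts."""

    def __init__(self):
        super().__init__()
        self.events = []      # human-readable trace of the callbacks
        self.low_stock = []   # ids of <alert> elements with level="low"

    def startDocument(self):
        self.events.append("start document")

    def startElement(self, name, attrs):
        self.events.append("start " + name)
        # React only to the element and attribute value of interest.
        if name == "alert" and attrs.get("level") == "low":
            self.low_stock.append(attrs["id"])

    def endElement(self, name):
        self.events.append("end " + name)

    def endDocument(self):
        self.events.append("end document")


handler = LowStockHandler()
xml.sax.parseString(XML, handler)   # the parser drives; the handler reacts
print(handler.low_stock)
```

The handler never sees the document as a whole. It sees one event at a time,
so any state it needs (here, the low_stock list) must be kept by the
application itself.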

Tree-Based APIs

A tree-based API such as the W3C’s DOM maps an XML document into an internal
tree structure and provides programmatic interfaces for navigating that tree.
Methods are available to determine the child and parent elements of nodes, as
well as to extract the content of elements or attributes. With DOM, it’s also
possible to modify the tree and thus create new XML.
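
As a rough illustration of navigating and modifying a tree, the sketch below
uses xml.dom.minidom, a lightweight DOM implementation in Python's standard
library. The catalog, book, title, and price names are hypothetical sample
data chosen for this example.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document; the names are assumptions.
XML = "<catalog><book isbn='book-1'><title>XML Basics</title></book></catalog>"

doc = parseString(XML)                 # the whole document becomes a tree
root = doc.documentElement             # the <catalog> element
book = root.getElementsByTagName("book")[0]
title = book.getElementsByTagName("title")[0]

# Navigate: every node knows its parent and its children.
parent_name = title.parentNode.tagName     # "book"
title_text = title.firstChild.data         # "XML Basics"
isbn = book.getAttribute("isbn")           # "book-1"

# Modify: build a new <price> element and attach it to the tree.
price = doc.createElement("price")
price.appendChild(doc.createTextNode("29.95"))
book.appendChild(price)

# Serialize the modified tree back to XML text.
new_xml = root.toxml()
print(new_xml)
```

Because the tree stays in memory, the program can move up, down, and
sideways from any node, which is exactly what an event stream does not offer.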

Choosing a Parser

The choice of event versus tree parser depends on the application
requirements:

Event-based parsers are good for extracting an element or attribute
from some XML and reacting to it in some way. Since event parsers look
at only one small part of an XML document at a time, you can parse very large
documents. Even documents in the terabyte range can be handled by a SAX or
StAX parser.

Tree-based APIs build a navigable internal representation of a
document. This approach is useful for a wide range of applications, but
it has a heavy impact on system resources, especially with large documents,
because the whole tree is held in memory. It is also a poor fit for
applications with special data-modeling requirements: building a DOM tree,
mapping it onto a new data structure, and discarding the original is typically
not worth the effort. However, if data context is important (for example, an
element’s parent or siblings), DOM is the way to go.
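
The extract-and-react case for event parsers can be sketched with a pull
loop. StAX itself is a Java API, so this example substitutes Python's
xml.dom.pulldom as a stand-in for the pull style; the alert element and its
attributes are again hypothetical.

```python
import io
from xml.dom import pulldom


def first_low_stock(stream):
    """Return the id of the first <alert level="low">, or None if absent.

    The loop pulls one event at a time and stops asking for more as soon
    as a match is found; it never requests a full tree of the document.
    """
    for event, node in pulldom.parse(stream):
        if event == pulldom.START_ELEMENT and node.tagName == "alert":
            if node.getAttribute("level") == "low":
                return node.getAttribute("id")
    return None


# Hypothetical sample data; a real feed could be a large file opened in
# binary mode and passed in the same way.
data = (b"<alerts>"
        b"<alert id='a1' level='ok'/>"
        b"<alert id='a2' level='low'/>"
        b"<alert id='a3' level='low'/>"
        b"</alerts>")

result = first_low_stock(io.BytesIO(data))
print(result)
```

If the program also needed the surrounding context of a matching element,
such as its parent or siblings, that would be a sign to reach for a tree
instead.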