XML Parsing With SAX and Xerces (part 2)

The first part of this article demonstrated the basics of the Xerces XML parser,
explaining how it could be used to process XML documents in a non-Web environment.
This concluding section closes the circle, taking everything you’ve learned so
far and demonstrating how it can be applied to create dynamic Web pages from static
XML documents with Xerces.

In the first part of this article, I introduced you to the Xerces XML parser,
explaining how it could be used to parse XML documents using an event-driven approach
called SAX. I also demonstrated how the parser worked by using it in a couple
of simple Java programs, and explained some of the interfaces and callbacks available
in the API.

Now, writing a Java program to parse an XML document is all well and good. However,
it’s not really all that useful if you’re a Web developer and your primary goal
is the dynamic generation of Web pages from an XML file. And so, this concluding
part takes everything you learned last time and tosses it out into the wild and
wacky world of the Web, demonstrating clearly how Java, JSP, Xerces and XML can
be combined to create simple, real-world Web applications. Take a look!

The Write Stuff

As in the first part of this article, we’ll begin with
something simple.

Let’s go back to that XML file I created in the first part of this article:

I don’t want to get into the details of the callbacks here – refer to the explanation
for the original example if there’s something that doesn’t seem to make sense
– but I will point out some items of interest.

The most important difference between this example and the previous one is the
introduction of a new Writer object, which makes it possible to stream output
to the browser instead of the standard output device.

private Writer out;

The constructor also needs to be modified to accept two parameters: the name
of the XML file, and a reference to the Writer object.
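The original listing isn’t reproduced here, so what follows is only a minimal sketch of such a constructor. The class name InventoryHandler and the two accessor methods are hypothetical, and I’m assuming the handler is built on SAX2’s DefaultHandler.

```java
import java.io.Writer;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; the article's own class is not shown in this section.
class InventoryHandler extends DefaultHandler {
    private final String xmlFile; // name of the XML file to be parsed
    private final Writer out;     // destination for the generated HTML

    // Accepts the two parameters described above.
    InventoryHandler(String xmlFile, Writer out) {
        this.xmlFile = xmlFile;
        this.out = out;
    }

    // Accessors added only so the sketch can be exercised.
    String getXmlFile() { return xmlFile; }
    Writer getWriter() { return out; }
}
```

In a JSP page, the implicit out object (a JspWriter, which extends java.io.Writer) could be passed as the second argument.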

As you can see, the callback functions used here have evolved substantially from
the previous examples – they now contain more conditional tests, and better error
handling capabilities. Let’s take a closer look.

Most of the work in this script is done by the startElement() callback function.
This function prints specific HTML output depending on the element encountered
by the parser.

This function maps different XML elements to appropriate HTML markup. As you
can see, the document element “inventory”, which marks the start of the XML document,
is used to create the skeleton and first row of an HTML table, while the different
“item” elements correspond to rows within this table. The details of each item
– name, supplier, quantity et al – are formatted as cells within each row of the
table.
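A rough sketch of the mapping just described might look like this. Only the element names come from the article; the class name, the exact HTML strings and the column headings are my own assumptions.

```java
import java.io.IOException;
import java.io.Writer;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; the HTML strings are illustrative only.
class InventoryStartHandler extends DefaultHandler {
    private final Writer out;

    InventoryStartHandler(Writer out) { this.out = out; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs)
            throws SAXException {
        try {
            if (qName.equals("inventory")) {
                // document element: table skeleton plus header row
                out.write("<table border=\"1\">\n");
                out.write("<tr><th>Name</th><th>Supplier</th><th>Quantity</th></tr>\n");
            } else if (qName.equals("item")) {
                // each item becomes a table row
                out.write("<tr>");
            } else if (qName.equals("name") || qName.equals("supplier")
                    || qName.equals("quantity")) {
                // item details become cells within the row
                out.write("<td>");
            }
        } catch (IOException e) {
            // a Writer failure is reported through the SAX exception channel
            throw new SAXException(e);
        }
    }
}
```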

Next, the characters() callback function handles formatting of the content embedded
within the elements.

For most of the elements, I’m simply displaying the content as is. The only deviation
from this standard policy occurs with the “quantity” element, which has an additional
“alert” attribute. This attribute specifies the minimum number of units of the
corresponding item that should be in stock; if the quantity drops below
this minimum level, an alert should be generated. Consequently, the characters()
callback includes some code to test the current quantity against the minimum quantity,
and highlight the data in red if the test fails.
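The quantity test can be sketched roughly as follows. The class, field and helper names are hypothetical, and I’m assuming the “alert” value is an integer and that each value arrives in a single characters() call, which a real parser does not guarantee.

```java
import java.io.IOException;
import java.io.Writer;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; only the element and attribute names come from the article.
class QuantityAlertHandler extends DefaultHandler {
    private final Writer out;
    private String currentElement = ""; // element whose content is being read
    private int alertLevel = -1;        // minimum stock level, -1 if none given

    QuantityAlertHandler(Writer out) { this.out = out; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        currentElement = qName;
        alertLevel = -1;
        String alert = attrs.getValue("alert");
        if (qName.equals("quantity") && alert != null) {
            try {
                alertLevel = Integer.parseInt(alert.trim());
            } catch (NumberFormatException e) {
                alertLevel = -1; // ignore a malformed alert value
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        String text = new String(ch, start, length).trim();
        if (text.isEmpty()) return; // skip whitespace between elements
        try {
            if (currentElement.equals("quantity") && isBelow(text, alertLevel)) {
                // stock has dropped below the minimum: highlight in red
                out.write("<font color=\"red\">" + text + "</font>");
            } else {
                out.write(text); // note: no HTML escaping in this sketch
            }
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        currentElement = "";
    }

    // True only if quantity parses as an integer smaller than minimum.
    private static boolean isBelow(String quantity, int minimum) {
        try {
            return Integer.parseInt(quantity) < minimum;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```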

And finally, to wrap things up, the endElement() callback closes the HTML tags
opened earlier.
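As a sketch, and again assuming a hypothetical class name and that the opening tags were a table, rows and cells as described earlier, the closing callback could be as simple as this:

```java
import java.io.IOException;
import java.io.Writer;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that mirrors the tags opened by startElement().
class InventoryEndHandler extends DefaultHandler {
    private final Writer out;

    InventoryEndHandler(Writer out) { this.out = out; }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        try {
            if (qName.equals("inventory")) {
                out.write("</table>\n"); // end of document: close the table
            } else if (qName.equals("item")) {
                out.write("</tr>\n");    // end of item: close the row
            } else if (qName.equals("name") || qName.equals("supplier")
                    || qName.equals("quantity")) {
                out.write("</td>");      // end of detail: close the cell
            }
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }
}
```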

When Things Go Wrong

If you take a close look at the previous
example, you’ll notice some fairly complex error-handling built into it. It’s
instructive to examine that, and understand the reason for its inclusion.

You’ll remember that I defined a Writer object at the top of my program; this
Writer object provides a convenient way to output a character stream, either to
a file or elsewhere. However, if the object does not initialize correctly, there
is no way of communicating the error to the final JSP page.

The solution to the problem is simple: throw an exception. This exception can
be captured by the JSP page and resolved appropriately.

Let’s take another look at the startElement() callback, this time focusing on
the error-handling built into it:

The default implementation of the startElement() callback does nothing, and
never throws an exception. However, the SAX API declares the callback as able
to throw a SAXException, so it’s possible to catch an error raised by the Writer
object, wrap it in a SAXException, and propagate this error to the target JSP
document.

Why is this necessary? Because if you don’t do this, and your Writer object throws
an error, there’s no way of letting the JSP document know what happened, simply
because the Writer object is the only available line of communication between
the Java class and the JSP document. It’s a little like that chicken-and-egg situation
we all know and love…
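Here’s a minimal sketch of that idea, not the article’s own listing. FailingWriter, RowHandler and ErrorPropagationDemo are hypothetical names; FailingWriter exists only to simulate a broken output stream, and I’m using the JAXP SAXParserFactory bundled with the JDK rather than instantiating the Xerces parser class directly.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Test double: a Writer whose every write fails.
class FailingWriter extends Writer {
    @Override public void write(char[] buf, int off, int len) throws IOException {
        throw new IOException("simulated write failure");
    }
    @Override public void flush() { }
    @Override public void close() { }
}

// Hypothetical handler that converts Writer errors into SAXExceptions.
class RowHandler extends DefaultHandler {
    private final Writer out;

    RowHandler(Writer out) { this.out = out; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs)
            throws SAXException {
        try {
            if (qName.equals("item")) out.write("<tr>");
        } catch (IOException e) {
            // IOException cannot leave this method, so wrap it
            throw new SAXException("Could not write HTML output", e);
        }
    }
}

// Stands in for the JSP page: runs the parse and reports any SAX error.
class ErrorPropagationDemo {
    static String render(String xml, Writer out) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new RowHandler(out));
            return "ok";
        } catch (SAXException e) {
            return "Error: " + e.getMessage();
        } catch (Exception e) {
            return "Unexpected error: " + e.getMessage();
        }
    }
}
```

The same catch block would also see the SAXParseException that a parser raises for a document that is not well-formed, since that class extends SAXException.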

Now, in the JSP page, it’s possible to set up a basic error resolution mechanism
to display the error on the screen. In order to test-drive it, try removing one
of the opening “item” tags from the XML document used in this example and accessing
the JSP page again through your browser.

Skinning A Cat, Technique Two

How about another example, this one utilizing a different technique to format
XML into HTML?

Here’s the XML file I plan to use – it’s a simple to-do list, with tasks, priorities
and due dates marked up in XML.

This is much cleaner and easier to read than the previous example, since it uses
Java’s HashMap object to store key-value pairs mapping HTML markup to XML markup.
Three HashMaps have been used here: StartElementHTML, which stores the HTML tags
for opening XML elements; EndElementHTML, which stores the HTML tags for closing
XML elements; and PriorityHTML, which stores the HTML tags for the “priority”
elements defined for each “item”.

A string variable named ElementName is also used to store the name of the element
currently being parsed; this is used within the characters() callback function.

private String ElementName = "";

Now, when an opening tag is found, the startElement() callback is triggered;
this callback function uses the current element name as a key into the HashMap
previously defined, retrieves the corresponding HTML markup for that element,
and prints it.

Note the numerous checks to avoid NullPointerExceptions, the bane of every Java
programmer on the planet.
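The lookup described above, null check included, can be sketched like this. StartElementHTML and ElementName are the names used in the article; the class name, the “todo” and “due” element names and the HTML strings are assumptions of mine, and the generic Map type is a modern convenience that would not have compiled on JDK 1.3.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; the mapping contents are illustrative only.
class TodoStartHandler extends DefaultHandler {
    private final Writer out;
    private final Map<String, String> StartElementHTML = new HashMap<>();
    private String ElementName = "";

    TodoStartHandler(Writer out) {
        this.out = out;
        StartElementHTML.put("todo", "<ol>"); // assumed document element
        StartElementHTML.put("item", "<li>");
        StartElementHTML.put("due", " <i>");  // assumed due-date element
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs)
            throws SAXException {
        // remember the element for use in characters()
        ElementName = (qName == null) ? "" : qName;
        String html = StartElementHTML.get(ElementName);
        try {
            if (html != null) { // unmapped elements produce no markup
                out.write(html);
            }
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }
}
```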

With the opening element handled, the next step is to process the character data
that follows it. This is handled by the characters() callback, which performs
the important task of displaying the element content, with appropriate modification
to the font colour depending on the element priority.
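One way this could work is sketched below, under some loud assumptions: that “priority” is a child element appearing before the task text, that its values are “1”, “2” and “3”, and that the font tag it opens is closed later by the EndElementHTML entry for “item”. None of those details are confirmed by the text, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; priority values and colours are assumptions.
class TodoTextHandler extends DefaultHandler {
    private final Writer out;
    private final Map<String, String> PriorityHTML = new HashMap<>();
    private String ElementName = "";

    TodoTextHandler(Writer out) {
        this.out = out;
        PriorityHTML.put("1", "<font color=\"red\">");
        PriorityHTML.put("2", "<font color=\"blue\">");
        PriorityHTML.put("3", "<font color=\"black\">");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        ElementName = (qName == null) ? "" : qName;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Assumes each value arrives in one characters() call.
        String text = new String(ch, start, length).trim();
        if (text.isEmpty()) return; // skip whitespace between elements
        try {
            if (ElementName.equals("priority")) {
                // the priority value selects a font tag instead of being printed
                String html = PriorityHTML.get(text);
                if (html != null) out.write(html);
            } else {
                out.write(text); // all other content is displayed as is
            }
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        ElementName = "";
    }
}
```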

Because I’ve used HashMaps to map XML elements to HTML markup, the code in the
example above is cleaner and easier to maintain. Further, this approach makes
it simpler to edit the XML-to-HTML mapping; if I need to add a new element to
the source XML document, I need only update the HashMaps in my class code, with
minimal modification to the callbacks themselves.

Endnote

That’s about it for this article. Over the preceding pages, you learned more than
you ever wanted to know about the Xerces SAX parser, using it to develop simple
XML-based applications in both Web and non-Web environments. You (hopefully) understood
how SAX works, gained an insight into what callback functions do, and learned
how to use Xerces’ interfaces in combination with simple Java constructs to quickly
and easily create dynamic Web pages from static XML documents.

I hope you enjoyed it, and that it helped you to gain a greater understanding
of how to process XML and use it in a Java-based environment – both on and off
the Web. In case you’d like more information on the topic, you should consider
bookmarking the following sites:

Note: All examples in this article have been tested with JDK 1.3.0, Apache 1.3.11,
mod_jk 1.1.0, Xerces 1.4.4 and Tomcat 3.3. Examples are illustrative only, and
are not meant for a production environment. YMMV!