5.4. Drivers for Non-XML Sources

The filter example used a file containing an XML document as its
input source, but that is just one of many ways to use SAX. Another
popular use is to read data from a driver, a program that generates a
stream of data from a non-XML source such as a database. A SAX driver
converts the data stream into a sequence of SAX events that we can
process the way we did previously. What makes this so useful is that
the same handler code works regardless of where the data came from.
The SAX event stream abstracts the data and markup, so we don't have
to worry about the source format. Changing the program to work with
files or other drivers would be trivial.

To see a driver in action, we will write a program that uses Ilya
Sterin's module XML::SAXDriver::Excel to convert Microsoft Excel
spreadsheets into XML documents. This example shows how a data stream
can be processed in pipeline fashion until it arrives in the form we
want. A Spreadsheet::ParseExcel object reads the file and generates a
generic data stream, which an XML::SAXDriver::Excel object translates
into a SAX event stream. Our program then outputs this stream as XML.

Here's a test Excel spreadsheet, represented as a table:

     A                 B
1    baseballs         55
2    tennisballs       33
3    pingpong balls    12
4    footballs         77

The SAX driver will create new elements for us, giving us the names
in the form of arguments to handler method calls. We will just print
them out as they come and see how the driver structures the document.
Example 5-6 is a simple program that does this.
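
As a minimal sketch of a handler of this kind (an illustration under
stated assumptions, not the listing from Example 5-6), the code below
prints each event as it arrives. So that it runs without an Excel
file or the driver module installed, a hypothetical stand-in
subroutine, fake_driver, replays rows as the sequence of events
described in this section. The package name PrintHandler and the
Output parameter are also invented for the illustration. The Name and
Data hash keys follow the usual Perl SAX convention.

```perl
use strict;
use warnings;

# Handler package: writes each SAX event out as XML text as it arrives.
# No escaping of & or < is attempted, to keep the sketch short.
package PrintHandler;

sub new {
    my ($class, %args) = @_;
    # Output is a hypothetical parameter; it defaults to STDOUT.
    my $self = { Output => \*STDOUT, %args };
    return bless $self, $class;
}

sub start_document { my ($self) = @_; print { $self->{Output} } "<doc>\n" }
sub end_document   { my ($self) = @_; print { $self->{Output} } "\n</doc>\n" }

sub start_element {
    my ($self, $el) = @_;
    print { $self->{Output} } "<$el->{Name}>";
}

sub end_element {
    my ($self, $el) = @_;
    print { $self->{Output} } "</$el->{Name}>";
}

sub characters {
    my ($self, $chars) = @_;
    my $data = $chars->{Data};
    print { $self->{Output} } $data if defined $data;
}

package main;

# Hypothetical stand-in for a real driver: turns rows of cells into
# records/record/columnN events and sends them to the handler.
sub fake_driver {
    my ($handler, @rows) = @_;
    $handler->start_document({});
    $handler->start_element({ Name => 'records' });
    for my $row (@rows) {
        $handler->start_element({ Name => 'record' });
        my $n = 0;
        for my $cell (@$row) {
            my $name = 'column' . ++$n;
            $handler->start_element({ Name => $name });
            $handler->characters({ Data => $cell });
            $handler->end_element({ Name => $name });
        }
        $handler->end_element({ Name => 'record' });
    }
    $handler->end_element({ Name => 'records' });
    $handler->end_document({});
}

fake_driver(PrintHandler->new, [ 'baseballs', 55 ], [ 'tennisballs', 33 ]);
```

Because the handler only ever sees method calls, swapping fake_driver
for a real driver should not require any change to PrintHandler.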

As you can see, the handler methods look very similar to those used
in the previous SAX example. All that has changed is what we do with
the arguments. Now let's see what the output looks
like when we run it on the test file:

The driver did most of the work in creating elements and formatting
the data. All we did was print the data it handed us as arguments to
handler method calls. It wrapped the whole document in
<records>, making our use of <doc> superfluous. (In the next
revision of the code, we'll make the start_document() and
end_document() methods output nothing.) Each row of the spreadsheet
is encapsulated in a <record> element. Finally, the two columns
are differentiated with <column1> and <column2> labels. All in
all, not a bad job.

You can see that with a minimal amount of effort on our part, we have
harnessed the power of SAX to do some complex work converting from
one format to another. The driver actually automates the conversion,
but it gives us enough flexibility in interpreting the events so that
we can reject bad data (the empty row, for example) or rename
elements. We can even perform complex processing, such as adding up
values or sorting rows.
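
The kind of processing mentioned above can be sketched as a handler
that buffers each record, drops empty rows, renames the generic
column elements, and keeps a running total. Everything here is an
assumption made for illustration: the package name FilterHandler, the
replacement names item, count, and entry, and the replay_rows
subroutine, which stands in for the real driver so the sketch runs on
its own.

```perl
use strict;
use warnings;

package FilterHandler;

# Hypothetical mapping from the driver's generic names to friendlier ones.
my %RENAME = ( column1 => 'item', column2 => 'count' );

sub new {
    my ($class) = @_;
    return bless { total => 0, xml => '', row => {}, field => undef }, $class;
}

sub start_element {
    my ($self, $el) = @_;
    if    ($el->{Name} eq 'record')     { $self->{row}   = {} }
    elsif (exists $RENAME{$el->{Name}}) { $self->{field} = $RENAME{$el->{Name}} }
}

sub characters {
    my ($self, $chars) = @_;
    return unless defined $self->{field} && defined $chars->{Data};
    # Text may arrive in several chunks, so append rather than assign.
    $self->{row}{ $self->{field} } .= $chars->{Data};
}

sub end_element {
    my ($self, $el) = @_;
    if ($el->{Name} eq 'record') {
        my $row = $self->{row};
        # Reject rows that have no item name (an empty spreadsheet row).
        return unless defined $row->{item} && length $row->{item};
        my $count = $row->{count} // 0;
        $self->{total} += $count;
        $self->{xml} .= "<entry><item>$row->{item}</item>"
                      . "<count>$count</count></entry>\n";
    }
    else {
        $self->{field} = undef;    # leaving a column (or the root)
    }
}

package main;

# Hypothetical stand-in for the driver: replays rows as SAX-style events.
sub replay_rows {
    my ($handler, @rows) = @_;
    $handler->start_element({ Name => 'records' });
    for my $row (@rows) {
        $handler->start_element({ Name => 'record' });
        for my $i (0 .. $#$row) {
            my $name = 'column' . ($i + 1);
            $handler->start_element({ Name => $name });
            $handler->characters({ Data => $row->[$i] });
            $handler->end_element({ Name => $name });
        }
        $handler->end_element({ Name => 'record' });
    }
    $handler->end_element({ Name => 'records' });
}

my $h = FilterHandler->new;
replay_rows($h, [ 'baseballs', 55 ], [ 'tennisballs', 33 ],
                [ undef, undef ],    [ 'footballs', 77 ]);
print $h->{xml};
print "<total>$h->{total}</total>\n";
```

Buffering a whole record before writing anything is the design choice
that makes rejection possible: once a start tag has been printed, it
is too late to decide the row was empty. Sorting would extend the
same idea by holding all rows until the end of the document.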