4.2. Events and Handlers

Why do we call it an event stream and not an element stream or a
markup object stream? Because XML is hierarchical (elements contain
other elements), individual elements cannot be packaged and served up
as tokens in the stream. In a well-formed document, all elements are
contained in one root element, and a root element that contains the
whole document is not a stream. Thus, we can't expect a stream to
deliver a complete element in a single token, unless it's an empty
element.

Instead, XML streams are composed of events. An event is a signal
that the state of the document (as we've seen it so far in the
stream) has changed. For example, when the
parser comes across the start tag for an element, it indicates that
another element was opened and the state of parsing has changed. An
end tag affects the state by closing the most recently opened
element. An XML processor can keep track of open elements in a stack
data structure, pushing newly opened elements and popping off closed
ones. At any given moment during parsing, the processor knows how
deep it is in the document by the size of the stack.
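
The stack idea can be sketched in a few lines. This is only an illustration, written in Python with its standard xml.parsers.expat binding rather than a Perl module; the function name trace_depth and the sample input are made up for the example.

```python
import xml.parsers.expat


def trace_depth(xml_text):
    """Return (tag, depth) for every start tag, using a stack of open elements."""
    stack = []   # names of the elements that are currently open
    trace = []

    def on_start(name, attrs):
        stack.append(name)                 # start tag: push the new element
        trace.append((name, len(stack)))   # depth is simply the stack size

    def on_end(name):
        stack.pop()                        # end tag: close the most recent element

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(xml_text, True)
    return trace


if __name__ == "__main__":
    # Print an indented outline of a small hypothetical document.
    for name, depth in trace_depth("<a><b><c/></b><d/></a>"):
        print("  " * (depth - 1) + name)
```

Note that the empty element <c/> is reported as a start tag followed at once by an end tag, so it is pushed and popped like any other element.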

Though parsers support a variety of events, there is a lot of
overlap. For example, one parser may distinguish between a start tag
and an empty element, while another may not, but all will signal the
presence of that element. Let's look more closely at
how a parser might dole out tokens, as shown in Example 4-1.

Apply a parser to the preceding example and it might generate this
list of events:

A document start (if this is the beginning of a document and not a
fragment)

A start tag for the <recipe> element

A start tag for the <name> element

The piece of text "peanut butter and jelly
sandwich"

An end tag for the <name> element

A comment with the text "add picture of sandwich
here"

A start tag for the <ingredients> element

A start tag for the <ingredient> element

The text "Gloppy"

A reference to the entity trade

The text "brand peanut butter"

An end tag for the <ingredient> element

. . . and so on, until the final event -- the end of the
document -- is reached.
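
To watch a real parser produce a list like this, here is a rough sketch. It makes several assumptions: it uses Python's xml.parsers.expat binding instead of a Perl module, the input is a hypothetical fragment pieced together from the events listed above (not the text of Example 4-1), the trade entity is declared in an internal DTD subset with a made-up value, and the event names and the list_events function are invented for the illustration. Expat has no callbacks for the start and end of a document, so the sketch records those two events by hand.

```python
import xml.parsers.expat

# Hypothetical input, reconstructed only from the events listed above.
# The internal DTD subset declares "trade" so the reference is legal.
DOC = """<!DOCTYPE recipe [ <!ENTITY trade "(TM)"> ]>
<recipe>
  <name>peanut butter and jelly sandwich</name>
  <!-- add picture of sandwich here -->
  <ingredients>
    <ingredient>Gloppy&trade; brand peanut butter</ingredient>
  </ingredients>
</recipe>"""


def list_events(xml_text):
    """Return the parse events for xml_text as a list of (type, value) tuples."""
    events = []

    def on_text(data):
        # Skip whitespace-only text (the indentation between tags).
        if data.strip():
            events.append(("text", data))

    def on_default(data):
        # Anything without its own handler lands here, including the DOCTYPE.
        # Keep only unexpanded entity references such as "&trade;".
        if data.startswith("&") and data.endswith(";"):
            events.append(("entity_ref", data[1:-1]))

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver contiguous text as one chunk
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = on_text
    parser.CommentHandler = lambda text: events.append(("comment", text.strip()))
    # Setting DefaultHandler stops expat from expanding internal entities,
    # so the reference is reported instead of its replacement text.
    parser.DefaultHandler = on_default

    events.append(("document_start", None))  # recorded by hand, not by expat
    parser.Parse(xml_text, True)
    events.append(("document_end", None))
    return events


if __name__ == "__main__":
    for kind, value in list_events(DOC):
        print(kind, "" if value is None else value)
```

Whether an entity reference shows up as its own event or is silently expanded into the surrounding text depends on the parser and how it is configured; this sketch picks the configuration that reports it.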

Somewhere between chopping up a stream into tokens and processing the
tokens is a layer one might call a dispatcher. It branches the
processing depending on the type of token. The code that deals with a
particular token type is called a handler.
There could be a handler for start tags, another for character data,
and so on. It could be a compound if statement,
switching to a subroutine to handle each case. Or, it could be built
into the parser as a callback dispatcher, as is the case with
XML::Parser's stream mode. If you register a set of subroutines, one
per event type, the parser calls the appropriate one for each token
as it's generated. Which strategy you use depends on the parser.
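
The two dispatching strategies can be compared side by side. The sketch below is in Python and uses no parser at all; the token format (a pair of type and value), the handler names, and the token types are all assumptions made for the illustration, not the interface of any particular module.

```python
def handle_start(name, out):
    out.append("open " + name)


def handle_end(name, out):
    out.append("close " + name)


def handle_text(data, out):
    out.append("text " + repr(data))


def dispatch_if(token, out):
    """Dispatcher written as a compound if statement."""
    kind, value = token
    if kind == "start":
        handle_start(value, out)
    elif kind == "end":
        handle_end(value, out)
    elif kind == "text":
        handle_text(value, out)
    # token types with no handler (comments, for instance) are ignored


# Dispatcher written as a table of registered handlers, one per token type.
HANDLERS = {"start": handle_start, "end": handle_end, "text": handle_text}


def dispatch_table(token, out):
    kind, value = token
    handler = HANDLERS.get(kind)
    if handler is not None:
        handler(value, out)


if __name__ == "__main__":
    tokens = [("start", "name"), ("text", "hi"), ("comment", "x"), ("end", "name")]
    out = []
    for token in tokens:
        dispatch_table(token, out)
    print(out)
```

Both dispatchers do the same work. The table version is closer in spirit to a callback interface, where adding support for a new token type means registering one more subroutine rather than editing the branching logic.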