Klacks parser

The Klacks parser provides an alternative parsing interface,
similar in concept to Java's Streaming API for
XML (StAX).

It implements a streaming, "pull-based" API. This is different
from SAX, which is a "push-based" model.

Klacks is implemented using the same code base as the SAX parser
and has the same parsing characteristics (validation, namespace
support, entity resolution) while offering a more flexible interface
than SAX.

Parsing incrementally using sources

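Function CXML:MAKE-SOURCE (input &key validate dtd root entity-resolver disallow-internal-subset buffering pathname)

Create and return a source for input. The exact behaviour depends
on input, which can be one of the following types: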

pathname -- a Common Lisp pathname.
Open the file specified by the pathname and create a source for
the resulting stream. See below for information on how to
close the stream.

stream -- a Common Lisp stream with element-type
(unsigned-byte 8). See below for information on how to
close the stream.

octets -- an (unsigned-byte 8) array.
The array is parsed directly, and interpreted according to the
encoding it specifies.

string/rod -- a rod (or string on
unicode-capable implementations).
Parses an XML document from the input string that has already
undergone external-format decoding.
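As a minimal sketch of the input types described above, the following creates one source from a string and one from an octet vector. It assumes cxml is installed where ASDF can find it and that the Lisp implementation is Unicode-capable, so that a string is accepted as a rod. The variable names are illustrative only.

```lisp
(asdf:load-system :cxml)   ; assumption: cxml is visible to ASDF

;; String input: treated as characters that have already been decoded.
(defparameter *string-source*
  (cxml:make-source "<example>text</example>"))

;; Octet input: an (unsigned-byte 8) vector; the parser determines the
;; encoding from the document itself.
(defparameter *octet-source*
  (cxml:make-source
   (map '(vector (unsigned-byte 8)) #'char-code
        "<example>text</example>")))
```

Neither source wraps a Lisp stream, so there is nothing to close in this case.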

Closing streams: Sources can refer to Lisp streams that
need to be closed after parsing. This includes a stream passed
explicitly as input, a stream created implicitly for the
pathname case, as well as any streams created
automatically for external parsed entities referred to by the
document.

All of these streams are closed automatically if the end of the
document is reached normally. Use klacks:close-source or
klacks:with-open-source to ensure that the streams are closed
in all other cases as well.

Buffering: By default, the Klacks parser performs buffering
of octets being read from the stream as an optimization. This can
result in unwanted blocking if the stream is a socket and the
parser tries to read more data than required to parse the current
event. Use :buffering nil to disable this optimization.

buffering -- Boolean, defaults to t. If
enabled, read data several kilobytes at a time. If disabled,
read only a single byte at a time.
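A short sketch of disabling buffering for an interactive stream follows. The helper name make-unbuffered-source is hypothetical, and the stream argument is assumed to be an (unsigned-byte 8) input stream such as a socket stream obtained elsewhere.

```lisp
(asdf:load-system :cxml)   ; assumption: cxml is visible to ASDF

(defun make-unbuffered-source (stream)
  "Hypothetical helper: create a klacks source for STREAM that reads one
byte at a time, so the parser does not block waiting for data beyond
the current event."
  (cxml:make-source stream :buffering nil))
```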

The following keyword arguments have the same meaning as
with the SAX parser; refer to the documentation of parse-file for more information:

validate

dtd

root

entity-resolver

disallow-internal-subset

In addition, the following argument is for types of input
other than pathname:

pathname -- If specified, defines the base URI of the
document based on this pathname instance.

Function KLACKS:PEEK-NEXT (source) => key, value*

Advance the source forward to the next event and return it
like peek would.

Function KLACKS:PEEK-VALUE (source) => value*

Like peek, but return only the values, not the key.

Function KLACKS:CONSUME (source) => key, value*

Return the same values peek would, and in addition
advance the source forward to the next event.
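The usual way to combine peek and consume is a loop that inspects the current event and then advances. The sketch below collects the event keys of a document given as a string; the function name list-event-keys is illustrative, and it assumes that peek returns NIL once the document is exhausted.

```lisp
(asdf:load-system :cxml)   ; assumption: cxml is visible to ASDF

(defun list-event-keys (xml)
  "Return the keys of all klacks events produced for the string XML."
  (klacks:with-open-source (s (cxml:make-source xml))
    (loop for key = (klacks:peek s)   ; look at the current event
          while key                   ; NIL means no more events
          collect key
          do (klacks:consume s))))    ; advance to the next event
```

For example, (list-event-keys "<example>text</example>") returns a list that starts with :start-document and ends with :end-document.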

Function KLACKS:CURRENT-URI (source) => uri

Function KLACKS:CURRENT-LNAME (source) => string

Function KLACKS:CURRENT-QNAME (source) => string

If the current event is :start-element or :end-element, return the
corresponding value. Else, signal an error.
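As an illustration of these accessors, this sketch gathers the local name of every start tag in document order. The function name element-names is hypothetical.

```lisp
(asdf:load-system :cxml)   ; assumption: cxml is visible to ASDF

(defun element-names (xml)
  "Return the local names of all elements in the string XML, in order."
  (klacks:with-open-source (s (cxml:make-source xml))
    (loop for key = (klacks:peek s)
          while key
          ;; current-lname is only valid on :start-element/:end-element
          when (eq key :start-element)
            collect (klacks:current-lname s)
          do (klacks:consume s))))
```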

Function KLACKS:CURRENT-CHARACTERS (source) => string

If the current event is :characters, return the character data
value. Else, signal an error.

Function KLACKS:CURRENT-CDATA-SECTION-P (source) => boolean

If the current event is :characters, determine whether the data was
specified using a CDATA section in the source document. Else,
signal an error.
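A minimal sketch of current-characters in use: the function below, whose name collect-text is illustrative, concatenates the character data of a document. It checks the event key first, since calling current-characters on any other event signals an error.

```lisp
(asdf:load-system :cxml)   ; assumption: cxml is visible to ASDF

(defun collect-text (xml)
  "Concatenate the data of every :characters event in the string XML."
  (klacks:with-open-source (s (cxml:make-source xml))
    (with-output-to-string (out)
      (loop for key = (klacks:peek s)
            while key
            do (when (eq key :characters)
                 (write-string (klacks:current-characters s) out))
               (klacks:consume s)))))
```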

Function KLACKS:MAP-CURRENT-NAMESPACE-DECLARATIONS (fn source) => nil

For use only on :start-element and :end-element events, this
function reports every namespace declaration on the current element.
On :start-element, these correspond to the xmlns attributes of the
start tag. On :end-element, the declarations of the corresponding
start tag are reported. No inherited namespaces are
included. fn is called once for each declaration with two
arguments, the prefix and uri.

Function KLACKS:MAP-ATTRIBUTES (fn source)

Call fn for each attribute of the current start tag in
turn, and pass the following values as arguments to the function:

namespace uri

local name

qualified name

attribute value

a boolean indicating whether the attribute was specified
explicitly in the source document, rather than defaulted from
a DTD

Only valid for :start-element.

Function KLACKS:LIST-ATTRIBUTES (source)

Return a list of SAX attribute structures for the current start tag.
Only valid for :start-element.
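The following sketch shows map-attributes with a function that accepts the five arguments listed above. It builds an association list of qualified names and values for the first element of a document; the function name root-attributes is illustrative.

```lisp
(asdf:load-system :cxml)   ; assumption: cxml is visible to ASDF

(defun root-attributes (xml)
  "Return an alist of (qname . value) for the first element in XML."
  (klacks:with-open-source (s (cxml:make-source xml))
    ;; Skip ahead until the current event is a start tag.
    (klacks:find-event s :start-element)
    (let ((attributes '()))
      (klacks:map-attributes
       (lambda (uri lname qname value explicitp)
         (declare (ignore uri lname explicitp))
         (push (cons qname value) attributes))
       s)
      (nreverse attributes))))
```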

Function KLACKS:CLOSE-SOURCE (source)

Close all streams referred to by source.

Macro KLACKS:WITH-OPEN-SOURCE ((var source) &body body)

Evaluate source to create a source object, bind it to
symbol var and evaluate body as an implicit progn.
Call klacks:close-source to close the source after
exiting body, whether normally or abnormally.
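A sketch of with-open-source applied to a file follows. Because the source is created from a pathname, a stream is opened implicitly, and the macro ensures it is closed even if parsing is abandoned early. The function name count-elements is hypothetical.

```lisp
(asdf:load-system :cxml)   ; assumption: cxml is visible to ASDF

(defun count-elements (pathname)
  "Count the start tags in the XML file designated by PATHNAME,
which must be a pathname object rather than a string."
  (klacks:with-open-source (s (cxml:make-source pathname))
    (loop for key = (klacks:peek s)
          while key
          count (eq key :start-element)
          do (klacks:consume s))))
```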

Convenience functions

Function KLACKS:FIND-EVENT (source key)

Read events from source and discard them until an event
of type key is found. Return values like peek, or
NIL if no such event was found.

Function KLACKS:FIND-ELEMENT (source &optional
lname uri)

Read events from source and discard them until an event
of type :start-element with matching local name and
namespace uri is found. If lname is nil, any
tag name matches. If uri is nil, any
namespace matches. Return values like peek, or NIL if no
such event was found.
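As a sketch of find-element, the function below skips to the first element with a given local name and returns the character data that directly follows its start tag, if any. The function name first-text-of is illustrative.

```lisp
(asdf:load-system :cxml)   ; assumption: cxml is visible to ASDF

(defun first-text-of (xml lname)
  "Return the character data directly following the first start tag
named LNAME in the string XML, or NIL if there is none."
  (klacks:with-open-source (s (cxml:make-source xml))
    (when (klacks:find-element s lname)
      (klacks:consume s)                      ; step past the start tag
      (when (eq (klacks:peek s) :characters)
        (klacks:current-characters s)))))
```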

Condition KLACKS:KLACKS-ERROR (xml-parse-error)

The condition class signalled by expect.

Function KLACKS:EXPECT (source key &optional
value1 value2 value3)

Assert that the current event is equal to (key value1 value2
value3). (Ignore value arguments that are NIL.) If so,
return it as multiple values. Otherwise signal a
klacks-error.
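A small sketch of expect used as an assertion while reading a document of known shape. The element name greeting and the function name read-greeting are hypothetical. The NIL passed as value1 means the namespace uri is not checked.

```lisp
(asdf:load-system :cxml)   ; assumption: cxml is visible to ASDF

(defun read-greeting (xml)
  "Return the text of a document of the form <greeting>text</greeting>.
Signals klacks:klacks-error if the document has a different shape."
  (klacks:with-open-source (s (cxml:make-source xml))
    (klacks:find-event s :start-element)
    ;; For :start-element the values are uri, lname and qname.
    (klacks:expect s :start-element nil "greeting")
    (klacks:consume s)
    (klacks:expect s :characters)
    (klacks:current-characters s)))
```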

Function KLACKS:SERIALIZE-ELEMENT (source handler &key document-events)

Read all klacks events from the following :start-element to
its :end-element and send them as SAX events
to handler. When this function is called, the current
event must be :start-element, else an error is
signalled. With document-events (the default),
sax:start-document and sax:end-document events
are sent around the element.

Function KLACKS:SERIALIZE-SOURCE (source handler)

Read all klacks events from source and send them as SAX
events to the SAX handler.
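A sketch of serialize-source that feeds a whole document into a string sink follows. It assumes that cxml:make-string-sink is available as the SAX handler and that serialize-source returns the value the handler produces at the end of the document, which for a string sink is the serialized text. The function name reserialize is illustrative.

```lisp
(asdf:load-system :cxml)   ; assumption: cxml is visible to ASDF

(defun reserialize (xml)
  "Parse the string XML with klacks and write it out again as a string."
  (klacks:with-open-source (s (cxml:make-source xml))
    ;; Assumption: the result of the sink's end-document is returned.
    (klacks:serialize-source s (cxml:make-string-sink))))
```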

Class KLACKS:TAPPING-SOURCE (source)

A klacks source that relays events from an upstream klacks source
unchanged, while also emitting them as SAX events to a
user-specified handler at the same time.

In this example, find-element is used to skip over the
uninteresting events until the opening child1 tag is
found. Then serialize-element is used to generate SAX
events for the following element, including its children, and an
xmls-compatible list structure is built from those
events. find-element skips over whitespace,
and find-event is used to parse up
to :end-document, ensuring that the source has been
closed.
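The code for this example is not reproduced above. A sketch along the lines described might look as follows; the function name child1-as-xmls and the input document are assumptions, and cxml-xmls:make-xmls-builder is assumed to be the handler that builds the xmls-compatible list structure.

```lisp
(asdf:load-system :cxml)   ; assumption: cxml is visible to ASDF

(defun child1-as-xmls (xml)
  "Return the first child1 element of the string XML as an
xmls-compatible list structure."
  (klacks:with-open-source (s (cxml:make-source xml))
    ;; Skip the uninteresting events before the opening child1 tag.
    (klacks:find-element s "child1")
    (prog1
        ;; Send the element and its children to the xmls builder.
        (klacks:serialize-element s (cxml-xmls:make-xmls-builder))
      ;; Read the rest of the document up to :end-document.
      (klacks:find-event s :end-document))))
```

A call such as (child1-as-xmls "<root><child1><p>foo</p></child1> <child2/></root>") returns the list built for child1 and ignores the rest of the document.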