Two technologies introduced on Wednesday at XTech 2000 promise to make
developers' lives easier, giving them less work to do in processing and
managing XML. Paul Prescod's EasySAX provides a middle ground between the
Simple API for XML (SAX) and the Document Object Model (DOM) for Python.
Murata Makoto presented the REgular LAnguage description for
XML (RELAX), a schema language simpler than W3C XML Schema that is being
developed at the Information Technology Research and Standardization Centre
(INSTAC) in Japan.

EasySAX

Paul Prescod of ISOGEN, whom David Megginson introduced as the "Python
evangelist for XML," described the work he's done on EasySAX, a Python
module.
EasySAX provides a middle ground between the development-intensive approach
of event handler-based parsing and the memory-intensive approach of tree-based
parsing. By offering developers events, context, and the ability to build
tree structures when appropriate, EasySAX lets developers choose their own
balance of processing tradeoffs.
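
For readers new to the tradeoff, the sketch below counts para elements both ways using Python's standard xml.sax and xml.dom.minidom modules. It is not EasySAX code; it only illustrates the two approaches EasySAX sits between. The SAX version keeps a single counter while events stream past, while the DOM version has to build the whole tree before it can query it.

```python
import xml.sax
from xml.dom import minidom

DOC = b"<doc><para>One</para><para>Two</para><note>Aside</note></doc>"


class ParaCounter(xml.sax.ContentHandler):
    """Event-based: reacts to each start tag, keeps only a running count."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "para":
            self.count += 1


def count_with_sax(data):
    handler = ParaCounter()
    xml.sax.parseString(data, handler)  # nothing is retained between events
    return handler.count


def count_with_dom(data):
    tree = minidom.parseString(data)  # the whole document is built in memory first
    return len(tree.getElementsByTagName("para"))


print(count_with_sax(DOC), count_with_dom(DOC))  # 2 2
```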

Presenting a Zen quest for the "Pythonic" way to process XML, Prescod
reused parts from both the SAX and DOM APIs, while hiding their
complexity and resource demands. In fact, EasySAX merges some ideas from
SAX, DOM, XSLT, and DSSSL, providing a layer above the bare parser API.

Prescod sought Aristotle's golden mean in his quest for a Pythonic
processing API: making it simple, but not too simple to get the job done;
elegant but not cute; flexible but not at the cost of clarity; and dynamic
but maintainable.

Asking "Does SAX have the Python nature?", Prescod found much in SAX
that suits a Pythonic approach. SAX's complexity is acceptable as long as it
is hidden; its good performance and standards conformance are Pythonic; and
"reinventing wheels is not Pythonic." To hide that complexity, EasySAX gives
character handling, event dispatching, and context management more
Python-like, "friendlier" support.
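
One way to provide friendlier character handling and dispatching is sketched below, on top of the standard xml.sax module. The class name FriendlyHandler and its start_<name>/end_<name> method convention are hypothetical illustrations, not EasySAX's actual API. The key point is character buffering: a SAX parser may deliver a single run of text in several characters() calls (the &amp; entity in the example typically causes a split), so the layer joins the pieces before handing text to user code.

```python
import xml.sax


class FriendlyHandler(xml.sax.ContentHandler):
    """Hypothetical convenience layer over SAX (not the EasySAX API)."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of the currently open elements (context)
        self._text = []   # text chunks seen since the most recent tag

    def startElement(self, name, attrs):
        self.stack.append(name)
        self._text = []
        handler = getattr(self, "start_" + name, None)
        if handler is not None:
            handler(dict(attrs.items()))

    def characters(self, content):
        # SAX may split one text run into several calls; just collect them.
        self._text.append(content)

    def endElement(self, name):
        # Joined text since the last tag: right for leaf elements,
        # incomplete for mixed content (a deliberate simplification).
        text = "".join(self._text)
        self._text = []
        handler = getattr(self, "end_" + name, None)
        if handler is not None:
            handler(text)
        self.stack.pop()


class TitleCollector(FriendlyHandler):
    def __init__(self):
        super().__init__()
        self.titles = []

    def end_title(self, text):
        self.titles.append(text)


collector = TitleCollector()
xml.sax.parseString(b"<book><title>XML &amp; Python</title></book>", collector)
print(collector.titles)  # ['XML & Python']
```

A handler method can also consult self.stack to see which elements enclose the current one, which is the simplest form of the context management described above.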

The DOM presents more difficulties for the "Python nature," even apart
from Prescod's opinions on the elegance of its overall design. Tree models
have serious limitations because they can consume enormous amounts of memory
quickly, which makes them difficult to use with large documents. Some Python
tools, such as the Zope Object Database (ZODB), can ease those problems by
moving large trees from memory to disk, but that approach introduces
performance problems of its own. Nonetheless, the principle of not
reinventing the wheel suggests that the DOM still has an important role to play.

EasySAX combines material from SAX and DOM with more borrowings from
XSLT, XPath, DSSSL, Omnimark, Balise, and others, to build an API that
dispatches nodes rather than events. These nodes have context, and content
handlers can take advantage of that context to limit their activation to
particular situations.
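
The sketch below shows one way node dispatch with context might work. It is an illustration under stated assumptions, not EasySAX's interface: the Node and ContextDispatcher classes and the "chapter/title" pattern syntax (loosely modeled on XPath location paths) are hypothetical. Each node keeps a link to its parent, and a callback fires only when the path of the element just closed ends with the callback's pattern.

```python
import xml.sax


class Node:
    """Hypothetical node handed to callbacks: name, attributes, direct text, parent."""

    def __init__(self, name, attrs, parent):
        self.name = name
        self.attrs = attrs
        self.parent = parent
        self.text = ""  # character data directly inside this element

    def path(self):
        """Element names from the root down to this node."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return tuple(reversed(names))


class ContextDispatcher(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.rules = []       # (pattern as a tuple of names, callback)
        self.current = None   # innermost open element; its parents form the context

    def on(self, pattern, callback):
        # "chapter/title" matches a title element whose parent is a chapter.
        self.rules.append((tuple(pattern.split("/")), callback))

    def startElement(self, name, attrs):
        self.current = Node(name, dict(attrs.items()), self.current)

    def characters(self, content):
        if self.current is not None:
            self.current.text += content

    def endElement(self, name):
        node = self.current
        path = node.path()
        for pattern, callback in self.rules:
            if path[-len(pattern):] == pattern:
                callback(node)
        # Each finished node is dropped unless a callback keeps it,
        # so no full tree is built.
        self.current = node.parent


titles = []
dispatcher = ContextDispatcher()
dispatcher.on("chapter/title", lambda node: titles.append(node.text))
xml.sax.parseString(
    b"<book><title>Book</title><chapter><title>Ch 1</title></chapter></book>",
    dispatcher,
)
print(titles)  # ['Ch 1']
```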

Because the appropriate amount of tree-building varies from application
to application, EasySAX lets developers choose how to process nodeswith
or without tree-building. A "mini-DOM" provides access to the tree structures
built during parsing. At the same time, the parent context is always
available during the parse, and can be used even without tree-building.
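
Python's standard library already contains a small example of this style: xml.dom.pulldom streams events like SAX but lets a program call expandNode() to build a DOM subtree for just the elements it cares about. The sketch below uses it (not EasySAX) to pull error entries out of a log while the rest of the document stays a stream; the log format is invented for the example.

```python
from xml.dom import pulldom

LOG = """<log>
  <entry level="info"><msg>started</msg></entry>
  <entry level="error"><msg>disk full</msg></entry>
  <entry level="info"><msg>retrying</msg></entry>
</log>"""


def error_messages(text):
    """Yield the msg text of each error entry, expanding only those entries."""
    events = pulldom.parseString(text)
    for event, node in events:
        if (event == pulldom.START_ELEMENT and node.tagName == "entry"
                and node.getAttribute("level") == "error"):
            events.expandNode(node)  # build a DOM subtree for this element only
            msg = node.getElementsByTagName("msg")[0]
            yield msg.firstChild.data


print(list(error_messages(LOG)))  # ['disk full']
```
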
Namespaces can be registered before or during the parse, allowing Python
programmers to reference namespace URIs with prefixes, as is done in an XML
document.
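
EasySAX's registration calls are not described here, so the sketch below shows the same idea with the standard xml.etree.ElementTree module, whose find methods accept a prefix-to-URI mapping. The program picks its own prefix ("b") even though the document uses an unprefixed default namespace; the namespace URI is a made-up example.

```python
import xml.etree.ElementTree as ET

# The document uses an unprefixed default namespace (the URI is a made-up example).
DOC = """<catalog xmlns="http://example.com/ns/books">
  <book><title>First</title></book>
  <book><title>Second</title></book>
</catalog>"""

# The program registers its own prefix for the same URI.
NS = {"b": "http://example.com/ns/books"}

root = ET.fromstring(DOC)
titles = [t.text for t in root.findall("b:book/b:title", NS)]
print(titles)  # ['First', 'Second']
```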

EasySAX is almost complete and should be released on Prescod's web site in
the next few days, though documentation, tools for pruning tree structures,
and a number of other features are still in development.

RELAX

RELAX supplies a simple tool for creating grammars that describe
XML-based languages, providing a lightweight alternative to XML Schema
Structures. Although RELAX is built in large part on the theoretical
framework of "hedge grammars," it deliberately takes a lightweight approach to
document description. Like other schema proposals, it uses an XML document
syntax to describe document structures.
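
Roughly speaking, the "regular language" in RELAX's name refers to describing each element's permitted children with something like a regular expression. The sketch below illustrates that general idea in Python; it is not RELAX's syntax or its validation algorithm. The RULES table is a hypothetical stand-in for a schema, and it keys content models by tag name only, which is roughly what a DTD can express. Hedge grammars are more expressive: they can give one tag different content models in different contexts, which this sketch does not attempt.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models (not RELAX syntax): each element's children,
# encoded as a string of "<name>" tokens, must match a regular expression.
RULES = {
    "memo": r"(?:<to>)+<from>(?:<subject>)?<body>",
    "to": r"",
    "from": r"",
    "subject": r"",
    "body": r"(?:<para>)*",
    "para": r"",
}


def validate(elem):
    """Return error messages for elem and its descendants (text and attributes ignored)."""
    errors = []
    model = RULES.get(elem.tag)
    if model is None:
        errors.append(f"undeclared element <{elem.tag}>")
    else:
        children = "".join(f"<{child.tag}>" for child in elem)
        if re.fullmatch(model, children) is None:
            errors.append(f"<{elem.tag}> has invalid children: {children or '(none)'}")
    for child in elem:
        errors.extend(validate(child))
    return errors


good = ET.fromstring("<memo><to/><to/><from/><body><para/></body></memo>")
bad = ET.fromstring("<memo><from/><body/></memo>")
print(validate(good))  # []
print(validate(bad))   # ['<memo> has invalid children: <from><body>']
```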

RELAX can be used in the same contexts as DTDs and XML Schemas: as a
document description language supporting validation and other processes
that depend on documents having an expected structure. An alpha version of a
Java-based tool for converting RELAX to DTDs is available, as is a C++-based
tool for verifying that documents conform to RELAX descriptions. Another
Java-based tool, also in alpha, can generate Java classes for processing
documents based on a RELAX description.
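
The core of a schema-to-DTD converter is rendering each element's content model in DTD syntax. The sketch below does that for a hypothetical, much-simplified model representation (nested Python tuples, not RELAX's XML syntax); it does not reproduce the Java tool mentioned above. A real converter must also handle content models a DTD cannot express, for instance a single tag whose content depends on its context, which hedge grammars permit; this sketch ignores that case.

```python
# Hypothetical, simplified content-model representation (not RELAX syntax):
#   ("ref", name, occurs)            occurs is "", "?", "*" or "+"
#   ("seq", [particles], occurs)     rendered with ","
#   ("choice", [particles], occurs)  rendered with "|"
#   "empty" or "text"                for EMPTY and (#PCDATA) content


def to_dtd_model(particle):
    """Render a particle in DTD content-model syntax."""
    kind = particle[0]
    if kind == "ref":
        _, name, occurs = particle
        return name + occurs
    _, items, occurs = particle
    separator = ", " if kind == "seq" else " | "
    return "(" + separator.join(to_dtd_model(p) for p in items) + ")" + occurs


def element_decl(name, model):
    """Produce one <!ELEMENT ...> declaration."""
    if model == "empty":
        return f"<!ELEMENT {name} EMPTY>"
    if model == "text":
        return f"<!ELEMENT {name} (#PCDATA)>"
    body = to_dtd_model(model)
    if model[0] == "ref":
        body = f"({body})"  # DTD content models must be parenthesized groups
    return f"<!ELEMENT {name} {body}>"


memo = ("seq", [("ref", "to", "+"), ("ref", "from", ""),
                ("ref", "subject", "?"), ("ref", "body", "")], "")
print(element_decl("memo", memo))                  # <!ELEMENT memo (to+, from, subject?, body)>
print(element_decl("body", ("ref", "para", "*")))  # <!ELEMENT body (para*)>
```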

RELAX is also namespace-aware, and a RELAX Namespace version that
supports mixing modules describing multiple namespaces should appear by
June. RELAX Core itself should appear this month, though the approval process
for standardization (through JIS and ISO) will take longer. Tutorials and
descriptions are currently available in Japanese and English (see links
below).

Murata took a calm view of RELAX's prospects for success, accepting that
"as of today, nobody knows," and that "users and developers will make the
final call." Conversions from DTDs to RELAX, and from RELAX to XML Schemas,
offer a middle ground that may make RELAX a useful tool for immediate
development, even in cases where developers ultimately expect to migrate to
XML Schemas. In any case, RELAX brings a different perspective to solving the
schema definition problem.