How should XML-DEV proceed with developing SAXPath? Are there lessons
already learned that should be pointed out? Let us know what your thoughts
are.

While the XML-DEV storms of the last few weeks show little sign of
abating, some developers have been discussing the potential for an
XPath API.

Back to Basics

Over the last few weeks the XML-Deviant has reported on a
number of controversies surrounding the recent activities of the W3C,
and on the rising complexity of, and interdependence between, the
specifications forming the "XML family".

In the "SML" debates a year ago, I had the sense that lots
of hard-core XML geeks really understood the big picture, and the
"simpletons" were only arguing that it shouldn't be *necessary* for
everyone to understand the more obscure bits of XML to move forward
with it. Now I sense serious frustration as another year has gone
by without a Schema Recommendation, widespread uncertainty,
confusion, and contention about what "post-schema validation info
sets" mean even among the geekiest of the XML geeks, and no public
complaint as new "standards" efforts splinter off from the W3C
(e.g., TREX and JDOM). In other words, I'm getting the sinking
feeling that it's no longer *possible* to understand the big
picture.

I think there's widespread agreement as to the broad
outlines of a solution -- refactorization, simplification,
modularization... My all time favorite XML-DEV post came
from Tim Bray in early January...: "the lesson the Web teaches,
reinforced by XML, is that the way forward lies in Daring To Do
Less".

Whether XML will "fork" is unclear. That it's been over-hyped is
probably true, but it is certainly no less useful because of
that. That it's already had a great deal of success is without
doubt. As Champion notes, the solution relies on breaking the whole
down into manageable chunks. One way to achieve this is to produce
useful tools that are interchangeable and modular, even if that
modularity isn't reflected in the core specifications to the
satisfaction of all.

SAXPath

XSLT, XQuery, XPointer, and XLink all rely on XPath. Now
XPath is a perfect match for XSLT, but IMO it's much more powerful
than what is needed for the others. The extra power has a price: it
makes it harder to implement the specs built on top of it.
Sometimes implementors can leverage existing work, but not always --
for example, you can't just take Saxon's XPath implementation (which
is one of the best) and plug it into Kweelt (one of the more
promising XQuery-like implementations); the internal data structures
are far too different.

XPath isn't XML, so none of the XML tools work on it. That
said, it appears that parsing and interpreting XPath expressions is
becoming more and more important in its own right.

Proposal: let's give XPath the SAX treatment.

By the "SAX treatment" Reitzel means a joint development effort by
the XML-DEV community, of the kind that produced the SAX API. His
proposal was well-received. Thomas Passin believed that the effort
should be extended to include XPointer, anticipating a similar future
need for that specification. Passin also suggested that some
requirements engineering would be needed to determine the benefits
that such an API might bring.

... We need some requirements engineering here.
Especially, what would the API be used for? SAX lets us use a
parser without having to know how to talk to each different one.
What do we want to use our XPath/XPointer API for?

Here are some general kinds of things:

Parse and process the XPointer syntax. This would be useful for
developers to create XPointer applications and toolkits.

Return node-sets. This is more like a query capability, and would be
more useful for application writers.

Construct XPointer expressions based on some existing tree
(fragment).

Construct XPointer expressions based on a schema (fragment?)

These are quite different, and it might not be feasible to accomplish all of
them. We need to work out what would be valuable. Let's remember the 80-20
proposition!
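
One of the use cases listed above, constructing an expression from an existing tree, can be sketched in a few lines. The following is only an illustration, not part of any proposed API: the function name `path_to` is hypothetical, it uses Python's standard `xml.dom.minidom`, it produces plain XPath location steps rather than full XPointer syntax, and it ignores namespaces.

```python
from xml.dom import minidom


def path_to(node):
    """Return an absolute, positional XPath for an element node.

    Hypothetical helper for illustration; namespaces are ignored.
    """
    steps = []
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        # XPath positions are 1-based: count preceding element
        # siblings that share this element's name.
        position = 1
        sibling = node.previousSibling
        while sibling is not None:
            if (sibling.nodeType == sibling.ELEMENT_NODE
                    and sibling.tagName == node.tagName):
                position += 1
            sibling = sibling.previousSibling
        steps.append(f"{node.tagName}[{position}]")
        # Walk upwards; the loop stops at the Document node.
        node = node.parentNode
    return "/" + "/".join(reversed(steps))


doc = minidom.parseString(
    "<book><chapter/><chapter><para/><para/></chapter></book>"
)
second_para = doc.getElementsByTagName("para")[1]
print(path_to(second_para))  # /book[1]/chapter[2]/para[2]
```

Even this small case suggests why Passin asks for requirements first: generating an expression and evaluating one are different jobs with different inputs.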

That sounds like a good idea. The DOM working group has
been struggling with this for a while, though, and you should at
least be aware of the problems. Not being hindered by the various
non-technical constraints that the W3C faces/imposes, perhaps we can
do better here.

I'd actually recommend giving XPath the DOM treatment.
Well, not really DOM, but maybe a cleaner, in-memory tree. XPaths
(even hairy ones) are extremely small, and the same path object is
likely to be reused many times, so I see no need to force the pain
of an event-based interface on users (unless someone thinks we're
going to be seeing gigabyte-long XPath expressions).
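
A rough sketch of what such an in-memory representation might look like follows. Everything here is an assumption made for illustration: the class names `Step` and `LocationPath`, the function `parse_path`, and the tiny subset of XPath handled (slash-separated steps written as `name` or `axis::name`, with no predicates, no `//` and no functions).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    axis: str       # e.g. "child" or "descendant"
    node_test: str  # an element name or "*"


@dataclass(frozen=True)
class LocationPath:
    absolute: bool
    steps: Tuple[Step, ...]


def parse_path(expr):
    """Parse a very small XPath subset into a LocationPath object."""
    absolute = expr.startswith("/")
    steps = []
    for raw in expr.strip("/").split("/"):
        axis, sep, test = raw.partition("::")
        if not sep:
            # No explicit axis: the abbreviated form means "child".
            axis, test = "child", raw
        if not test:
            raise ValueError(f"unsupported or empty step in {expr!r}")
        steps.append(Step(axis, test))
    return LocationPath(absolute, tuple(steps))


# The path is parsed once; the immutable result can be reused freely.
path = parse_path("/book/chapter/descendant::para")
for step in path.steps:
    print(step.axis, step.node_test)
```

Because the parsed object is small and immutable, it can be cached and shared, which is the reuse argument made in the message above.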

In theory, you could generate an XML equivalent to the
XPath expression and parse that. Question: how do you generate the
XML? I think you'll need to parse the XPath first. Better to
define an internal representation of the XPath expression and
parse/emit any supported syntax. If that syntax uses XML, then SAX
is a great implementation strategy, but not a useful API for XPath
expressions by itself.

I think both can be useful. What I've got in mind is the
building of custom objects based on the content of an XPath
expression. I've been down the "XPath OM" path before by hacking an
interface onto Matt Sergeant's XML::XPath module. It can be very
useful, but not as useful in some contexts as a builder callback
style interface. Converting an object model into another is often
harder than simply handling builder events. That's why CSS has SAC
and DOM2 ... interfaces. Both are useful, but anyone wishing, say, to build a
custom selector object to get elements out of his own type of tree
will probably use SAC.
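
To make the callback style concrete, here is a minimal sketch of the idea, not a proposal for the actual SAXPath interface. The handler methods (`start_path`, `step`, `end_path`), the `parse_events` function, and the `(name, children)` tuple tree are all hypothetical, and only slash-separated steps written as `name` or `axis::name` are recognised.

```python
class PathHandler:
    """Hypothetical callback interface; method names are illustrative."""

    def start_path(self, absolute):
        pass

    def step(self, axis, node_test):
        pass

    def end_path(self):
        pass


def parse_events(expr, handler):
    """Report each part of a simple location path to the handler."""
    handler.start_path(expr.startswith("/"))
    for raw in expr.strip("/").split("/"):
        axis, sep, test = raw.partition("::")
        if not sep:
            axis, test = "child", raw  # abbreviated step
        if not test:
            raise ValueError(f"unsupported or empty step in {expr!r}")
        handler.step(axis, test)
    handler.end_path()


class TupleSelectorBuilder(PathHandler):
    """Builds a selector for a custom tree of (name, children) tuples."""

    def __init__(self):
        self.names = []

    def step(self, axis, node_test):
        if axis != "child":
            raise ValueError("this sketch supports only the child axis")
        self.names.append(node_test)

    def select(self, root):
        candidates = [root]
        for depth, name in enumerate(self.names):
            matched = [n for n in candidates if name == "*" or n[0] == name]
            if depth == len(self.names) - 1:
                return matched
            # Descend one level for the next step.
            candidates = [child for n in matched for child in n[1]]
        return []


tree = ("book", [("chapter", []),
                 ("chapter", [("para", []), ("para", [])])])
builder = TupleSelectorBuilder()
parse_events("/book/chapter/para", builder)
print(len(builder.select(tree)))  # 2
```

The builder never sees an intermediate object model; it turns events directly into a selector for its own tree type, which is the advantage claimed for SAC-style interfaces.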

It would be good to maintain the early interest in SAXPath in order
to formulate a suitable solution to these issues. There is likely to
be lots of prior art that can be mined for additional ideas.
Implementations of XPath can be found in many open source XSLT
engines, and thus it's likely that if an API can be agreed upon,
implementations would follow very quickly.