Oracle Blog

Gregory Murphy's Web Log

Trip Report: Java ONE, June 28 - July 1 2004

My goal for this year's Java ONE conference (my first) was to learn as much
as I could about the "state of the art" of XML processing in Java. Given
the pervasiveness of XML in the J2EE platform, this was not an easy task.
There were sometimes concurrent sessions that dealt with XML, and session
overload kept me away from the evening BOFs after Tuesday. That said, here
are some highlights.

On Monday, Jeff Suttor, Bhakti Mehta and Ramesh Mandava surveyed the tools
that make up JAXP 1.3 (JSR 206), Java's most recent API for XML
processing. Their survey's
leitmotif was performance, and the star performer is XSLTC, Sun's XSL
transformation compiler. Bhakti reviewed performance tests that
demonstrated a ten-fold decrease in run time for transformations, when
ported from Xalan to XSLTC. With 1.3, XSLTC has been made the default
transformer, which means it's what the factory will give you unless you
know the URI for Xalan. This is an interesting decision, because it means
that even when an application loads a stylesheet at transformation time for
a single use, the factory will have to pre-compile it. The benefits are
analogous to those of just-in-time compiling for the JVM, and given that
HotSpot is now the default JVM, it makes sense that XSLTC would become the
default transformer. Compiled "translets" are thread-safe, and can be
cached for re-use.
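
The compile-once, reuse-many pattern can be sketched with the standard
javax.xml.transform API. This is a minimal illustration, not code from the
talk: the class name, the inline stylesheet and the sample documents are
made up, and whether the factory actually hands back XSLTC depends on the
JAXP implementation in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CompiledStylesheetDemo {
    // Hypothetical stylesheet, inlined so the sketch is self-contained.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/greeting'>Hello, <xsl:value-of select='@to'/>!</xsl:template>"
      + "</xsl:stylesheet>";

    // Compile the stylesheet once. The resulting Templates object can be
    // cached and shared between threads.
    static Templates compile(String stylesheet) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        return factory.newTemplates(new StreamSource(new StringReader(stylesheet)));
    }

    // Each run gets its own Transformer; a Transformer must not be shared
    // between threads, but creating one from Templates is cheap.
    static String apply(Templates templates, String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        templates.newTransformer().transform(
            new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        Templates cached = compile(XSL);
        System.out.println(apply(cached, "<greeting to='JavaONE'/>"));
        System.out.println(apply(cached, "<greeting to='XSLTC'/>"));
    }
}
```

For a stylesheet that is used only once, calling newTransformer() directly
on the factory is shorter, but the compilation cost is then paid on every
call.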

Fans of alternative validation methodologies will be happy to hear that
Sun's Multi-Schema Validator has been bundled with JAXP 1.3, and is
available through the API. Grammars may be pre-parsed and cached for
repeated, thread-safe invocation. In addition to built-in support for XML
Schema and Relax NG, there is a ValidatorHandler interface that can be used
as the basis for a customized validator. Nota bene: there is no support for
DTDs (the entity model makes this awkward). One of MSV's many advantages is
that a compiled grammar can be used to validate a document in memory, and
not just at parse-time, which opens up a wide range of new applications.
For example, validation can be inserted among a series of transformations
or SAX filters via "assertions" during testing and debugging.
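
Validating a document that is already in memory might look roughly like
the following, using the javax.xml.validation package. Treat it as a
sketch under assumptions: the class name, the one-element schema and the
sample documents are invented for illustration, and only W3C XML Schema is
used because that is the schema language a stock JAXP implementation is
certain to provide.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class InMemoryValidationDemo {
    // Hypothetical schema: a single <count> element holding an integer.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='count' type='xs:int'/>"
      + "</xs:schema>";

    // Parse the grammar once. A Schema object is immutable and thread-safe,
    // so it can be cached and shared.
    static Schema compile(String xsd) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    // Build a DOM without any validation at parse time.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // schema validation needs namespace-aware nodes
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Validate the in-memory tree. With no ErrorHandler set, the Validator
    // throws SAXException on the first validity error.
    static boolean isValid(Schema schema, Document doc) throws IOException {
        try {
            schema.newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = compile(XSD);
        System.out.println(isValid(schema, parse("<count>42</count>")));
        System.out.println(isValid(schema, parse("<count>many</count>")));
    }
}
```

A check like isValid could be dropped between two stages of a pipeline
while debugging and removed again once the stages are trusted.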

JAXP 1.3 includes for the first time a stand-alone implementation of XPath
1.0. XPath is the W3C language for describing parts of an XML document.
Expressions can be pre-computed for repeated evaluation, and applied
against any document that can be represented as an abstract object model.
There is built-in support for DOM, but users can plug in their own object
model. The value of an XPath expression can be returned as any type
supported by the XPath 1.0 data model, which is limited to strings,
numbers, booleans, and node-sets (more on datatypes later).
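
As a rough sketch of the stand-alone API (javax.xml.xpath), the following
compiles two expressions and evaluates them against a DOM, asking for a
different return type each time. The class name and the sample document
are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDemo {
    // Hypothetical document, inlined so the sketch is self-contained.
    static final String XML =
        "<sessions>"
      + "<session day='Mon'>JAXP 1.3</session>"
      + "<session day='Tue'>XQJ</session>"
      + "<session day='Thu'>XSLT 2.0</session>"
      + "</sessions>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Compile once; the expression can then be evaluated repeatedly.
    // An XPathExpression is not thread-safe, so keep one per thread.
    static XPathExpression compile(String expression) throws XPathExpressionException {
        return XPathFactory.newInstance().newXPath().compile(expression);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        XPathExpression count = compile("count(/sessions/session)");
        XPathExpression tuesday = compile("/sessions/session[@day='Tue']");

        // The second argument selects the Java type of the result.
        Double n = (Double) count.evaluate(doc, XPathConstants.NUMBER);
        String title = (String) tuesday.evaluate(doc, XPathConstants.STRING);
        NodeList hits = (NodeList) tuesday.evaluate(doc, XPathConstants.NODESET);

        System.out.println(n + " sessions; Tuesday: " + title
            + " (" + hits.getLength() + " matching node)");
    }
}
```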

Though there is no support for XQuery (XML Query) 1.0 in JAXP 1.3, there is
a separate project under way (JSR 225) to develop XQJ, the XQuery API for
Java. As the standard is still in candidate status, the API will likely
continue to evolve. Andrew Eisenberg (IBM) and Jim Melton (Oracle) gave an
overview of the standard and a preview of the new API in a talk Tuesday
morning. XML Query is a functional, idempotent (side-effect free) language
for asking questions of collections of XML resources, whether these be
documents, relational databases, or object repositories; whether stored as
XML, or viewed as XML via some kind of middle-ware.

An XQuery expression always returns a sequence (think SQL cursor), made up
of zero or more atomic types or document nodes. As the language is
functional in design, expressions can also operate on sequences - or the
return values of other expressions. This allows for a much more flexible
processing model than that supported by XSLT.

XQuery 1.0 is a strongly typed language. Its type system is borrowed from
XML Schema. Future versions of XPath and XSLT will also make use of XML
Schema's type system. Norman Walsh gave an overview of XPath 2.0 and XSLT
2.0 on Thursday afternoon.

The new versions of the standards are currently in last call, and it is
hoped that they will become candidate recommendations by the end of this
year. But the wheels of the W3C turn slowly, so don't hold your breath.

Like XQuery 1.0, XPath 2.0 will operate on sequences of items. An item may
be a node, or it may be any atomic type supported by XML Schema. Up to now,
operations like comparison have always taken their operands to be strings
or nodes, but with a full range of types available, comparison will be more
clearly defined. If types are not correctly cast, a comparison could result
in an error, and such errors must interrupt processing. This could cause
problems for users who wish to upgrade their 1.0 stylesheets to take
advantage of the 2.0 type system. To support typed comparison, XPath 2.0
adds operators like "eq", "lt" and "gt" for performing value comparisons,
and "is", "<<" and ">>" for performing node comparisons.

XSLT 2.0 will include XPath 2.0 and a host of new features. Regular
expressions will make it easier to replace characters with markup (such as
when one wants to preserve space in a transformation to HTML). Improvements
to the template priority mechanism will make it easier to specify fallback
templates. Direct access to the result tree will make it easier to perform
some complicated, multi-pass operations, such as sort and iterate.

All JAXP tools are based on two interfaces to an XML document: SAX, the
parsing event-handler interface, and DOM, the in-memory, random-access
document model. Chris Fry and Scott Ziegler, both from BEA, gave a
presentation about StAX, an XML "stream reader" API that defines an
alternative interface to an XML document. StAX is being developed as part of
the Java Community Process (JSR 173). The reader interface allows the
caller to control parsing, for example:
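
A caller-driven parse loop might be sketched as follows. This is not the
presenters' code: it assumes the javax.xml.stream API in the form JSR 173
later shipped, and the class name, the firstTitle helper and the sample
document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullParseDemo {
    // Hypothetical document, inlined so the sketch is self-contained.
    static final String XML =
        "<talks>"
      + "<talk><title>StAX</title><jsr>173</jsr></talk>"
      + "<talk><title>XQJ</title><jsr>225</jsr></talk>"
      + "</talks>";

    // Returns the text of the first <title> element and then stops;
    // the rest of the document is never visited.
    static String firstTitle(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        try {
            // The caller asks for each event; the parser never calls back.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Reads the element's text and advances to its end tag.
                    return reader.getElementText();
                }
            }
            return null; // no <title> element found
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(firstTitle(XML));
    }
}
```

With SAX, the same early exit usually means throwing an exception out of a
handler callback; here it is an ordinary return.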

The nice thing about StAX is that it maps directly to the iterator pattern.
The caller can stop the parse at any time, skip ahead some number of
events, or switch context. Applications that bind XML to data models will
generally be much simpler to code if pull parsing is used.

Finally, I attended a preview of JDBC 4.0, which will include support for
all of the SQL 2003 standard. This includes the new XML datatype, and SQL
select operators for dynamically generating XML - albeit within the
constraints of a tabular result set. For example, the expression

I have misgivings about the addition of this type to SQL. Since there is no
obvious way to define a document context, and no way to simulate one within
the scope of an RDBMS cursor, handling namespaces gracefully will be
difficult. There is also no way to break out of the tabular result model.
My prediction is that in five years we will all be using XQuery to extract
relational data into a structured context, and the XML select functions
will be deprecated relics.