Session details

Norm will give an overview of the HTML/XML Task Force and the progress
of its work. The task force was created by the W3C Technical
Architecture Group to consider the question of the divergence of the
HTML and XML technologies. Norm will attempt to summarize the state of
the issues and their potential solutions as conceived by the task
force at the time of his talk.

One key gap in the integration of XML into the global Web infrastructure is
validation. DTD validation is supported natively to different extents by different
browsers, and some Web protocols, notably SOAP, explicitly rule it out. Support for more
recent schema languages is virtually non-existent. With the growth of interest in rich
client-based applications in general, and the XRX methodology in particular, with its
emphasis on XML as the cornerstone of the client-server interaction architecture, this gap
has become more significant and its negative impact more troublesome.

In this paper we present a prototype JavaScript-based client-side W3C XML Schema
validator, together with an API supporting online interrogation of validated documents. We
see this as enabling an important improvement in XML-based client-side applications,
extending as it does the existing datatype-only validation provided by XForms to structure
validation and supporting the development of generic schema-constrained client-side
editors.

Designing forms with XForms can benefit from JSON support. This becomes possible once a
mapping from any JSON object to an XML document is defined. The proposed conversion is
designed specifically to allow intuitive use of XPath. This is demonstrated by
integrating an external JSON API into an XForms page.
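
As a rough illustration of the idea, the Python sketch below maps a JSON object to XML so that path expressions can address its members. The mapping rules (member names as element names, repeated elements for arrays, a `type` attribute) and the function name `json_to_xml` are assumptions made for this example, not the conversion proposed in the paper. Member names that are not valid XML names are not handled.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(value, tag="json"):
    """Convert a parsed JSON value into an element (hypothetical mapping)."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        elem.set("type", "object")
        for key, child in value.items():
            if isinstance(child, list):
                # An array member becomes repeated elements with the same name,
                # so a path such as tracks[2] selects the second item.
                for item in child:
                    elem.append(json_to_xml(item, key))
            else:
                elem.append(json_to_xml(child, key))
    elif isinstance(value, list):
        # Arrays that are not object members use a generic <item> wrapper.
        elem.set("type", "array")
        for item in value:
            elem.append(json_to_xml(item, "item"))
    elif isinstance(value, bool):  # test bool before numbers: bool is an int subclass
        elem.set("type", "boolean")
        elem.text = "true" if value else "false"
    elif value is None:
        elem.set("type", "null")
    elif isinstance(value, (int, float)):
        elem.set("type", "number")
        elem.text = str(value)
    else:
        elem.set("type", "string")
        elem.text = str(value)
    return elem


source = '{"title": "XML Prague", "year": 2011, "tracks": ["XForms", "XQuery"]}'
doc = json_to_xml(json.loads(source))
second_track = doc.find("tracks[2]").text  # path lookup on the converted data
```

Keeping the member names as element names is what lets a form author write short paths such as `tracks[2]` instead of navigating a generic key/value structure.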

What would happen if you could put a facade around MarkLogic Server to have it act as
a JSON store? This talk explores our discoveries doing just that.

A new open source project, MLJSON, provides a set of libraries and REST endpoints to
enable the MarkLogic Server to become an advanced JSON store. Internally the JSON is
represented as XML, and the JSON-centric queries are resolved using XML-centric indexes.
In this talk we'll present the design of the project, discuss its pros and cons, and talk
about the interesting uses for a fully-queryable, highly-scalable JSON store.

Akara (akara.info) is a platform for developing data services, and especially XML data
services, available on the Web, using REST architecture. It is open source software
written in Python and C. An important concept in Akara is information pipelining, where
discrete services can be combined and chained together, including services hosted
remotely. There is strong support for pipeline stages for XML processing, as Akara
includes a port of the well-known 4Suite and Amara XML processing components for Python.
The version of Amara in Akara provides optimized XML processing using common XML standards
as well as fresh ideas for expressing XML pattern processing, based on long experience in
standards-based XML applications. Some of these features include XPath and XSLT, a
lightweight, dynamic data binding mechanism, XML modeling and processing constraints by
example (using Examplotron), Schematron assertions, XPath-driven streamable processing and
overall low-level support for lazy iterator processing, and thus the map/reduce style.
Akara does not enforce a built-in mechanism for persistence of XML, but is designed to
complement a low-level persistence engine with overall characteristics of an XML
DBMS.
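
The pipelining and lazy iterator ideas mentioned above can be sketched with plain Python generators. This is only an illustration using the standard library. It does not use Akara or Amara, and the stage names, the `pipeline` helper and the sample records are hypothetical.

```python
import xml.etree.ElementTree as ET
from functools import reduce


def parse(documents):
    """Stage 1: parse each XML string only when the next stage asks for it."""
    for text in documents:
        yield ET.fromstring(text)


def select_titles(records):
    """Stage 2 (map): reduce each record to the text of its <title> child."""
    for record in records:
        yield record.findtext("title", default="")


def drop_empty(values):
    """Stage 3 (filter): skip records that had no title."""
    return (v for v in values if v)


def pipeline(*stages):
    """Chain stages left to right; nothing runs until the result is iterated."""
    def run(source):
        return reduce(lambda stream, stage: stage(stream), stages, source)
    return run


catalog = [
    "<record><title>First</title></record>",
    "<record/>",
    "<record><title>Second</title></record>",
]
extract_titles = pipeline(parse, select_titles, drop_empty)

# Reduce step: fold the lazily produced titles into a single number.
total_length = reduce(lambda acc, title: acc + len(title), extract_titles(catalog), 0)
```

Because every stage is an iterator, only one record is in flight at a time, which is what makes the map/reduce style workable on large inputs.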

Akara, despite its deliberately low profile to date, has played a crucial role in
several marquee projects, including The Library of Congress's Recollection project and The
Reference Extract project, a collaboration of The MacArthur Foundation, OCLC, and
Zepheira. In Recollection Akara runs the data pipeline for user views, and is used to
process XML MODS files with catalog records. In RefExtract Akara processes information
about topics and related Web pages to provide measures of page credibility. Other users
include Cleveland Clinic, Elsevier and Sun Microsystems.

In our community there are three main models for representing and processing data:
Relations, XML and RDF. Each of these models has its “sweet spot” for applications and its
own query language; very few implementations cater for more than one of these. We describe
a uniform platform which provides interfaces for different query languages to query and
modify the same information or combine it with other data sources. This paper presents
methods for completely and correctly translating SQL and SPARQL into XQuery since XQuery
provides the most expressive foundation. Early results with our current prototype show
that the translation from SPARQL to XQuery already achieves very competitive performance
whereas there is still a significant performance gap compared to SQL.

This paper gives an overview of the open standards for the NETCONF protocol and
associated tools for configuration data modelling. NETCONF is an XML-based communication
protocol that allows for secure management of network devices from remote manager
applications. The YANG language for configuration data modelling is described and its
features compared to those offered by the existing XML schema languages. Finally, the
standardized mapping of YANG data models to the DSDL schema languages (RELAX NG,
Schematron and DSRL) is discussed in some detail and the procedure for instance document
validation is outlined.

Sharon Adler was a Senior Manager at IBM Research in New York
specializing in XML standards, Web Services, and other areas for the
past eleven years. She recently relinquished her management role to a
long-time colleague so she can focus her efforts on technical work. Before
rejoining IBM in 1999, she was a Director of Product Management for
Publishing Tools for Inso Corporation in Providence, Rhode Island. From
1985 to 1992, Sharon held key positions with IBM, where she led the
development of standards-based authoring and document management tools.
Sharon has been instrumental in the development of international
computer standards for more than 30 years. She has served as Vice
Chair/Editor of multiple ANSI/ISO standards committees and as Chair of
the W3C XSLT Working Group, a position she has held since its inception
in 1997.

Michael Kay

The speaker, Dr Michael Kay, is the founder of Saxonica Limited, which develops the
popular Saxon XSLT, XQuery, and XML Schema engine. He is a member of the W3C working
groups for all three languages, and author of XSLT 2.0 Programmer's Reference, the
definitive Wrox guide to the language, recently republished in a fourth edition.

Some say that XML "on the web" (meaning "on the browser") has failed. For documents,
web servers generally deliver HTML+CSS, using a wide variety of server-side tool chains to
create it (often from XML). For AJAX-style data exchange, JSON has become the popular
choice.

But others think that XML has merely been waiting in the wings, and that after years
of waiting, we are finally starting to see the kind of technology on the browser platform
that is needed to make client-side XML processing a reality. The reason it has taken so
long for SVG to become established is that it took years of user pressure on browser
vendors to make it possible; similarly, it is only in the last year or two that
client-side XForms and XSLT 1.0 have become technically feasible on a sufficient range of
installed browsers. Clearly, the argument goes, when the only tools available for
processing XML on the client were JavaScript and the DOM, then no-one would find that very
attractive; with better tools, the landscape will change.

In the last few months we have seen announcements from Saxonica about a client-side
XSLT 2.0 implementation (still under development) and from ETH Zurich about a client-side
XQuery implementation (available as an alpha preview). Both have been built by adapting
and cross-compiling server-side Java implementations into JavaScript using Google's GWT
toolkit. Taken together with XSLTForms from agenceXML, which is written in XSLT 1.0, these
products demonstrate that however unpromising the browser platform might be as a
development environment, the obstacles can be overcome and the user community can create
the tools that it needs whether or not the browser vendors decide to cooperate. In the
long run this might turn out to be even more significant than the much-hyped move to
HTML5.

The XMLHttpRequest (XHR) interface has been available in various browsers since as early as
1999. While the name is prefixed with "XML", this interface has been widely used to allow
browser-based applications to interact with a variety of web services and content, much of
which now uses other formats, including JSON. When the interface originated, the design
pattern of building and unmarshalling whole XML documents into a DOM probably made sense,
but there are now many use cases where processing a document to build a whole DOM
representation is undesirable, if not infeasible.

This paper describes enhancements and new interfaces for the browser to enable
event-oriented parsing of XML. The enhancements make it possible to bind XML efficiently
to local data structures and to process large amounts of XML content with very little
memory. The new interfaces provide a number of new possibilities including better
interoperability between XML and JSON.
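
To make the contrast with whole-document DOM building concrete, here is a minimal sketch of event-oriented parsing that binds XML to local data structures. It uses Python's standard `xml.sax` module rather than the browser interfaces proposed in the paper, and the `ItemHandler` class, the `on_item` callback and the sample document are assumptions for illustration only.

```python
import xml.sax


class ItemHandler(xml.sax.ContentHandler):
    """Turn SAX events into one small dict per <item>, handed to a callback."""

    def __init__(self, on_item):
        super().__init__()
        self.on_item = on_item  # called once per completed item
        self._item = None       # dict being filled, or None outside <item>
        self._field = None      # name of the child element being read
        self._text = []         # text chunks of the current child

    def startElement(self, name, attrs):
        if name == "item":
            self._item = {"id": attrs.get("id")}
        elif self._item is not None:
            self._field, self._text = name, []

    def characters(self, content):
        # Text may arrive in several chunks, so collect and join later.
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "item":
            self.on_item(self._item)  # hand the item over, then forget it
            self._item = None
        elif name == self._field:
            self._item[name] = "".join(self._text)
            self._field = None


SAMPLE = (
    b'<items><item id="a"><name>pen</name></item>'
    b'<item id="b"><name>ink</name></item></items>'
)
seen = []  # collected here only for the demonstration
xml.sax.parseString(SAMPLE, ItemHandler(seen.append))
```

The handler keeps only the item currently being read, so memory use depends on the size of one item rather than the size of the document. A consumer that processes each item in the callback and discards it never holds the whole input.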

Murata is a member of the IDPF EPUB WG, the coordinator
of its Enhanced Global Language Support Subgroup, and the
technical lead of an EPUB project funded by the Japanese
government. He gives an overview of EPUB3 and then
focuses on global language support and comics in EPUB3.

村田 真 (MURATA Makoto [FAMILY Given])

Murata has contributed to standardization, research, evangelization,
education, and practical applications of XML. In particular, he is
internationally recognized as an expert on XML schema languages. He has
participated in several standardization committees including the W3C XML WG and
OASIS RELAX NG technical committee. He is now the convenor of ISO/IEC
JTC1/SC34/WG4 (OOXML). He graduated from the Science Department of Kyoto
University and holds a Ph.D. from Tsukuba University. Since 2008, he has been on
the board of directors of JSSST (Japan Society for Software Science and
Technology). He received the survey paper award from JSSST in 2007, the
achievement award and the international standardization development award from
the Information Processing Society of Japan in 2006, and the best paper award of
Internet Conference in Japan in 1998.

The link between the Bible and publishing technology is at least as old as Gutenberg's
press. 400 years after the publication of the King James Bible, we were asked to convert
five modern French Bible translations from a widely-used ad hoc TROFF-like markup scheme
used to produce printed Bibles to a standard XML vocabulary, and then to EPUB. We opted to
use XSLT 2.0 and Ant to perform all stages of the conversion process. Along the way we
discovered previously unimagined creativity in the original markup, even within a single
translation. We cursed the medieval scholars and the modern editors who have colluded to
produce several mutually incompatible document hierarchies. We struggled to map various
typesetting features to EPUB. E-Reader compatibility made us nostalgic for the browser wars of
the 90s. The result is osisbyxsl, a soon-to-be open source solution for Bible EPUB
origination.

DITA, DocBook and TEI are among the most important frameworks for XML documents. While
the latest versions of DocBook and TEI use Relax NG as the schema language, DITA is still
using DTDs. There were some fragile attempts to get DITA working with Relax NG but it
takes more than writing a Relax NG schema to have this working. DITA NG is an open source
project that aims to provide a fully functional framework for a Relax NG based
implementation of DITA.

DITA NG provides the Relax NG schemas for DITA 1.2 and also support for default
attribute values based on Relax NG a:defaultValue annotations - this is the critical part
that makes DITA work.

The presentation covers an overview of the Relax NG schemas, how DITA specializations
can be done using Relax NG (a lot simpler than with DTDs), the support for default
attribute values for Relax NG and includes a demo of the complete workflow of working with
DITA based on Relax NG.

We all know (and worry) about SQL injection; should we also worry about XQuery
injection?

With the power of extension functions and the implementation of XQuery update
features, the answer is clearly yes and I will start by showing how an attacker can send
information to an external site or erase a collection through XQuery injection on a naive
and unprotected application using the eXist REST API.

This was the bad news.

The good news is that it's quite easy to protect your application against XQuery injection,
and after this word of warning I'll discuss a number of simple techniques (literal string
escaping, wrapping values into elements or moving them out of queries in HTTP parameters)
to do so and show how to implement them in different environments covering traditional
programming languages, XSLT, XForms and pipeline languages.
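
As a small illustration of the first technique, literal string escaping, the following Python sketch turns an untrusted value into an XQuery string literal before it is placed in a query. It relies on the XQuery rules that `&` starts an entity or character reference and that the delimiting quote is escaped by doubling it. The function names, the collection path and the element names are hypothetical and are not taken from the presentation.

```python
def xquery_string_literal(value):
    """Return `value` as a double-quoted XQuery string literal."""
    # Escape & first, otherwise the & of an inserted &amp; would be escaped again.
    escaped = value.replace("&", "&amp;").replace('"', '""')
    return '"' + escaped + '"'


def build_query(username):
    """Build a lookup query; the path and element names are made up."""
    return "collection('/db/users')//user[name = " + xquery_string_literal(username) + "]"


# A typical injection attempt stays inside the string literal.
attack = 'x" or "1"="1'
safe_query = build_query(attack)
```

With the quotes doubled, the attacker's text is compared as a plain string instead of being parsed as part of the predicate. The query itself still needs the usual URL or XML encoding of whatever transport carries it to the server.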

This presentation will be fairly technical and practical, the goal being that after
the presentation, attendees are able to implement what has been presented.

Although the details and demonstrations will be based on Orbeon Forms and eXist, they
will be generic enough to be easily transposable to other environments.

Currently, there is a growing perception that the Web and XML communities are drifting
apart. One general concern is that up-to-date XML technologies (such as XSLT 2.0 or XQuery
1.0) are not seeing any support in the browsers, thus negating much of their potential.
Last year, at XML Prague 2010, the XQuery in the Browser plugin was presented as a
possible solution. It provides full XQuery support on the client side by embedding the
Zorba XQuery engine into a number of contemporary browsers. While the applications and
usability were convincing, using a (binary) plugin was seen as an insurmountable obstacle to
a wider adoption, since even well-established plugins like Flash or Java are no longer
available on major platforms, e.g. on the growing number of mobile devices. Instead,
browser vendors have been investing significantly into the quality of their JavaScript
implementations, achieving impressive performance improvements. As a result, JavaScript
has become a viable platform for implementing XQuery. Since writing an XQuery engine from
scratch is a major effort, we opted for translating MXQuery, an existing Java-based
engine, using Google's Web Toolkit.

Even though our current version is still considered to be at an alpha stage, we were
able to deploy it successfully on most major desktop and mobile browsers. The size of the
JS code is about 700 KB. With compression activated on the web server (reducing the
transferred data to less than 200 KB) and caching on the client, using the XQuery engine
does not cause noticeable overhead after the initial loading.

In addition, we are already reaching a high level of completeness and compliance,
passing more than 95 percent of the tests in the XQuery Test Suite 1.0.2. We have not yet done
formal testing on Update and Full Text, but plan to do so in the near future.

One of the big challenges for any emerging database product is the maturity of its
query optimizer. This is even more of a problem with XQuery, which unlike SQL hasn't yet
had the benefit of forty years of optimization research. Any efforts to advance the state
of the art in optimizing XQuery are therefore important as steps towards fulfilling its
promise as a new database paradigm.

This paper introduces a novel meta language for efficiently specifying rewrites to
the expression tree of an XQuery program. The applications of this language are
wide-ranging, including: use by XQuery implementers to efficiently communicate and execute
optimizations, use by XQuery library writers to extend the optimization semantics of the
host implementation with a deep understanding of the library functionality, and use by
XQuery end-users to provide optimization hints and greater insight into the program
written or data operated on.

This paper also discusses the use of this language to replace and extend the
optimization layer in XQilla, an open source implementation of XQuery.