Session details

@ 20

“20” has an interesting significance this year for XML, Syncro Soft, oXygen XML Editor, and me, so let’s talk about “20”!

Does the world need more XML standards?

Francis Cave (chair of ISO/IEC JTC 1/SC 34)

International Standards, such as SGML and DSSSL, may have passed into history, but the committee that was responsible for those foundational standards still finds it has work to do, and is on the lookout for further work to meet new requirements for structured document description and processing. Francis Cave, who took over as Chair of committee ISO/IEC JTC 1/SC 34 at the beginning of this year, will provide a brief outline of the committee’s legacy and current mission, and will review potentially fruitful avenues for innovation in document description that could eventually lead to new standards.

SML – A simpler and shorter representation of XML

Jean-François Larvoire (HPE)

This paper describes SML, a proposed simplified format that is strictly equivalent to XML, but shorter and easier to read. All XML files, however complicated, can be converted to SML and back with no change. (Verified on the libxml2 test suite.) A conversion script is available for testing. Work is under way for adding SML support to the libxml2 library.

XProc 3.0

A traditional “standards update” in which you will learn what is new and planned in the area of XML pipelines.

Can we create a real world rich Internet application using Saxon-JS?

Pieter Masereeuw

This paper describes how Saxon-JS techniques can be used to create a rich internet application for dictionary lookup. The main purpose of the text is to demonstrate, with some examples, how easy it is to almost entirely avoid JavaScript in favour of Saxon-JS’s implementation of XSLT – in the author’s humble opinion the web’s best programming language. Apart from that, it discusses some difficulties that were encountered while writing the application, plus their solutions – if any.

Implementing XForms using interactive XSLT 3.0

O’Neil Delpratt (Saxonica) and Debbie Lockett (Saxonica)

In this paper, we discuss our experiences in developing a new XForms implementation for browsers using ‘interactive’ XSLT 3.0. Our main focus is to describe the mechanics of the implementation – how we were able to implement XForms features (such as actions) using the interactive XSLT extensions available with Saxon-JS, to update form data in the (X)HTML page, and to handle user input using event handling templates. We will also discuss how this XForms implementation can be used, namely by integrating it into the client-side XSLT of a web application, and the benefits that this can bring. As a motivating use case, we present the XForms implementation in our in-house license tool application (for managing and generating licenses).

Life, the Universe, and CSS Tests

Tony Graham (Antenna House, Inc.)

The W3C CSS Working Group maintains a CSS test suite already composed of more than 17,000 tests and growing constantly. Tracking the results of running such a large number of tests on a PDF formatter is more than anyone could or should want to do by hand. The system needs to track when a test’s result changes so that the changes can be verified and the test’s status updated. Finding differences is not the same as checking correctness. An in-house system for running the tests and tracking their results has been implemented as an eXist-db app. Is it a masterpiece of agile development, or an example of creeping featurism?

Form, and Content

Steven Pemberton (CWI)

Because of the legacy of paper-based forms, modern computer-based forms are often seen as static data-collection applications, with rows of rectangular boxes for collecting specific pieces of data. However, they have far more opportunities for being dynamic, checking data for consistency, leaving out fields for non-relevant data, and changing structure and detail to match the data-filling flow. Furthermore, data is no longer limited to pure textual input, but can be entered using any method that is available on a computer. While classically it is the form that drives the data produced, this paper examines how forms can be data-driven, for structure, for presentation, and for execution.

A short story about XML encoding and opening very large documents in an XML editing application

Radu Coravu (Syncro Soft)

XML editing applications need to deal with XML documents of various sizes. We’ll discuss XML encoding, Unicode, the ratio between a file’s size on disk and the amount of memory it requires in an application, and how an XML editing application can open and edit large and huge XML documents (gigabytes) without overflowing its memory space.
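The on-disk/in-memory ratio mentioned above can be made concrete with a short sketch (Python, purely for illustration). Many editing tools hold text in memory as UTF-16 (two bytes per BMP character), so an ASCII-heavy UTF-8 file roughly doubles in size once loaded as text – before any parse tree or undo history is accounted for:

```python
# Sketch: on-disk UTF-8 size vs. in-memory UTF-16 size for the same text.
# For mostly-ASCII XML, UTF-16 needs roughly twice the bytes of UTF-8;
# only CJK-heavy content narrows the gap.

sample = '<para>Größe &amp; 価格</para>'  # mixed ASCII / Latin-1 / CJK

utf8_bytes = len(sample.encode('utf-8'))      # 1-3 bytes per character here
utf16_bytes = len(sample.encode('utf-16-le')) # 2 bytes per BMP character

print(f'UTF-8 on disk: {utf8_bytes} bytes, UTF-16 in memory: {utf16_bytes} bytes')
```

Scale this up and a 1 GB ASCII-heavy UTF-8 file approaches 2 GB merely as an in-memory string, which is why editors need strategies beyond loading the whole document as text.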

XML periodic table, XML repository and XSLT checker

Johannes Kolbe (data2type GmbH) and Manuel Montero (data2type GmbH)

Three projects will be presented:
– XML periodic table: An SVG graphic giving an illustrative overview of different XML standards.
They are listed with a short description and a link to a web resource, and are categorized into usage groups. A small overview of software support is included, as well as the standard family they come from, where applicable.
– XML repository: A collection of XML standards and their specifications in the form of an online library.
Schemas are presented with extracted parts and content, as well as added metadata, graphical schema views, and available samples. It will be available for free and will undergo further development, with more standards being added.
– XSLT checker: This tool checks XSLT stylesheets for unused or obsolete language elements.
It checks the given data against the stylesheets and outputs a summary of the elements that aren’t used. Checking covers for-each, for-each-group, if, when, otherwise, and template elements. The input is XML and XSL files; the output text file lists the stylesheets along with the line, command, and position of each unused element.
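The simplest case of such a check – finding named templates that are never invoked – can be sketched statically (a minimal Python illustration, not the presented tool; the real checker also evaluates stylesheets against input data, which a static scan cannot do for match templates):

```python
# Sketch: list xsl:template[@name] elements never reached via
# xsl:call-template anywhere in the same stylesheet.
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"

def unused_named_templates(xslt_source: str) -> list:
    """Return names of named templates with no matching xsl:call-template."""
    root = ET.fromstring(xslt_source)
    defined = {t.get("name") for t in root.iter(f"{{{XSL}}}template") if t.get("name")}
    called = {c.get("name") for c in root.iter(f"{{{XSL}}}call-template")}
    return sorted(defined - called)

stylesheet = """\
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <xsl:call-template name="used"/>
  </xsl:template>
  <xsl:template name="used"/>
  <xsl:template name="never-called"/>
</xsl:stylesheet>
"""
print(unused_named_templates(stylesheet))
```

Match templates, for-each, and conditional branches require the data-driven check the abstract describes, since whether they fire depends on the documents being processed.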

tokenized-to-tree – An XProc/XSLT Library For Patching Back Tokenization/Analysis Results Into Marked-up Text

Gerrit Imsieke (le-tex publishing services GmbH)

This talk will present an XProc/XSLT library for performing string-based tokenization and analysis (TA) on marked-up text, represented as XML and possibly deeply nested. Tokenization and analysis yield another XML representation of the input that overlaps with the original markup. The results need to be merged with the original markup. The task is made complicated because the input needs to be normalized prior to TA, for example by converting non-breaking and other typographic spaces to plain spaces, ignoring index entries, or processing footnotes separately. After the TA results have been merged into the normalized source XML, things that have been normalized away need to be restored.
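One small piece of this process – normalizing typographic spaces while keeping enough information to invert the normalization afterwards – can be sketched as follows (Python with invented names, purely illustrative; the actual library is XProc/XSLT and uses a richer placeholder representation):

```python
# Sketch: replace typographic spaces with plain spaces before
# tokenization/analysis, recording positions so the originals can be
# restored after the analysis results are merged back.

NBSP = '\u00a0'   # non-breaking space
THIN = '\u2009'   # thin space

def normalize(text):
    """Return (normalized_text, replaced) where replaced maps each
    character position to the typographic space that stood there."""
    replaced = {}
    out = []
    for i, ch in enumerate(text):
        if ch in (NBSP, THIN):
            replaced[i] = ch
            out.append(' ')
        else:
            out.append(ch)
    return ''.join(out), replaced

def restore(text, replaced):
    """Invert normalize(): put the original characters back."""
    out = list(text)
    for i, ch in replaced.items():
        out[i] = ch
    return ''.join(out)
```

In the real pipeline the same bookkeeping must also survive ignored index entries and separately processed footnotes, which is what makes the merge step genuinely hard.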

The presented library provides a standard representation for the normalized input and its different types of placeholders. Three applications that build on this representation are presented: linguistic TA of OOXML (MS Word) files, inserting line numbers from a PDF rendering into its TEI source, and linking occurrences of headwords in reference-work entries to their primary entries.

The process of normalizing, character-position counting, TA invocation, patching back the TA results, and inverting the normalization is complex. It consists of multiple XSLT passes that need to be customized and assembled in a distinct way for each application. Encapsulating the invariant core process steps as well as the macroscopic, customizable steps, and orchestrating the XML transformation steps in a sometimes non-linear way, requires a technology that is good at these things. In this regard, the paper and the open-source library on which the presented applications are built are a demonstration of the utility of XProc in complex publishing pipelines.