Architectural vision for HTML/XHTML2/Forms Chartering

The discussion around the re-chartering of the HTML-related work was
extensive. In the interest of providing a convenient summary, this document
discusses the overall architectural vision behind the chartering of these
groups, and how they fit into the wider pattern of the Interaction Domain and
the overall Web Architecture.

The architectural directions along which the community is now moving are
the result of much input, and everyone involved in the new activity will have
to make some accommodation to the reality of the situation and the
requirements of others. There is a strong common component throughout this
work, a serious need on the part of users and web designers, and a
significant opportunity to improve this space for everyone.

XML-based Architecture and tag soup

W3C has in general assumed that XML is the correct way forward and that
implementations will fall into line as necessary over time. For the mobile
market, and for non-HTML client technologies like SMIL, SVG, MathML,
Timed-Text and so forth, this has indeed happened. For the desktop browser
market, however, tag soup markup has persisted much longer than we would have
expected or hoped. In consequence, the TAG has opened issue
TagSoupIntegration-54 (Tag soup integration) to consider
whether the indefinite persistence of 'tag soup' HTML is consistent with a
sound architecture for the Web.

There are several ways to approach this situation, given that pretending
the situation does not exist is not acceptable:

Try to force users and implementers to greater adoption of the
existing XHTML 1.x. In essence, this was the strategy before. There are
several drawbacks, however:

since Appendix C of XHTML 1.0 allows such content to be sent to legacy
user agents, users get no warning when their content is not well formed.
Malformed content therefore proliferates. User agents start to assume
that any XHTML 1.x is not well formed, or sniff it for hints such as
an XML declaration or a Strict doctype.

since XHTML added no new features (XHTML 1.0) or only one new feature
(Ruby, in XHTML 1.1), the incentive for users to move to the XML-based
format is small. They get no reward for doing so, beyond the rather
theoretical satisfaction of creating well-formed content.

Create a new language, with a different media type, which is more
extensible, more accessible, has richer semantics, and so forth. Older
user agents which do not understand this format will not request it, and
will reject it. This was the strategy for XHTML 2.0.

Unfortunately this also has a drawback. While XHTML 2.0 has been
adopted for authoring (for example, in device independent authoring) and
in some corporate situations (where the XForms support is valuable and
the choice of client can be controlled) it has not been successful among
legacy browser vendors nor have new browser vendors emerged to promote
it. Thus, client-side use remains small and this is a barrier to entry.
This approach may well succeed in the longer term, but it does not seem
to have sufficient traction currently.

Create independent but related languages for different audiences.
This has a clear and obvious drawback relative to a single language, yet
it is worth considering, especially if XML provides a common parsing model.

It would have been possible (and there were some calls for this) for
the primarily desktop-oriented, consumer-oriented language to have
only a tag-soup serialization. However, that would certainly
have a negative and divisive effect on the Web architecture. Gratuitous
incompatibilities with XML should be strenuously avoided.

Instead, the charter calls for two equivalent serializations
to be developed by the HTML WG, corresponding to a single DOM (or
infoset, though tag soup cannot be considered to have an infoset
currently, while it can have a DOM). This ensures that decisions are not
made which would preclude an XML serialization. It allows the two
serializations to be inter-converted automatically. Because the language
has new features, content authors have an incentive to use it; and because
it has client-side implementations, they can actually deploy it.

Of these, W3C has chosen the third approach. If this new HTML-family
format is widely used, and if it can be reliably converted to XML if it is
not already serialized in that form (reliably meaning not only that
formatting is the same but the structure is the same, and the semantics are
not altered) then XML-based workflows can create and consume this content.
Meanwhile, enterprise-strength needs are met by XHTML2, which includes
XForms. The two formats are differentiated by deployment strategy and
expected field of use.

Interconversion between two serializations of a single DOM should be well
defined. Experience with, for example, HTML Tidy, and John Cowan's work on
TagSoup, demonstrates the feasibility (although, unlike the case with HTML
Tidy, the interconversion should not be seen as error correction).
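
The idea of two serializations sharing one tree can be illustrated with a
small sketch. It is not the conversion algorithm the HTML WG will define;
it uses only the Python standard library and assumes a deliberately narrow
subset of the non-XML serialization: elements are properly nested, and the
only shortcuts are case-insensitive names, unquoted attribute values, and
end tags omitted on void elements. The names SoupTreeBuilder, html_to_xml,
and same_structure, and the VOID list, are hypothetical and chosen for
illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumption: a short, illustrative list of elements that never take an
# end tag in the non-XML serialization.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class SoupTreeBuilder(HTMLParser):
    """Build an ElementTree from a small, well-nested subset of HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = None
        self.stack = []  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        # XML requires every attribute to have a value, so a bare
        # attribute such as "disabled" is given its own name as value.
        attrib = {k: (k if v is None else v) for k, v in attrs}
        if self.stack:
            node = ET.SubElement(self.stack[-1], tag, attrib)
        else:
            node = ET.Element(tag, attrib)
            self.root = node
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Void elements were never pushed; other end tags close the
        # innermost open element only if the names match.
        if tag not in VOID and self.stack and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        if not self.stack:
            return  # ignore text outside the root element
        parent = self.stack[-1]
        if len(parent):  # text following a child element
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(text):
    """Return the XML serialization of the tree parsed from `text`."""
    builder = SoupTreeBuilder()
    builder.feed(text)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode")


def same_structure(html_text, xml_text):
    """True if both serializations describe the same canonical tree."""
    return ET.canonicalize(html_to_xml(html_text)) == ET.canonicalize(xml_text)


html_form = '<P class=note>Line one<BR>line two <IMG SRC="a.png" alt=""></P>'
xml_form = '<p class="note">Line one<br/>line two <img alt="" src="a.png"/></p>'
print(same_structure(html_form, xml_form))
```

Comparing canonical forms rather than raw strings keeps the check about
structure: attribute order and the spelling of empty elements do not
matter, while element names, attribute values, and text do. Anything
outside the assumed subset (for example, implied end tags on paragraphs)
would need rules that this sketch does not attempt.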

As mobile clients cannot afford the luxury of multiple parsers, and given
that an XML parser is already required, content that is expected to be
viewed on (or at least not to exclude) a mobile device should be authored
using the XML serialization. Also, as soon as there is a need for any
extensibility, the XML serialization (with use of XML namespaces) gains
an immediate practical advantage.

Over time, therefore, both the amount of content in this format and the
proportion of it in the XML serialization should be expected to increase.

This direction does not diminish the role of XML as the central
architecture for markup on the Web and elsewhere. It is merely trying out more
creative, and hopefully more successful, ways to reach the same goal: by
building bridges rather than barriers, and by reducing the large step into a
set of separate steps which can be motivated independently.

Integration with the XML ecosystem

The Compound Document Formats (CDF) WG, which has up to now worked on
compound documents by reference, has now started work on compound documents
by inclusion - real multi-namespace documents, where XML is clearly the only
way forward in this plan. This should also drive adoption (once more, on
mobile first and then later on the desktop).

The role of the XHTML 2 working group in creating an enterprise-strength,
extensible markup language and also in producing spin-off technologies which
are applicable to other XML grammars, will also be emphasized. In particular
the XHTML 2 WG will take part in the XML Coordination group as well as the
Hypertext Coordination group.

The issue of extensibility was raised by several commenters. Because XML
has namespaces, and namespaced attributes, there is a clear method for
creating compound documents with clearly identified extensions - from
components like MathML or SVG, to rich metadata. The tag soup form is
expected to be used only where no extensions are present.
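
As a hedged illustration of how namespaces make extensions identifiable,
the following sketch parses a small compound document and reports which
namespaces, other than the host language's, appear on elements or
attributes. The XHTML 1.x and MathML namespace URIs are the published
ones; the metadata namespace http://example.org/meta, its "reviewed"
attribute, and the helper names namespace_of and extensions are
hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"
EXAMPLE_META = "http://example.org/meta"  # hypothetical metadata namespace

doc = f"""<p xmlns="{XHTML}" xmlns:ex="{EXAMPLE_META}" ex:reviewed="yes">
  The circle has radius <math xmlns="{MATHML}"><mi>r</mi></math>.
</p>"""


def namespace_of(name):
    """Return the namespace URI of an ElementTree '{uri}local' name."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def extensions(root, host_ns):
    """Collect every namespace in the tree other than the host language's."""
    found = set()
    for el in root.iter():
        ns = namespace_of(el.tag)
        if ns != host_ns:
            found.add(ns)
        for attr in el.attrib:
            ns = namespace_of(attr)
            if ns:  # unprefixed attributes are in no namespace
                found.add(ns)
    return found


for uri in sorted(extensions(ET.fromstring(doc), XHTML)):
    print(uri)
```

A processor that understands only the host language can use a check of
this kind to decide which subtrees and attributes to hand to another
component or to ignore. The tag soup form has no comparable mechanism,
which is why it is limited to documents without extensions.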