XHTML: Three Namespaces or One?

"There are all sorts of honest grounds for disagreement on namespace handling in XHTML 1.0, and they are worth investing technical debate in." (Tim Bray)

The goal of XHTML is to recast HTML 4.0 as three XML-compliant DTDs that can be modularized into subcomponents, yielding lightweight feature sets, extended feature sets, or customized "hybrid" document types. When the XHTML 1.0 specification reached Proposed Recommendation status in mid-August, it sparked a lengthy debate on xml-dev.

At first, thread participants seemed more confused than anything else: had this strange and complex spec come out of nowhere? Instead of gradually phasing out HTML over the next few years (which certainly seemed like the original plan), the HTML WG had decided to "reformulate" HTML into "modules" of like elements. The idea is generally sound but, as always, the devil is in the details of implementation. Adding to the complexity are a few technical requirements that make XHTML a little harder on everybody.

A big question is how much work should be required to make an HTML document XML-compliant. Does XHTML make it more confusing than it needs to be? XML and HTML actually work together very nicely. XML parsers don't care about the names of HTML elements, as long as the markup is well-formed, and HTML browsers ignore anything they don't understand. It's a match made in heaven. So why attempt to implement a strict conformance architecture for use with HTML documents? Few have ever needed validation in HTML before, and certainly not for the HTML that was being served up to browsers!
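
The well-formedness point can be sketched with a non-validating parser. The example below uses Python's standard xml.etree.ElementTree module; the two markup samples are invented for illustration and do not come from the xml-dev thread.

```python
import xml.etree.ElementTree as ET

# Well-formed HTML-style markup: every element is closed and properly nested.
well_formed = "<html><body><p>Hello<br/>world</p></body></html>"

# Typical "tag soup": <br> and <p> are never closed.
tag_soup = "<html><body><p>Hello<br>world</body></html>"


def is_well_formed(markup: str) -> bool:
    """Return True if a non-validating XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The parser has no idea what <p> or <br/> mean; it only checks the syntax.
print(is_well_formed(well_formed))
print(is_well_formed(tag_soup))
```

No DTD and no namespace is involved here: the parser accepts the first sample only because it is syntactically well-formed XML.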

At first glance, XHTML's three namespaces may seem logical enough: three namespaces for the three different grammars derived from HTML 4.0's DTDs (strict, transitional, and frameset). However, many argue that XHTML's grammars aren't really all that different from each other and in fact represent dialects of the basic HTML grammar, so XHTML should use a single namespace. (A <p> is a <p> is a <p> in all three DTDs.) The differences among the three grammars are minor, and the elements that occur in more than one DTD all behave identically from a semantic perspective.
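
To make the practical consequence concrete, here is a small sketch of how a namespace-aware parser names the same <p> element under two different namespaces. The namespace URIs are hypothetical placeholders (example.org), not the URIs from the XHTML drafts, and the parser is Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, one per grammar (placeholders, not the W3C's).
STRICT = "http://example.org/xhtml1/strict"
TRANSITIONAL = "http://example.org/xhtml1/transitional"


def first_child_tag(namespace: str) -> str:
    """Parse a tiny document in the given default namespace and return
    the expanded name of its first child element."""
    root = ET.fromstring(f'<html xmlns="{namespace}"><p>text</p></html>')
    return root[0].tag


strict_p = first_child_tag(STRICT)
transitional_p = first_child_tag(TRANSITIONAL)

print(strict_p)                    # {http://example.org/xhtml1/strict}p
print(transitional_p)              # {http://example.org/xhtml1/transitional}p
print(strict_p == transitional_p)  # False
```

To namespace-aware software the two paragraphs are unrelated element types unless it is told otherwise, which is the heart of the single-namespace argument.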

According to the XHTML Proposed Recommendation, conforming XHTML documents must not only use namespaces, they must also be valid against a DTD named in a DOCTYPE declaration. XHTML 1.0 is supposed to be compliant with XML 1.0, but the latter has no namespace requirement, and DTDs know nothing about namespaces: there is presently no mechanism in an XML 1.0 processor for namespace-aware validation against a DTD.

What follows are some selections from xml-dev postings that indicate the intensity and the quality of the debate. First, Isogen's Paul Prescod summarized the situation as follows:

"XHTML has three grammars. This probably has more to do with politics than technology: "our mandate is to blah blah blah". That means that when we get namespace-aware validators there will be three schemas. Schemas are namespace-triggered: * not "version attribute" triggered, * not DOCTYPE triggered.
They are namespace triggered. (and for good reason...trying to bring attributes into it makes the validation model much more complicated) It follows that we need three namespaces to attach the three grammars to. Next year we will want to introduce parts of XHTML into our various other document types.

By that time we will hopefully have a schema language that allows us to validate "HTML islands." Great! But which grammar do we validate against? With only one namespace we either have no choice or we must use the most loose of the three. Now as a purist I don't think that it should be possible to use the XHTML namespace until the W3C describes what conformance means.

So I would vote to do away with the namespaces altogether until that time comes around. But given that this is not politically feasible, I think that the current status of three namespaces makes sense as a required hook for future schema-based validation. The alternative is to let "loose" become the defacto standard! Ack. Better that we should either (temporarily) banish namespaces, banish "loose" or make three namespaces so that we do not paint ourselves into a corner."
(Paul Prescod)
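
Prescod's "namespace-triggered" idea can be sketched as a lookup from namespace URI to grammar. This is only an illustration under stated assumptions: the namespace URIs, the invoice vocabulary, and the grammar labels are hypothetical, and no actual validation is performed, since no such schema language existed at the time of the debate.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs mapped to the grammar each one would trigger.
GRAMMAR_BY_NAMESPACE = {
    "http://example.org/xhtml1/strict": "strict",
    "http://example.org/xhtml1/transitional": "transitional",
    "http://example.org/xhtml1/frameset": "frameset",
}


def namespace_of(element) -> str:
    """Extract the namespace URI from an ElementTree tag of the form '{uri}local'."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def grammars_needed(document: str) -> set:
    """Return the grammars triggered by the namespaces found in a document."""
    found = set()
    for element in ET.fromstring(document).iter():
        grammar = GRAMMAR_BY_NAMESPACE.get(namespace_of(element))
        if grammar:
            found.add(grammar)
    return found


# An "HTML island" embedded in an invented invoice vocabulary.
invoice = """
<invoice xmlns="http://example.org/invoice">
  <note>
    <p xmlns="http://example.org/xhtml1/strict">Payment due in <b>30</b> days.</p>
  </note>
</invoice>
"""

print(grammars_needed(invoice))  # {'strict'}
```

With a single shared namespace, the lookup table would collapse to one entry, and the island alone could not say which of the three grammars it was written against.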

The tempting question remains: why not just amend the HTML 4.0 DTDs so that they require well-formedness, or simply tell people who want their HTML to be XML-compatible to make sure their documents are well-formed (as many people have been saying for a while now)?

David Megginson made a great posting to propose just such a minimalist "dream" XHTML document:

My ideal XHTML (if I were dictator) by David Megginson

I'd have to say that I'd prefer no XHTML spec to the last-call WD that we have now, but for me (and other implementors) a single, standard HTML Namespace is really the *only* point of an XHTML spec. I'd be happy just with an XHTML NOTE that looked something like this:

"This note defines an XHTML, an HTML vocabulary for XML. The XHTML vocabulary consists of all of the elements and attributes included in HTML 4.01 (all flavours), but using XML rather than SGML syntax."

"All XHTML element and attribute names belong to the Namespace "http://www.w3.org/Namespaces/HTML/". This Namespace URI is intended to be
persistent; the attribute "{http://www.w3.org/Namespace/HTML/}version" is reserved to distinguish future versions of XHTML when there are backwards-incompatibilities."

"In an XHTML document, or within XHTML markup embedded in another document type, any non-XHTML extensions must be clearly distinguished by being placed in a separate Namespace."

Is this underspecified? Well, yes. Is it better than the status quo? It sure is! It's also probably the most that we can sell to the Web community at once, and probably the most that they can successfully implement in the next few years.

Something this simple has the potential to revolutionize the Web and greatly improve search engines, things that we promised XML would do in the first place: applications will be able to recognize HTML markup in any XML document, and to recognize foreign markup in HTML documents; that's *way* more than we have now.

People can get to work defining and standardizing vocabularies, so that search engines can look for "{http://www.reuters.com/ns/}subject-code" or "{http://www.ecommerce.org/ns/}price" across large heterogeneous document bases.

Alas, David is not optimistic that the best technology will win:

I figure that we have only a 50% chance of success even with something this simple, about a 15% chance of success with the current XHTML WD, and about 0.1% chance of convincing the Web community to use schemas, modules, etc.
(David Megginson)
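
The "{namespace-URI}local-name" notation Megginson uses happens to be the same notation Python's standard xml.etree.ElementTree uses for element names, so the search he describes is easy to sketch. The HTML namespace URI is the one from his proposed note (it was never an official W3C namespace); the three sample documents and the subject-code values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Expanded name taken from Megginson's post.
SUBJECT_CODE = "{http://www.reuters.com/ns/}subject-code"

documents = [
    # An XHTML-style page carrying foreign markup under the prefix "r".
    '<html xmlns="http://www.w3.org/Namespaces/HTML/" '
    'xmlns:r="http://www.reuters.com/ns/"><body><p>Story</p>'
    '<r:subject-code>ECON</r:subject-code></body></html>',
    # A different document type using a different prefix for the same namespace.
    '<report xmlns:news="http://www.reuters.com/ns/">'
    '<news:subject-code>TECH</news:subject-code></report>',
    # Same local name but no namespace: this must not match.
    '<report><subject-code>SPORT</subject-code></report>',
]


def find_values(docs, expanded_name):
    """Collect the text of every element with the given expanded name."""
    values = []
    for doc in docs:
        root = ET.fromstring(doc)
        values.extend(element.text for element in root.iter(expanded_name))
    return values


print(find_values(documents, SUBJECT_CODE))  # ['ECON', 'TECH']
```

The prefixes differ and the surrounding document types differ, yet the search still finds both elements, because matching is done on the namespace URI plus local name rather than on the prefix.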

Tim Bray and Tim Berners-Lee

Tim Bray, one of the editors of the Namespaces in XML Recommendation, and W3C Director Tim Berners-Lee traded some revealing remarks, although it ultimately became clear that they were on the same "page." You can follow the conversation, with previous messages quoted, in Tim Berners-Lee's message of September 28th.

Tim Bray commented:

Among other things, I don't believe that most interesting namespaces *have* definitive information, but have semantics that are communicated via some messy combination of schemas, stylesheets, prose documentation, and running code.

Tim BL responded:

We either have a different use of words or a very serious problem. Whereas with natural language, meanings change and grow and everyone has slightly different associations with a word, in [the] computer languages [that] we need to build on top of XML we need to have the ability to define meaning precisely in terms of other existing languages.

Tim Bray:

I'm convinced that the namespace mechanism, while it's useful for what it is, is hopelessly inadequate as a tool for direct mapping between instances and schemas.

Tim BL:

I am disappointed to hear that from an author of the spec.

(Can I say " .... and definitive schemas?")

Tim Bray:

I am astonished and disappointed that the W3C still can't bring itself to publish a 3-line note saying that for those who see HTML just as HTML, and who want to mix & match those tags with others, they should use the following URI as a namespace name for HTML: <insert any old plausible namespace URI here>

Tim BL:

It depends on what you mean about mixing & matching. If you mean a return to the bedlam of HTML 2++, I do not see it [as] a part of leading the web to its full potential through evolution ... and interoperability ... If you mean define a clean superset language and its grammar and give it a label, then I would be all for it.

Bosak and Layman

The overall benefit of this high-caliber debate was the insight it offered into XML namespaces and their relationship to XML schemas.

In "Namespaces in XML and XHTML," Jon Bosak, Chair of the XML Coordination Group, contributed an excellent summary of namespaces along with his rationale for deciding that there should be only one namespace for XHTML. In contrast, Andrew Layman (another editor of the Namespaces in XML Recommendation) provided an equally compelling rationale for using three separate namespaces in his summary of the various XHTML-related namespace issues ("XHTML and the Three Namespaces").

Perhaps Layman's most convincing argument is that the problem to solve is not only how to process like tags from different DTDs in a similar fashion when possible, but also how to process those same tags differently when desired.
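
One way to picture processing like tags "similarly when possible, differently when desired" is a handler table with a shared default per local name and optional overrides per namespace. This is a sketch of the general idea only, not Layman's own proposal; the namespace URIs and handler behavior are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs for two of the three grammars.
STRICT = "http://example.org/xhtml1/strict"
TRANSITIONAL = "http://example.org/xhtml1/transitional"


def split_tag(tag: str):
    """Split an ElementTree tag '{uri}local' into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


# Default handlers, shared by every grammar, keyed by local name only.
SHARED = {"p": lambda element: f"paragraph: {element.text}"}

# Overrides for cases that need special treatment, keyed by (namespace, local name).
OVERRIDES = {(TRANSITIONAL, "p"): lambda element: f"legacy paragraph: {element.text}"}


def process(markup: str) -> str:
    """Dispatch on the expanded name, falling back to the shared handler."""
    element = ET.fromstring(markup)
    uri, local = split_tag(element.tag)
    handler = OVERRIDES.get((uri, local)) or SHARED[local]
    return handler(element)


print(process(f'<p xmlns="{STRICT}">hi</p>'))        # paragraph: hi
print(process(f'<p xmlns="{TRANSITIONAL}">hi</p>'))  # legacy paragraph: hi
```

The override table only works because the namespace tells the two paragraphs apart. Under a single namespace, some other signal would have to carry that distinction.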

Conclusion

The discussion continues, of course. But at the end of the day, the debate might boil down to two issues. One is the particular design of XHTML and its considerations for backwards compatibility. The other, broader issue is the question of what namespaces really mean. Since the "Namespaces in XML" specification says only how XML namespaces work, not what they are to be used for, people have been encouraged to put forth their own views on how to use them. Varied and flexible uses of XML Namespaces are a good thing; in fact, that was the whole rationale for not specifying their purpose in the first place. In the meantime, whether the issues are technical or political, only time will tell how they play out as XML matures.