In recent discussions, some but not all at the recent WWW6 conference, it has
become apparent that we have an opportunity, if we act now, to avoid one of
the big problems that has caused HTML a lot of grief. This is the area of
error-handling. HTML doesn't have any. As a result, the browser and tool
vendors are stuck on an endless treadmill of trying to enhance the system
while at the same time handling any and all collections of bytes that Netscape
1.X did. Get a couple of beers into anyone from the big N or the big M and
you'll see some real tears over this. In my former life as a Web indexer,
I cried some of those tears myself. So let's not let it happen again.
The subject is violations of well-formedness. Well-formedness should be easy
for a document to attain. In XML, documents will carry a heavy load of
semantics and formatting, attached to elements and attributes, probably with
significant amounts of indirection. Can any application hope to
accomplish meaningful work in this mode if the document does not even manage
to be well-formed!?!?
I suggest that we add language to section 5, "conformance", which says:
"An XML processor which encounters a violation of the constraints
of well-formedness must not thereafter pass any information about
text or markup to the application. It must pass to the application
a notification of the first such violation encountered. It MAY
thereafter, at user option, pass to the application information
about well-formedness violations encountered after the first."
[or in English: you gotta tell the app about the first syntax botch you hit;
you're allowed to send the app more error messages, but you're not allowed
to send anything but error messages after you've detected an error]
If we wanted to avoid phrasing this in terms of the actions of a processor
(which might be a good idea in general for the spec) we could redefine
"markup" and "character data" in such a way that they are considered not
to exist in a document which is not well-formed.
Some might argue that this violates the Internet creed: "Be conservative in
what you supply, and liberal in what you accept." I can live with that:
the consequences of the second half of that creed have led to intolerable
results in the quality and usability of the data on the Net. Furthermore,
if you want to serve up ill-formed dogshit, this will presumably remain
possible, because: "text/html means never having to say you're sorry."
Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592