Rebuked

Creativity is allowing yourself to make mistakes. Art is
knowing which ones to keep.

—Scott Adams

In a comment on my Postel’s
Law post, Mark Pilgrim
quite correctly chastises me for serving broken content. This, he goes on to
point out in a second comment, makes me a bozo and an incompetent fool, at least
by Tim Bray’s metric.

What happened was this: even though each essay is carefully
checked for well-formedness and even validity, that checking doesn’t
resolve the server-side includes. One of those includes I create
by hand. Last time I edited it, I carelessly wrote
it in HTML instead of XHTML and got an empty tag wrong. (Another include
had some bogus namespace declarations in it, but that’s not a well-formedness
problem.)
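
The empty-tag slip is easy to reproduce. As a hedged sketch (the fragments below are hypothetical stand-ins for the hand-edited include, not the actual file), XHTML requires empty elements to be self-closed, while the HTML form I carelessly wrote is not well-formed XML:

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# XHTML self-closes the empty element; the HTML form does not.
print(is_well_formed('<div><hr /></div>'))  # True
print(is_well_formed('<div><hr></div>'))    # False
```

A check like this run over the assembled page, includes and all, would have caught the error before it was served.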

Now, Mark was able to read the essay despite my error because
his browser ignored the error. Mark calls this lucky. I dunno. I think
there are two ways of looking at it:

Browser good. The browser is applying heuristics
to recover from XML well-formedness errors, thus allowing users to read
the content. Reading the content is what’s important and the browser successfully
renders it.

Browser bad. The browser is applying heuristics
to recover from XML well-formedness errors, thus masking problems that only
a bozo or an incompetent fool would be unable or unwilling to fix. This sets
the expectation that applications should recover from XML well-formedness errors.
That would be wrong, not least of all because it
introduces the possibility of much subtler and more serious problems later
on, as one application’s set of heuristics differ from another’s.
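
The split between the two views is visible in Python's standard library. A lenient HTML parser recovers from the same input that a strict XML parser rejects outright; this is only an illustrative sketch, and the `TagCollector` helper is hypothetical:

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

broken = '<p>an unclosed <br> empty tag</p>'  # fine as HTML, broken as XML

class TagCollector(HTMLParser):
    """Record start tags, recovering silently -- as browsers do."""
    def __init__(self):
        super().__init__()
        self.tags = []
    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed(broken)
print(collector.tags)  # ['p', 'br']

# A strict XML parser rejects the very same bytes.
try:
    ET.fromstring(broken)
except ET.ParseError:
    print('rejected as XML')
```

Two lenient parsers are under no obligation to recover the same way, which is exactly the "subtler and more serious problems" worry above.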

I infer that Mark subscribes to the former view. I subscribe to the
latter. And I’m quite willing to play by the rules. The fact that
browsers render broken content is a bug. Had they rejected the content
as they should, it’s unlikely that my carelessness would ever have
been seen by the public. At the very least, that would have saved me
some embarrassment.

Comments:

You might also want to know that Internet Explorer does not support &apos;. Instead of ' it simply shows &apos;. Another reason not to send XHTML to browsers that only support HTML.
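
The reason behind this comment: &apos; is one of the five entities predefined in XML (and therefore in XHTML), but it is absent from HTML 4, so an HTML-only browser renders the literal text. A small sketch showing an XML parser resolving it:

```python
import xml.etree.ElementTree as ET

# &apos; is predefined in XML; an XML parser resolves it to the
# apostrophe character. HTML 4 omits this entity, which is why an
# HTML-only browser of the day showed "&apos;" literally.
apostrophe = ET.fromstring('<a>&apos;</a>').text
print(apostrophe)  # '
```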

Actually, I subscribe to the view that browsers should go to virtually any length to display the information I asked them to display, but should inform me (subtly) if the information is not well-formed/valid/whatever is appropriate. I talked about this a year and a half ago, and my position has not changed: http://diveintomark.org/archives/2002/08/20/how_liberal_is_too_liberal

If User-Agents filled your logs full of complaints, you would have been notified without disrupting the user's experience. If search engines refused to index your page, since the right action under Postel's law would be to refuse to republish broken content, you would have a motivation to fix it.

I've written on this here:
http://www.franklinmint.fm/blog/archives/000092.html

We've lived in a "browser good" world for a decade now, and a switch to strict interpretation and display of content will cause ludicrous amounts of needless pain. It's lamentable, but them's the breaks.

Mark's right that a "browser bad" world is horrid from the user's perspective. From a developer's perspective, it's precisely what we need, because it leads to finding and fixing errors as early as possible. But very few of us are developers.

I'd argue what we really need is a "browser settable" world, where it defaults to "browser good" for legacy reasons, but lets the content producers and developers turn on a "browser bad" strict mode to find errors. Arguably, we're the only people that care about well-formedness in the first place...

Browser good. The web browser is not primarily a development tool, and validation services exist already for (X)HTML pages and XML feeds, if you're the sort that cares about such things.

If somebody wants to spend an afternoon educating my friends about why they shouldn't compose their blog entries in Microsoft Word, I'm all for it, but for my part, I've tried, and their eyes just glaze over whenever I attempt an explanation of character encodings on the web. Meanwhile, I'm grateful that I can still manage to read their terribly broken web pages and RSS feeds.

that's not really what's happening: you are serving the files as text/html, so much of the XML syntax (e.g. <meta .../>) is just HTML syntax errors, and you are relying on the browser's lax reporting of HTML errors for any of your page to be read.

<quote>The fact that browsers render broken content is a bug. Had they rejected the content as they should,</quote>

The content is broken HTML even if it is well-formed XHTML, and HTML agents are allowed to be lenient. If you wish to play by XML rules, serve the content with an XML MIME type; then you will get strict XML parsing in both IE and Mozilla/Netscape (although the display in IE might not be quite what you want unless you supply a stylesheet).

Basically, by serving non-HTML as text/html you are _asking_ for highly tolerant and lax parsing.
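
The commenter's point is that the served Content-Type, not the markup itself, selects the parsing rules. A minimal, hypothetical sketch of the server-side choice (real content negotiation also honors the Accept header's q-values; `content_type_for` is an illustrative name, not any framework's API):

```python
def content_type_for(accept_header: str) -> str:
    """Pick a Content-Type for an XHTML page based on what the client accepts.

    application/xhtml+xml opts the page in to strict XML parsing in
    browsers that support it; text/html opts in to lax HTML recovery.
    """
    if 'application/xhtml+xml' in accept_header:
        return 'application/xhtml+xml'  # strict: well-formedness errors are fatal
    return 'text/html'                  # lenient: browser heuristics apply

print(content_type_for('text/html,application/xhtml+xml'))  # application/xhtml+xml
print(content_type_for('text/html'))                        # text/html
```

Under this scheme a well-formedness slip like the one in the essay would be rejected loudly by XHTML-capable browsers instead of silently repaired.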

I thought of that myself yesterday. It occurred to me that the W3C validator is actually ignoring the content type when it does XHTML checking. Since the data is served as text/html, it has no business reporting XML well-formedness errors.

OTOH, it probably makes sense for a validator to exhibit this behavior.

So I was wrong. Again. Oh, well. I stand by my conviction that recovering from well-formedness errors in XML content is wrong.