On Fri, Nov 14, 2008 at 5:52 AM, Henry S. Thompson <ht@inf.ed.ac.uk> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Jonas Sicking writes:
>
>> The XML spec also accepts quite a range of input as text/xml. Most of
>> it is invalid XML though.
>
> Not sure I understand. The XML spec. only mentions media types in
> passing in its discussion of (natural) language and encoding
> determination. It defines well-formedness (in general) and validity
> (wrt a DTD).
Yes, sorry, my argument was somewhat confusing. I'll put it in better terms:
There seems to be a lot of confusion about what the HTML5 draft
considers 'legal' or 'valid' HTML. HTML5 is just as strict as HTML4
with regard to what is valid. The draft does *not*, for example, say
that wrongly nested tags are valid. They are considered an error.
This confusion seems to stem from the fact that HTML5 defines error
handling. However, defining how an error is handled does not change
what is an error, and thus invalid, and what is not.
Several people have asked for a spec without error handling. I'm not
sure why defining error handling is considered a bad thing. Is it
because people are worried that, once error handling is defined,
authors will rely on it, whereas if we leave it undefined they won't,
since people shouldn't rely on undefined behavior? Or is there some
other reason to leave error handling out of a specification?
I'm not trying to be provocative; I'm genuinely trying to understand.
My experience is that a lot of people end up relying on undefined
behavior, intentionally or not. Additionally, I think that extremely
few people are going to check their documents against the spec;
they will check them against other documentation and other tools.
I think the HTML4 validator at http://validator.w3.org/ has done
worlds more than the HTML4 specification for increasing the quality of
HTML documents on the web.
I have heard several orders of magnitude more people say that they
have validated their page against the validator than I have heard
people say that they validated their page against the HTML4 spec. The
former I've heard many, many times; the latter I've only heard a
handful of times, when people file a bug against Firefox (actually, I
probably haven't heard that since the Netscape days). I don't see
a reason why this would change with future versions of the spec.
I hope that helps to explain my perspective on the issue.
/ Jonas