On Jul 9, 2007, at 11:12 AM, Andrew Sidwell wrote:
> Robert Burns wrote:
>> On Jul 9, 2007, at 9:34 AM, James Graham wrote:
>>> Robert Burns wrote:
>>>> Despite some confusion on these issues, there isn't a single
>>>> right way to do these things and the sooner we can acknowledge
>>>> that the easier our task will be.
>>>
>>> If you're talking about XML parsing there really is only one way to
>>> do it; the DOM you get is determined by the XML spec. Any browser that
>>> does something different has a bug.
>>
>> I've been working with primarily XML for nearly a year now (CSS and
>> DOM and translation). And I can tell you it's not as unambiguous as you
>> might think. There's definitely ambiguity and there's room to clear up
>> ambiguity. The XML spec is most clear on well-formedness. After that,
>> there's wiggle room.
>
> Instead of just stating "there's wiggle room", please could you give
> examples of where such room exists? It's very hard to understand any of
> the issues involved based on such vague statements.
Sure, sorry for the ambiguity. I've often written at great
length on topics only to have my words dismissed with a turn of phrase.
I'll try to provide a couple of examples off the top of my head of
things that have been changing, and continue to change, in XML parsing.
First is the treatment of named character references (or character
entity references in SGML nomenclature). Early XHTML UAs would throw
up fatal errors when encountering these, just as they throw up fatal
errors for ill-formed elements. I imagine this has been a significant
frustration for authors trying to move seemingly well-formed code
over to XML processing. Over time, Mozilla (and I think WebKit is
moving in this direction too) has added support for them: basically
hard-wiring its knowledge of HTML. XML makes a distinction between
DTD-retrieving UAs and non-DTD-retrieving UAs. Most UAs do not
retrieve a DTD; however, that hasn't stopped them from adding
knowledge from those DTDs to their processing of XHTML.
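
To make the hard-wiring idea concrete, here is a minimal sketch in
Python. It is purely illustrative and is not how Gecko or any other UA
actually does it. It uses the standard library's ElementTree parser,
which does not retrieve the DTD. The sample document, the function names
parse_strict and parse_with_hardwired_entities, and the use of Python's
html.entities table as the built-in knowledge are all my assumptions.
As far as I know, the parser only consults the table when the document
declares a DOCTYPE, so the sample includes one.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Hypothetical sample: declares the XHTML DTD and uses two named references.
XHTML_DOC = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p>caf&eacute;&nbsp;au lait</p></body></html>'
)


def parse_strict(text):
    """Parse without retrieving the DTD and with no built-in entity table.

    Named references such as &eacute; are unknown to the parser here,
    so parsing stops with ET.ParseError.
    """
    return ET.fromstring(text)


def parse_with_hardwired_entities(text):
    """Parse with HTML's named references hard-wired into the parser.

    The DTD is still not retrieved; the parser's entity dictionary is
    filled from Python's own HTML entity table instead (an assumption
    standing in for a UA's built-in knowledge).
    """
    parser = ET.XMLParser()
    for name, codepoint in name2codepoint.items():
        parser.entity[name] = chr(codepoint)
    return ET.fromstring(text, parser=parser)
```

Calling parse_strict(XHTML_DOC) raises ET.ParseError, while
parse_with_hardwired_entities(XHTML_DOC) returns a tree in which the
paragraph text contains the actual Unicode characters.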
The same situation arises with WebKit's treatment of XHTML and the
inferred tbody element. At some point the WebKit team decided to
infer an actual tbody element and insert it into the DOM based on its
knowledge of the HTML namespace (separate from XML requirements).
These are decisions UA developers have to make all the time.
Sometimes it breaks interoperability. Sometimes it actually fixes
interoperability. However, from our point of view, we should be
willing to consider such measures and not simply dismiss them out-of-
hand, because we're in a unique position to promote such measures to
improve interoperability and help users, authors and UA developers
alike.
Do named character references belong in XHTML (i.e., are they even in
the DTD)? I don't recall off the top of my head. However, I'm
still running into tools that obliterate my Unicode characters, so
maybe it's too soon to drop named character references from the
HTML namespace (I know Dan reminded me they are not technically part
of the HTML namespace, but that's how we tend to think of it). Should
WebKit be inserting an inferred tbody element into the DOM? Not per
the current spec, but since we're developing the next spec, it's a
possibility we shouldn't dismiss just because it's not what XHTML1 did.
XML requires fatal errors for well-formedness errors. It does not
require failure on validity errors. Perhaps someone will cite a
passage to prove me wrong, but I don't recall reading anything in XML
that would prohibit a UA with hard-wired knowledge from repairing
invalid content by, for example, adding a missing tbody element
(presuming that conformance required it).
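
As a rough sketch of that kind of repair (hypothetical, and not WebKit's
actual implementation), the Python below takes an already parsed,
well-formed tree and wraps the tr elements that sit directly under a
table in an inferred tbody. The helper names q and infer_tbody are mine,
and placing the inferred element where the first loose row stood is an
assumption.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def q(tag):
    """Return the namespace-qualified ElementTree name for an XHTML tag."""
    return "{%s}%s" % (XHTML_NS, tag)


def infer_tbody(root):
    """Wrap tr elements that are direct children of a table in a tbody.

    This runs after parsing succeeded, so it only ever sees well-formed
    input. Tables whose rows are already grouped are left untouched.
    """
    # Collect the tables first so the tree is not mutated while iterating.
    for table in list(root.iter(q("table"))):
        rows = [child for child in table if child.tag == q("tr")]
        if not rows:
            continue
        # Remember where the first loose row was before moving anything.
        index = list(table).index(rows[0])
        tbody = ET.Element(q("tbody"))
        for row in rows:
            table.remove(row)
            tbody.append(row)
        table.insert(index, tbody)
    return root
```

The split matters here: an ill-formed input such as an unclosed tr never
reaches infer_tbody, because ET.fromstring raises ET.ParseError first,
whereas a well-formed table parses fine and can then be adjusted.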
I'm sure if I did a little research I could come up with some other
examples. The important thing to keep in mind is that XML separates
validity from well-formedness. It requires fatal errors for
ill-formedness and not for invalidity. Certainly any DTD that includes
named character references could lead to well-formedness
errors in non-DTD-retrieving UAs. But there's no reason that even
those UAs can't implement those named character references by
hard-wiring them (as Gecko does).
From what I've witnessed over the last year, the XML UAs are still
figuring out what XML and XHTML conformance is. We could certainly
weigh in on that, particularly regarding HTML5/XML.
Take care,
Rob