http://www.w3.org/Bugs/Public/show_bug.cgi?id=5003
------- Comment #13 from fabio@cs.unibo.it 2008-01-18 00:21 -------
Dear Michael,
> But what it does do is to destroy the property that you can validate an
> element, label it with a type annotation, and then copy the element to a new
> tree knowing that the type annotation will still be sound. That property is
> extremely important to QT.
I have been trying to come to grips with this and other comments along the same
lines for a while now, and I haven't been able to understand their point.
Context-free validation can be considered as a property or a goal in the same
manner that (I'll steal the example from MSM in [1]) setting PI = 3 can be
considered a property or a goal: it would surely speed up computations, but the
results would not be as precise as they should be, and we would need a lot of
post-processing adjustments to fit observed data to the computed theory.
A schema is a contract between a data producer and a data consumer about the
range of acceptable data sets that the former promises to deliver and the latter
promises to accept. Acceptability is NOT an abstract quality of documents and
it is in general NOT constrainable to context-free content models. Constraining
validity to content models only (i.e. to the subtree of each element) is a
simplification that separates validity from acceptability.
The acceptability of some elements in some document types does OFTEN depend on
circumstances that exist outside of the subtree they head. That is unfortunate
but is true, as the xml:lang example can testify, as well as the HTML
form/input example mentioned by MSM in [2], the <alternative> without test
attribute in XSDL 1.1, or for that matter even ID/IDREF pairs. I can provide
as many examples as you care to receive: once you know what to look for, you
start finding literally thousands of such situations in basically any XML
language.
Requiring validation to be context-free does not change these facts, nor
does it provide a brilliant alternative workaround for them: it merely
redefines "validity" to mean something less than "acceptability". It leaves
downstream applications with the hard choice between implementing
post-validation code to verify the inexpressible constraints and blindly
accepting possibly wrong documents.
To draw from our current use case: if you have a fragment with no xml:lang
attribute of its own, sitting below an ancestor that does set xml:lang, and you
move this fragment to another tree that has no xml:lang attribute anywhere, then you can
cover your eyes and pretend that the fragment maintains type annotation and
validity, but in fact it does not. Or at least, it maintains them if you tweak
the definition of validity, but it still becomes "unacceptable" in a wider
sense, since the actual language of the fragment is not reflected in any
explicit attribute of the new tree. This has nothing to do with types, XML
Schema, assertions and acceptable XPath syntax. It is just a wrong (or, shall
we say, "simplified") assumption about what happens when you move data from one
position (where it was acceptable) to another (where it is not).
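To make the scenario concrete, here is a minimal Python sketch using only the
standard xml.etree.ElementTree module. The element names (doc, section, para)
and the helper effective_lang are hypothetical, chosen purely for illustration;
the lookup simply walks up to the nearest ancestor carrying xml:lang. It is a
sketch of the situation described above, not of any validator's behaviour.

```python
import copy
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(root, target):
    """Return the xml:lang in scope for target, or None if none is set.

    Hypothetical helper: looks at target first, then at each ancestor in turn.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            return lang
        node = parent_of.get(node)
    return None


# Source tree: the fragment inherits its language from an ancestor.
source = ET.fromstring(
    '<doc xml:lang="it"><section><para>Ciao</para></section></doc>'
)
para = source.find("./section/para")
lang_before = effective_lang(source, para)

# Destination tree: no xml:lang anywhere.
destination = ET.fromstring("<doc><section/></doc>")
moved = copy.deepcopy(para)
destination.find("section").append(moved)
lang_after = effective_lang(destination, moved)

# The subtree itself is byte-for-byte the same before and after the move.
subtree_unchanged = ET.tostring(para) == ET.tostring(moved)

print(lang_before, lang_after, subtree_unchanged)
```

The subtree is identical in both trees, so any check that looks only at the
subtree cannot tell the two situations apart, yet the language in scope for
the fragment has been lost in the move.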
We can decide that we may live with such a simplification, and stick to it
despite the proposed use cases, but let's not tell ourselves it is a design
goal or a theoretical property chosen as a feature: it is really just a sad
admission of implementation difficulties. Reasons for choosing this or other
simplifications need to be grounded in engineering concerns, and not in
abstract principles that have no actual correspondence in reality.
Ciao
Fabio
[1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=5003#c1
[2] http://www.w3.org/Bugs/Public/show_bug.cgi?id=5297