Fraser Goffin writes:
> yes I agree that structural validation is important, and I
> further agree that the various checks that are made on the data
> are cumulative and go to the heart of data integrity.
I think there are some further nuances worth setting out. In most of this
discussion, there has been an implicit assumption that the schema
validation languages are not Turing Complete [1]. For those unfamiliar
with the term, what I mean is that languages like XSD or RelaxNG aren't
powerful enough to compute all the things you can with languages like C,
Java, or Cobol. For example, you can't compute all the prime numbers in
XSD or RelaxNG, so you can't in practice write a schema type that would
validate only prime integers as the content of some element. If your
schema language were, say, Java, then you could write a schema to make sure
that your XML element contained a prime number, and for a mathematician
that would be a very sensible check to attempt. There are, of course,
good reasons for not using Turing Complete languages as our main schema
languages. One obvious one is that programs in Turing Complete languages
don't necessarily execute in bounded time. You can always check an XML
instance against an XSD or RelaxNG schema in bounded time, and usually
quite quickly. Most of our schema languages also handle the simple cases,
such as looking for a fixed sequence of elements, very easily. Incidentally,
all the Turing Complete languages like C and Java have the same
computational power: if you can compute prime numbers in one, you can do
it in all the others.
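
To make the prime number case concrete, here is a minimal sketch of the kind of check a general-purpose language allows. It uses Python rather than Java purely for brevity. The element name "value" and the function names are my own illustrative assumptions, not part of any schema language or standard API.

```python
import xml.etree.ElementTree as ET


def is_prime(n: int) -> bool:
    """Trial division. Adequate for a sketch, slow for very large n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def validate_prime_element(xml_text: str, tag: str = "value") -> bool:
    """Return True if the root element is <tag> and its content is a prime integer.

    The tag name "value" is a hypothetical example.
    """
    root = ET.fromstring(xml_text)
    if root.tag != tag:
        return False  # structural check: a schema language handles this easily
    try:
        n = int((root.text or "").strip())
    except ValueError:
        return False  # not an integer: a schema language handles this too
    return is_prime(n)  # the part the schema language cannot express
```

Called as validate_prime_element("<value>13</value>"), this accepts the element, while the same call with 15 as the content rejects it. The loop in is_prime happens to terminate, but nothing about the language guarantees that for code in general, which is the bounded-time concern mentioned above.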
Anyway, I'd say there are at least four shades of grey to consider:
* Content validation that can be implemented in your schema language (the
element name is legal, and the content is an integer)
* Content validation that your schema language can't handle (the number is
prime)
* Business validation (that looks like a credit card number, but our
records show that the card was stolen, so it's not "valid" for use in a
purchasing transaction)
* Semantic incompatibility (we used to use the field for an account
number, but in Version 2 of the language it identifies a particular credit
card)
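
One way to picture the four shades is as successive checks on a single field, sketched below in Python. Everything here is a hypothetical illustration: the STOLEN_CARDS set stands in for "our records", the version numbers follow the account number example above, and I have used the Luhn checksum as a stand-in for a content check that a grammar-based schema is poorly suited to express (the prime test would serve equally well).

```python
import re

# Hypothetical stand-in for "our records"; a real system would query a service.
STOLEN_CARDS = {"4111111111111111"}

SCHEMA_INVALID = "fails schema-level content validation"
SEMANTIC_MISMATCH = "semantic mismatch: field holds an account number"
CONTENT_INVALID = "fails content validation beyond the schema"
BUSINESS_INVALID = "fails business validation"
OK = "ok"


def luhn_ok(number: str) -> bool:
    """Luhn checksum over a string of ASCII digits."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def check_card_field(text: str, language_version: int) -> str:
    """Classify a field value according to the four shades listed above."""
    # 1. What a schema language can do: the content is a run of digits.
    if not re.fullmatch(r"[0-9]+", text):
        return SCHEMA_INVALID
    # 4. Semantic incompatibility: before Version 2 the field was an account number.
    if language_version < 2:
        return SEMANTIC_MISMATCH
    # 2. Content validation the schema language does not handle.
    if not luhn_ok(text):
        return CONTENT_INVALID
    # 3. Business validation: well-formed, but not acceptable for a purchase.
    if text in STOLEN_CARDS:
        return BUSINESS_INVALID
    return OK
```

Note that only the first check looks at the document in isolation. The others need code, external records, or knowledge of which version of the language the sender had in mind.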
BTW: I know I've sent this link from time to time before, but if you're
interested in the tradeoffs between using powerful vs. less powerful
languages, Tim BL did a very nice analysis, and I helped him edit it as a
TAG finding last year. It's at [2].
Noah
[1] http://en.wikipedia.org/wiki/Turing_complete
[2] http://www.w3.org/2001/tag/doc/leastPower.html
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------