Re: [xml-dev] Your XML documents may use different sets of characters, depending on which implementer you select?

From: Michael Kay <mike@saxonica.com>

To: xml-dev@lists.xml.org

Date: Tue, 17 May 2011 15:27:37 +0100

> Unicode is an evolving standard. Thus, there are different versions.
>
Standards bodies spend a great deal of time trying to decide whether
they should "fix" their references to a specific version of an
underlying standard, or allow the dependency to "float" to a new
version. It's a universal problem, and of course there are advantages
both ways, so there is no right answer. Allowing the dependency to "float"
ensures that users are not locked out from the benefits of new versions
of the underlying technology, but gives them less of a guarantee of
stability.

An example of how you can get this spectacularly wrong is the dependency
in XSLT 1.0 on the JDK 1.1 definition of the DecimalFormat class. Not
only are implementations of JDK 1.1 thin on the ground nowadays, but the
specification itself has been on occasion quite hard to locate: at
present Oracle, to their credit, do serve useful information at the URL
used by the XSLT 1.0 spec, but that has not always been the case.
Furthermore, the JDK 1.1 specification of DecimalFormat is full of bugs
that were fixed in later versions, and any reasonable implementor will
fix these the way that subsequent versions of the spec indicate: so
specifying a "fixed" version will not necessarily stop implementors
deciding to "float" if that's what appears to make sense.

(I saw another example of this with X.400. My company had implemented
this "as written", allowing only ASCII characters in email addresses.
Another company had extended the spec to allow Latin-1 characters. Since
our product wasn't capable of sending email to the other product's
users, we were forced to follow suit. Sometimes when standards bodies
get it wrong, they get ignored.)

Michael Kay
Saxonica