I'm sorry, I made a small error in my previous note: I wrote "in our
first example, we say that schema B.xsd is valid by itself because
validate(B.xsd) returns true" instead of "in our first example, we say
that schema B.xsd is invalid by itself because validate(B.xsd) returns
false".
This note fixes that error.
XML Schema Part 1: Structures, section 6.2.2, says:
"If the normalized value of the schemaLocation [attribute] successfully
resolves, it resolves either
1.1.1 to (a fragment of) a resource of type text/xml, which in turn
corresponds to a schema element information item in a well-formed
information set, which in turn corresponds to A VALID SCHEMA
or
1.1.2 to a schema element information item in a well-formed information
set, which in turn corresponds to A VALID SCHEMA
"
I guess this means that the redefined schema must be valid by itself.
If my interpretation is correct, it means that schema A.xsd in the
following example is invalid:
FIRST EXAMPLE:
A.xsd
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
  <xsd:redefine schemaLocation="B.xsd">
    <xsd:complexType name="cB" abstract="false">
      <xsd:complexContent>
        <xsd:extension base="cB"/>
      </xsd:complexContent>
    </xsd:complexType>
  </xsd:redefine>
</xsd:schema>
B.xsd
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
  <xsd:complexType name="cB" abstract="true">
    <xsd:sequence>
      <xsd:element name="eB" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="eB2" type="cB"/>
</xsd:schema>
Schema B.xsd is invalid because element eB2 violates the rule
stated in section 3.4: "A complex type for which {abstract} is true must not
appear as the {type definition} of an Element Declaration (§2.2.2.1)".
But when we apply the redefinition in A.xsd, element eB2 uses a
non-abstract type and is therefore valid. So if we are only interested in the
resulting schema (schema A.xsd after the redefinition of schema B.xsd), we
could say that it is valid. However, it seems that the spec requires
schema B.xsd to be valid by itself. So even if the redefinition fixes
the problem in schema B.xsd, schema A.xsd is still invalid!
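To make the two readings concrete, here is a minimal sketch in Python. It is a toy model and not a schema processor: the component names come from the first example, but the dictionaries and the function abstract_typed_elements are my own assumptions for illustration. It only checks the rule about abstract types quoted above.

```python
# Toy model of the first example. Component names mirror A.xsd and B.xsd;
# the data structures and the function name are hypothetical.

def abstract_typed_elements(types, elements):
    """Return the names of elements whose type definition is abstract."""
    return [name for name, type_name in elements.items()
            if types[type_name]["abstract"]]

# B.xsd on its own: cB is abstract and eB2 uses it directly.
b_types = {"cB": {"abstract": True}}
b_elements = {"eB2": "cB"}

# A.xsd redefines cB as a non-abstract extension of the original cB.
# References to cB (including the one in eB2) then resolve to the new type.
redefinitions = {"cB": {"abstract": False}}
result_types = {**b_types, **redefinitions}

standalone_errors = abstract_typed_elements(b_types, b_elements)
outcome_errors = abstract_typed_elements(result_types, b_elements)

print(standalone_errors)  # ['eB2']
print(outcome_errors)     # []
```

Checked by itself, B.xsd reports eB2; checked as the outcome of the redefinition, nothing is reported.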
The problem is that in some cases the redefined schemas and the
redefining schema are so tightly coupled that they cannot be validated by
themselves. I think the spec should either:
1/ prohibit these cases
or
2/ specify how schema processors must behave in these cases.
These situations occur when a schema A redefines a schema B which,
directly or indirectly, includes, imports or redefines schema A.
Example:
SECOND EXAMPLE:
A.xsd
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
  <xsd:redefine schemaLocation="B.xsd">
    <xsd:complexType name="cB" abstract="false">
      <xsd:complexContent>
        <xsd:extension base="cB"/>
      </xsd:complexContent>
    </xsd:complexType>
  </xsd:redefine>
  <xsd:complexType name="cA">
    <xsd:sequence>
      <xsd:element name="eA" type="cB"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
B.xsd
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
  <xsd:redefine schemaLocation="A.xsd">
    <xsd:complexType name="cA">
      <xsd:complexContent>
        <xsd:extension base="cA">
          <xsd:sequence>
            <xsd:element name="eB3" type="cB"/>
          </xsd:sequence>
        </xsd:extension>
      </xsd:complexContent>
    </xsd:complexType>
  </xsd:redefine>
  <xsd:complexType name="cB" abstract="true">
    <xsd:sequence>
      <xsd:element name="eB" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="eB2" type="cB"/>
</xsd:schema>
I think that in this situation we should not try to validate B.xsd
by itself. We should only focus on the outcome. In other words, we should
only try to validate schema A after all components have been constructed
and the reference redirections (needed because of redefinition) have been
performed.
What could happen if a processor tries to validate B.xsd by itself?
First, I would like to point out that, in this second example, the meaning
of "validate B.xsd by itself" is not clear. In our first example, the
meaning of "validate B.xsd by itself" was straightforward because B.xsd
did not use schema A.xsd. Suppose that we have a function "boolean
validate(file aSchemaLocation)" which takes a schema file and returns
whether the schema defined in the file is valid. In our first example, we
say that schema B.xsd is invalid by itself because validate(B.xsd) returns
false.
We cannot apply the same definition of "validate B.xsd by itself"
to our second example, because if we do so we end up with an infinite loop:
validate(B.xsd) will call validate(A.xsd), which will call validate(B.xsd),
and so on.
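The loop can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the REFERENCES table stands in for the redefine statements of the second example, and naive_validate and collect_documents are hypothetical names, not functions of any real processor. The depth limit exists only so that the sketch terminates.

```python
# Hypothetical dependency table for the second example: each schema
# document lists the documents it redefines, includes or imports.
REFERENCES = {"A.xsd": ["B.xsd"], "B.xsd": ["A.xsd"]}

def naive_validate(location, depth=0, limit=10):
    """Validate every referenced document first, as "by itself" suggests.

    The depth limit is only there so the sketch terminates; without it the
    recursion between A.xsd and B.xsd would never end.
    """
    if depth >= limit:
        raise RecursionError(f"still recursing at {location}")
    for ref in REFERENCES[location]:
        naive_validate(ref, depth + 1, limit)
    return True

def collect_documents(location, seen=None):
    """Gather all reachable documents once, so that validation can run
    on the combined result instead of on each document separately."""
    seen = set() if seen is None else seen
    if location not in seen:
        seen.add(location)
        for ref in REFERENCES[location]:
            collect_documents(ref, seen)
    return seen

try:
    naive_validate("B.xsd")
    looped = False
except RecursionError:
    looped = True

print(looped)
print(sorted(collect_documents("B.xsd")))
```

Remembering which documents have already been visited is enough to load both files once; it does not by itself say what "valid by itself" should mean for either of them.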
Therefore, if the spec wants conforming processors to always validate
redefined schemas by themselves, it should clearly define what that means in
the case of tightly coupled schema documents (one possible definition could
involve replacing all redefine statements in the redefining and redefined
schemas by include statements). But I believe it would be much easier to
let processors focus only on the outcome schema and not try to validate
redefined schemas by themselves.
Besides the ambiguous definition of "validate a schema by itself",
there is another problem if a processor does not focus only on the outcome
and tries to validate some fragments of the schema before all components
are in their final form (that is, before all components have been built and
the reference redirections required by the redefine statements have been
performed): depending on the order in which reference redirections are
done, elements eB2, eB3 and eA in our second example may or may not be
valid, because they may or may not use an abstract type.
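A small Python sketch of this order dependence follows. It is only an illustration under my own assumptions: the step names are hypothetical, "redirect" stands for re-pointing every reference to cB at the redefined, non-abstract type, and the only rule checked is the one about abstract types.

```python
from itertools import permutations

# Hypothetical processing steps for the second example. "redirect" stands
# for re-pointing references to cB at the redefined, non-abstract type;
# the other steps check one element declaration each.
STEPS = ["redirect", "check eB2", "check eB3", "check eA"]

def run(order):
    """Return the elements rejected when the steps run in the given order."""
    cb_is_abstract = True           # cB as originally declared in B.xsd
    rejected = []
    for step in order:
        if step == "redirect":
            cb_is_abstract = False  # references now see the redefined cB
        elif cb_is_abstract:
            rejected.append(step.split()[1])
    return sorted(rejected)

# Collect the distinct verdicts over every possible ordering of the steps.
outcomes = {tuple(run(order)) for order in permutations(STEPS)}
for verdict in sorted(outcomes):
    print(verdict)
```

In this model an element is rejected exactly when it is checked before the redirection, so the verdict ranges from "no element rejected" to "all three rejected" depending only on the order.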
So, once more, the spec should either
1/ say clearly that processors must only focus on the resulting
schema (in this case, the first example is valid)
or
2/ explicitly specify the order in which intermediate fragments must
be validated (in this case, depending on the specified order, the first
example may or may not be valid).
It would be much easier to choose the first alternative: conforming
processors must only focus on the resulting schema.
Achille Fokoue.