Xerces, XmlBeans and XML Schema Import Resolver

I’ve been developing a framework to provide XML validation/assurance tools based on a range of parser/validator implementations in Java. A recent experience warranted flagging, given I’ve been surprised at the number or credible examples out there on yonder interweb.

The Headline

XmlBeans 2.5 and Xerces 2.9 behave differently when using a runtime schema/resource resolver. XmlBeans relies on the use of the EntityResolver interface whereas Xerces relies on the LSResourceResolver interface, but the way the XMLSchema <import /> statement is put together causes some consistency problems in terms of how these resolvers are used…

The Goal

Take a WSDL (with inline-schema) and then use that WSDL to validate XML instance documents. You cannot validate directly with a WSDL, and therefore as a developer you spend time messing with tools to extract schema and so forth. This tool will greatly simplify this process among other things. This approach involves unpacking the inline XML Schema documents in such a way that multiple XSD fragments can be utilised at the point of validation.

The Approach

Open the WSDL, extract the schemas taking care to maintain the namespace declarations from the outer WSDL definitions section now that XSD’s will become independent XSD files and unable to inherit through context. Spin up a version of XmlBeans and Xerces, and generate a n-way schema validation report this catering for variations in the level of compliance in any one validating parser.

The Problem

Some of the WSDL documents are very complex. They contain many, sometimes hundreds of hierarchically related XML Schema fragments, each of which ‘imports’ the namespaces of numerous other schema fragments. Clearly when I separate the XSD’s from the WSDL these imports cannot resolve through inherited context so I have to make sure that I can resolve these at runtime when the validating parsers are fed the root schema. Problem is the inline schemas in the WSDL contain only a namespace import:

And when this is exported to a standalone XSD file, you need to use a runtime SchemaResolver to acquire the other related XSD files such that the XmlBeans or Xerces engines can assimilate all of the type and namespace set necessary to validate the specified XML instance document. I spent a lot of time diagnosing why my XmlBeans implementation was not triggering the necessary ‘tell me where this schema lives?‘ events, such that element references were remaining unresolved, and the validation of the XML instance document was failing. With exactly the same input artifacts the Xerces resolver was happily going about it’s business and allowing me to feed it schema fragments all the way home.

So Watch Out For This…

If you are using XmlBeans, and you want to use a root XMLSchema to validate an XML document, but if the root XMLSchema contains <import /> statements, then you must have a schemaLocation present.

Only with the schemaLocation present will the core XmlBeans.compileXsd() function call out to your custom EntityResolver such that you can then map the requested ‘namespace’ to a physical input source. On the other hand the Xerces parser will work with the shorter format which is already present in my

Now the only problem here is that I did not want to modify the XMLSchema artifacts I was extracting from the WSDL on the basis that I am offering a service based on the inputs supplied. However, I have been forced into a position of needing to inject a schemaLocation tag into all the XSD’s as I’m extracting them from the parent WSDL. That said I am not specifying a physical location for the schemaLocation, merely reiterating the same generic namespace. I do this to just force the trigger of the resolver events, from which I then use only the supplied namespace=”http://some.other.namespace&#8221; tag in conjunction with some context information that I know in relation to where I put the schemas in my file-system, to resolve and supply the linkage to the required schema. This is sufficiently generic that I don’t deem it to be overly intrusive into the base schema, and I end up with the following modified import statements:

With this format I get uniform behaviour between XmlBeans and Xerces. Both call out to my runtime resolvers which are then able to use a simple mapping scheme to supply the schema content and I now able to operate consistently.

This did take me a long time to diagnose, so hopefully it will be of use to others.

Like this:

LikeLoading...

Related

This entry was posted on Friday, April 30th, 2010 at 9:17 am and is filed under xml. You can follow any responses to this entry through the RSS 2.0 feed.
You can leave a response, or trackback from your own site.