XML Schema Definition: A Definitive Quick Guide

The XML Schema Definition specification is a dense document, and it takes a lot of time to parse it to get to the heart of what you’re after. Even if you’ve worked with XML and XSDs for years, you may still need to drill down into it to resolve XML validation issues. I did this on a recent project where I had to research an issue with XSD headers, and I found some great resources that I want to share with you.

XML Document

An XML document is often an instance of an XSD document (discussed below). The link between the two is made by declaring namespaces as attributes on the XML nodes.

Namespaces

Namespaces are defined in an XML document using the attribute xmlns.

A namespace defines semantics for a collection of tags and attributes.

A namespace is just a case-sensitive string, usually a URI.

<namespace-uri:local-tag>

Every tag is composed of a namespace and a local-tag name, as seen above. Because the namespace can be declared once on a node and is inherited by the nodes inside it, we often see only the local-tag portion.

For example:

<root xmlns="http://www.foo.com">
<test />
</root>

This indicates that the default namespace for the document is ‘http://www.foo.com’.
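One way to see the namespace that each tag carries is to run the example through a namespace-aware parser. The sketch below uses Python's standard xml.etree.ElementTree module, which reports every tag as "{namespace-uri}local-name". The choice of Python and the variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The default-namespace example from above, held in a string.
DOC = """<root xmlns="http://www.foo.com">
<test />
</root>"""

root = ET.fromstring(DOC)

# ElementTree expands every tag to "{namespace-uri}local-name",
# so the inherited default namespace becomes visible on <test /> too.
tags = [element.tag for element in root.iter()]
print(tags)
```

Both tags come back with the http://www.foo.com namespace attached, even though only the root node declares it.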

We can also prefix the namespace. This is like creating a shortened reference to the namespace.

For example:

<foo:root xmlns:foo="http://www.foo.com">
<foo:test/>
</foo:root>

Notice the difference between the two examples above: xmlns:foo and xmlns.
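As a rough check that the two forms mean the same thing to a parser, the following sketch compares the expanded tag names of both examples. It uses Python's standard xml.etree.ElementTree module; the helper name expanded_names is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two examples from above: default namespace versus prefixed namespace.
DEFAULT_FORM = '<root xmlns="http://www.foo.com"><test/></root>'
PREFIXED_FORM = '<foo:root xmlns:foo="http://www.foo.com"><foo:test/></foo:root>'


def expanded_names(xml_text):
    """Return every tag in the document as "{namespace-uri}local-name"."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


# The prefix is only shorthand; the expanded names are identical.
same_names = expanded_names(DEFAULT_FORM) == expanded_names(PREFIXED_FORM)
print(same_names)
```

The prefix itself never reaches the comparison. Only the namespace string it stands for does.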

So why wouldn’t we just use the default? Well, we can import multiple sets of XML semantics and mix them in one document. In that case we need to tell the reader how to treat each node, and prefixes do exactly that.
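A minimal sketch of such a mixed document, assuming a second, made-up namespace http://www.bar.com that also defines a test tag. The lookup uses Python's standard xml.etree.ElementTree module and tells the two test nodes apart by namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies; "http://www.bar.com"
# is an invented namespace used only for this illustration.
MIXED = """<foo:root xmlns:foo="http://www.foo.com" xmlns:bar="http://www.bar.com">
  <foo:test>from foo</foo:test>
  <bar:test>from bar</bar:test>
</foo:root>"""

root = ET.fromstring(MIXED)

# The prefixes in this map are local to the lookup code; only the
# namespace strings have to match the ones declared in the document.
ns = {"f": "http://www.foo.com", "b": "http://www.bar.com"}

foo_text = root.find("f:test", ns).text
bar_text = root.find("b:test", ns).text
print(foo_text, bar_text)
```

Both nodes share the local name test, yet each lookup returns only the node from its own namespace.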

Lastly, the schemaLocation attribute points to where the schemas can be found. It can be a bit tricky:

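If only a single location is given, it points to the schema for the document. Strictly speaking, that case is handled by the related noNamespaceSchemaLocation attribute, since schemaLocation itself expects pairs.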

It can also be defined as multiple pairs separated by whitespace, such as http://foo.com/bar http://location.com/foocom/bar, where the first half of each pair is the namespace and the second half is where to find the schema document for that namespace. This lets the user define multiple pairs if multiple schema namespaces are imported.

The schema locations can be a web address or a local location.

The schemaLocation attribute is defined by W3C’s schema instance namespace, so it is written with that namespace’s prefix (for example, xsi:schemaLocation). The namespace is declared as follows:

<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
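To make the pair structure concrete, here is a small sketch that reads xsi:schemaLocation and splits it into namespace/location pairs. It uses Python's standard xml.etree.ElementTree module. The helper name schema_locations, the second namespace http://foo.com/baz and the local path schemas/baz.xsd are hypothetical.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Two pairs: the one from the text plus a made-up pair with a local path.
DOC = """<foo:root xmlns:foo="http://foo.com/bar"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://foo.com/bar http://location.com/foocom/bar
                        http://foo.com/baz schemas/baz.xsd"/>"""


def schema_locations(xml_text):
    """Map each namespace in xsi:schemaLocation to its schema location."""
    root = ET.fromstring(xml_text)
    # Attributes in a namespace are keyed as "{namespace-uri}name".
    tokens = root.get("{%s}schemaLocation" % XSI, "").split()
    if len(tokens) % 2:
        raise ValueError("schemaLocation must hold namespace/location pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))


locations = schema_locations(DOC)
print(locations)
```

The sketch only reads the hints. Whether a validator actually fetches those locations depends on the tool being used.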

XML Schema Definition

The XML Schema Definition (XSD) defines the structure that XML document instances must follow. This article covers only the header declarations, not the specifics of how to write an XSD.

An XSD is itself an instance of a schema defined by W3C, so when defining a schema we declare the W3C schema namespace, http://www.w3.org/2001/XMLSchema. Note: in the example below this namespace is declared with xmlns and assigned to the prefix xs. Just like in the examples above, we can assign the namespace to any prefix, or to no prefix at all to make it the default namespace.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
</xs:schema>

The targetNamespace attribute defines which namespace the XSD document describes. In other words, if your XML document declares xmlns:foo="http://foo.com/bar", the corresponding XSD document should have the attribute targetNamespace="http://foo.com/bar".
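The relationship can be checked mechanically. The sketch below compares the namespace of an XML root node with the targetNamespace of an XSD header, using Python's standard xml.etree.ElementTree module. It is only a header check under the assumptions shown, not schema validation, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A header-only XSD and a one-node XML instance using the URIs from the text.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://foo.com/bar"/>"""

XML = '<foo:root xmlns:foo="http://foo.com/bar"/>'


def root_namespace(xml_text):
    """Return the namespace of the root node, or None if it has none."""
    tag = ET.fromstring(xml_text).tag
    # Namespaced tags look like "{uri}local-name".
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None


def target_namespace(xsd_text):
    """Return the targetNamespace attribute of the schema root node."""
    return ET.fromstring(xsd_text).get("targetNamespace")


namespaces_match = root_namespace(XML) == target_namespace(XSD)
print(namespaces_match)
```

A mismatch here is a common reason a validator reports that it cannot find a declaration for the root node.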

Summary

How schemas and schema instances are declared can seem confusing, but understanding a few basic concepts clarifies things and shows how simple using them actually is. The best way to learn is to read this blog and play with the concepts using an XML schema validator. A link to a simple Java package that validates XML against XSDs is located in the resources below.

xmlns - Declares a namespace in the document. Without a prefix, it sets the default namespace of the document.

xmlns:prefix - Declares a namespace and assigns it to a prefix. If the prefix appears before a node tag, the node belongs to the namespace bound to that prefix (e.g. <xs:schema>).

xmlns:xs="http://www.w3.org/2001/XMLSchema" - Used in XSDs to declare that the document is specifying a schema. Note: xs or xsd is used by convention for this declaration.

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" - Declared in the XML document to help declare that it is an instance of an XSD. The convention is to use the namespace prefix xsi.

targetNamespace - Used in the header of the XSD to define the namespace that the XSD describes.

schemaLocation - Gives the location of one or more XSDs used in the XML document.
