XML Schema

Welcome to the XML Schema book. It describes the structure of an XML Schema and explains how XML Schemas are used to validate XML documents.

Editor's note
This book is designed as a reusable Learning Object and must take into consideration the constraints of many learning environments. Please think carefully before adding material that would prevent the course from being reused by a broad variety of audiences; the course should be kept as modular as possible. These materials are designed to be neutral with respect to implementation technology, so that the course can be integrated into open source learning management systems such as Moodle. For example, please do not add examples that depend on Java or Microsoft .NET. Issues specific to a computer language or operating system, such as Java, Microsoft, Mac, GNU/Linux and Windows dependencies, should be isolated in separate labs that instructors can optionally include.

XML Schema is a standard created by the World Wide Web Consortium (W3C), http://www.w3c.org. Unlike DTDs, XML Schemas are themselves written in XML, so once you know the structure of an XML file you do not have to learn another syntax.

XML Schemas are primarily used to validate the structure and data types of an XML document. By structure we mean that an XML Schema defines:

what data elements are expected

what order the data elements are expected in

what nesting these data elements have

what data elements are optional and what data elements are required
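
The four points above can be illustrated with a minimal sketch. The element names (contact, name, address, street, city, phone) are hypothetical and chosen only for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Expected element: a "contact" root element -->
  <xs:element name="contact">
    <xs:complexType>
      <!-- Order: the children must appear in this sequence -->
      <xs:sequence>
        <!-- Required: elements must occur exactly once by default -->
        <xs:element name="name" type="xs:string"/>
        <!-- Nesting: "address" contains its own child elements -->
        <xs:element name="address">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="street" type="xs:string"/>
              <xs:element name="city" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <!-- Optional: minOccurs="0" allows "phone" to be omitted -->
        <xs:element name="phone" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```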

XML Schemas can also be used by XML data mapping tools to quickly extract data from databases and transfer it as XML files.

One of the best analogies is the blueprint analogy. Just like there are architectural blueprints that describe the structural design of a house, an XML Schema provides the "structural design" of a file.

Although XML Schemas are excellent at validating the sequence and data types of elements, they tend to be cumbersome at expressing highly complex business rules. For example, it is difficult to state a rule that an element near the end of a large file must have a value that depends on the value of another element near the beginning of the file. Such rules can be checked using XML transforms and XPath expressions instead.

Although any prefix can be used to refer to the namespace http://www.w3.org/2001/XMLSchema, the most common convention is to use "xs". Some people prefer "xsd"; some prefer to use the default namespace (which means no prefix is necessary). All XML Schema elements are in this namespace.
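
As a small sketch of the default namespace convention, the schema below declares the XML Schema namespace without a prefix, so its elements and built-in types are written without "xs:". The element name note is hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The XML Schema namespace is the default namespace here,
     so "schema", "element" and "string" need no prefix. -->
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="note" type="string"/>
</schema>
```

With the more common convention, the same declaration would read <xs:element name="note" type="xs:string"/> inside an <xs:schema> root that declares xmlns:xs.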

An XML Schema defines the elements and attributes that are available in a namespace (e.g. http://www.example.org/contactExample). In the XML Schema, this namespace is specified using the targetNamespace attribute.

In an XML file, a namespace can be declared using the xmlns attribute (xmlns stands for XML namespace). The xmlns attribute name can be followed by a colon and a prefix (e.g. xmlns:xs). In that case, the tags from this namespace must be written with the prefix. Prefixes are used to distinguish tags that have the same name but come from different namespaces.

A schema usually also declares its own target namespace with xmlns, commonly with the tns prefix. This lets the elements and types defined in the schema be used within the schema itself through names starting with tns: .
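
A minimal sketch of such a schema follows. The namespace URI is the one used above; the names contact, ContactType and name are hypothetical, and elementFormDefault="qualified" is an assumption made so that nested elements also belong to the target namespace:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:tns="http://www.example.org/contactExample"
           targetNamespace="http://www.example.org/contactExample"
           elementFormDefault="qualified">
  <!-- Declared in the target namespace; its type is referenced with tns: -->
  <xs:element name="contact" type="tns:ContactType"/>

  <!-- Also defined in the target namespace -->
  <xs:complexType name="ContactType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

An XML document that follows this schema could declare the same namespace as its default namespace:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<contact xmlns="http://www.example.org/contactExample">
  <name>Jane Doe</name>
</contact>
```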

<xs:sequence> is used for an ordered group of elements. For an unordered group of elements, use <xs:all>.
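
The difference can be sketched with two hypothetical elements, ordered and unordered, that have the same children:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- "first" must come before "second" -->
  <xs:element name="ordered">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="first" type="xs:string"/>
        <xs:element name="second" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- "first" and "second" may appear in either order -->
  <xs:element name="unordered">
    <xs:complexType>
      <xs:all>
        <xs:element name="first" type="xs:string"/>
        <xs:element name="second" type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```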

Restrictions, called facets, can be applied to a data type using the <xs:simpleType/> and <xs:restriction/> elements. For instance, the text body of an element of string type can be fixed to a length of exactly 5 characters using the <xs:length/> facet.
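
A minimal sketch of the 5-character restriction, assuming a hypothetical element named code:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="code">
    <xs:simpleType>
      <!-- Start from the built-in string type and restrict it -->
      <xs:restriction base="xs:string">
        <!-- The text body must contain exactly 5 characters -->
        <xs:length value="5"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

With this schema, <code>ABCDE</code> would be accepted and <code>ABC</code> would be rejected.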

The following facets can be used in a restriction:

| Facet | Description | Value |
|---|---|---|
| minExclusive | Specifies the lower bound, excluded: every allowed value must be strictly greater than it. Applies to any ordered data type. | Any integer, decimal, date or time |
| minInclusive | Specifies the smallest allowed value for any ordered data type | Any integer, decimal, date or time |
| maxExclusive | Specifies the upper bound, excluded: every allowed value must be strictly less than it. Applies to any ordered data type. | Any integer, decimal, date or time |
| maxInclusive | Specifies the greatest allowed value for any ordered data type | Any integer, decimal, date or time |
| totalDigits | Specifies the maximum total number of digits allowed, counting those to the left and to the right of the decimal point | A positive integer |
| fractionDigits | Specifies the maximum number of digits allowed to the right of the decimal point | A non-negative integer |
| length | Specifies the exact number of characters allowed | A non-negative integer |
| minLength | Specifies the minimum number of characters allowed | A non-negative integer |
| maxLength | Specifies the maximum number of characters allowed | A non-negative integer |
| enumeration | Specifies an allowed value; repeat the facet to allow a set of values | Any |
| whiteSpace | With "preserve", every character is kept as it is. With "replace", each tab (#x9), line feed (#xA) and carriage return (#xD) is replaced with a space (#x20). With "collapse", tabs, line feeds and carriage returns are replaced with spaces too, consecutive spaces are merged into a single space, and leading and trailing spaces are removed. | "preserve", "replace" or "collapse" |
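
Several of these facets are combined in the sketch below. The element names (age, price, size, title) and the chosen bounds are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Integer from 0 to 120, both bounds included -->
  <xs:element name="age">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="120"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

  <!-- Decimal with at most 6 digits in total and at most 2 after the point -->
  <xs:element name="price">
    <xs:simpleType>
      <xs:restriction base="xs:decimal">
        <xs:totalDigits value="6"/>
        <xs:fractionDigits value="2"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

  <!-- String limited to a fixed set of values -->
  <xs:element name="size">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="small"/>
        <xs:enumeration value="medium"/>
        <xs:enumeration value="large"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

  <!-- String whose whitespace is collapsed, limited to 40 characters -->
  <xs:element name="title">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:whiteSpace value="collapse"/>
        <xs:maxLength value="40"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```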

If the element contains both attributes and sub-elements, the <xs:attribute/> elements must be declared after the <xs:all/>, <xs:sequence/> or <xs:choice/> element, inside the same <xs:complexType/>. Data types, restrictions and facets can be defined for attributes in the same way as for elements that have only a text body.
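
A minimal sketch of an element with both sub-elements and attributes follows. The names firstName, lastName, id and status are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Person">
    <xs:complexType>
      <!-- The content model comes first -->
      <xs:sequence>
        <xs:element name="firstName" type="xs:string"/>
        <xs:element name="lastName" type="xs:string"/>
      </xs:sequence>
      <!-- Attribute declarations follow the content model -->
      <xs:attribute name="id" type="xs:positiveInteger" use="required"/>
      <!-- An attribute restricted with a facet, as for a text-only element -->
      <xs:attribute name="status">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="active"/>
            <xs:enumeration value="inactive"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>
```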

Simple and complex types can be defined outside the element tree. In this case, the <xs:element/> element has no body, keeps its name attribute and has a type attribute. The <xs:complexType/> element is then defined at the top level of the schema, outside the root element declaration, with a name attribute containing the type name. This does not change how XML files are validated.
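
A minimal sketch of separately defined types follows. The Person element and PersonType type are used as in the rest of this section; the sub-elements firstName and age and the AgeType type are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- The element has no body: it only names its type -->
  <xs:element name="Person" type="PersonType"/>

  <!-- Named complex type, defined at the top level of the schema -->
  <xs:complexType name="PersonType">
    <xs:sequence>
      <xs:element name="firstName" type="xs:string"/>
      <xs:element name="age" type="AgeType"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Named simple type, defined after the type that uses it -->
  <xs:simpleType name="AgeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```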

Complex and simple types can be defined in any order. A named type can be reused by several elements of the schema, so its description is not duplicated. Named types also keep the XSD file from becoming too deeply indented. Moreover, with type definitions an element has not only a name but also a type name, which can serve as a class name. Some tools that parse XML content according to an XML Schema require a type name for elements of complex type.

Elements and attributes can be reused using references. In this case, the <xs:element/> or <xs:attribute/> element has no body and no name attribute, and has a ref attribute instead. This ref attribute contains the name of an element or attribute declared at the top level of the schema. For example, a Person element declared at the top level can be referenced from inside another element.
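
A minimal sketch of references follows. The Person element and PersonType type are used as in the rest of this section; the Team element, the id attribute and the firstName sub-element are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Top-level declarations that can be referenced -->
  <xs:element name="Person" type="PersonType"/>
  <xs:attribute name="id" type="xs:positiveInteger"/>

  <xs:element name="Team">
    <xs:complexType>
      <xs:sequence>
        <!-- No name and no body: the declaration of Person is reused -->
        <xs:element ref="Person" maxOccurs="unbounded"/>
      </xs:sequence>
      <!-- The top-level id attribute is reused the same way -->
      <xs:attribute ref="id"/>
    </xs:complexType>
  </xs:element>

  <xs:complexType name="PersonType">
    <xs:sequence>
      <xs:element name="firstName" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```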

The difference between a separate type definition and a reference is that any element or attribute declared at the top level of the schema can be referenced. Moreover, a reference links to an element or attribute name rather than to a type name. This means that we are not linking classes but instances of classes.

Defined complex types can be reused by adding sub-elements or attributes to them. The complex type is then said to be extended. This is done using the <xs:complexContent/> and <xs:extension/> elements. The name of the type being extended is given in the base attribute of the <xs:extension/> element. For example, the PersonType complex type can be extended with a professional attribute for the Person element.
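
A minimal sketch of this extension follows. The sub-elements of PersonType and the xs:boolean type of the professional attribute are assumptions made for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- The type being extended -->
  <xs:complexType name="PersonType">
    <xs:sequence>
      <xs:element name="firstName" type="xs:string"/>
      <xs:element name="lastName" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Person has everything PersonType has, plus one attribute -->
  <xs:element name="Person">
    <xs:complexType>
      <xs:complexContent>
        <xs:extension base="PersonType">
          <xs:attribute name="professional" type="xs:boolean"/>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```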

Several elements that share some sub-elements or attributes and differ in others can be defined this way. The common items are defined in one complex type, and the differing items are defined in separate complex types that each extend the first one.
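
This pattern can be sketched as follows. All names other than PersonType (EmployeeType, StudentType, Employee, Student, name, company, school) are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Common items -->
  <xs:complexType name="PersonType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Common items plus "company" -->
  <xs:complexType name="EmployeeType">
    <xs:complexContent>
      <xs:extension base="PersonType">
        <xs:sequence>
          <xs:element name="company" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Common items plus "school" -->
  <xs:complexType name="StudentType">
    <xs:complexContent>
      <xs:extension base="PersonType">
        <xs:sequence>
          <xs:element name="school" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="Employee" type="EmployeeType"/>
  <xs:element name="Student" type="StudentType"/>
</xs:schema>
```

In an XML document, the sub-elements added by an extension appear after those of the base type, so an Employee element contains name followed by company.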