Following my post on XML parsing with JAXP (SAX and DOM APIs), I present here some simple examples of validating an XML stream with JAXP (Java API for XML Processing), a common interface for creating, parsing and manipulating XML documents using the standard SAX, DOM and XSLT APIs.

1. XML Validation
XML has become indispensable in Information Systems Architectures and J2EE. Used as a standard format for data exchange and standardized by the W3C, XML is present everywhere in applications and databases, and is at the heart of EAI exchanges.

For this reason, knowledge of XML parsing APIs such as DOM and SAX is often necessary when developing a J2EE application. Understanding the differences, strengths and weaknesses of these APIs is important to avoid the performance problems that these complex APIs can cause.

To process XML documents, an application needs an XML parser to tokenize the XML stream and retrieve its data. An XML parser is the program that sits between the application and the XML documents: it reads an XML stream, ensures that it is well-formed, and may validate the document against a DTD or an XML Schema definition (XSD).

JAXP (Java API for XML Processing) provides a common interface for creating, parsing and manipulating XML documents using the standard SAX, DOM and XSLT APIs.

Well-formed and valid document
An XML document is well-formed if its structure meets the XML specification, i.e. it is syntactically correct. An XML document is valid if it is well-formed AND its structure and data (elements and attributes) meet the constraints defined in a definition document such as a DTD or an XSD.
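The well-formedness half of this distinction can be checked with a plain, non-validating JAXP parse. The following is a minimal sketch: the class name WellFormedCheck and the sample documents are made up for illustration, and the parser is fed from a String rather than a file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of JAXP: checks syntax only, no DTD/XSD validation.
class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // well-formedness only
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler ignores warnings/errors and rethrows fatal errors.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a syntax error makes the parse fail
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<note><to>Bob</to></note>")); // true
        System.out.println(isWellFormed("<note><to>Bob</note>"));      // false: <to> is never closed
    }
}
```

A document that passes this check is not necessarily valid; validity additionally requires a DTD or an XSD to check against.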

In this article, we will study examples with the Document Type Definition (DTD) and XML Schema Definition (XSD).

2. Document Type Definition (DTD)
A Document Type Definition (DTD) describes the objects (such as elements, attributes and entities) of an XML document and the relationships between them. It specifies a set of constraints and establishes which trees are acceptable in an XML document.

A DTD can be declared inside an XML document (i.e., inline), or referenced as an external file.
An inline DTD is wrapped in a DOCTYPE declaration, and has the following syntax:

<!DOCTYPE root-element [
declarations
]>
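As an illustration, a document carrying an inline DTD can be validated with a DOM parser by turning validation on. This is a sketch under assumptions: the note/to/body vocabulary and the class name InlineDtdSketch are invented for the example, and validity errors are simply turned into a false return value.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class InlineDtdSketch {

    // Hypothetical inline DTD: a <note> must contain <to> followed by <body>.
    static final String DOCTYPE =
          "<!DOCTYPE note [\n"
        + "  <!ELEMENT note (to, body)>\n"
        + "  <!ELEMENT to (#PCDATA)>\n"
        + "  <!ELEMENT body (#PCDATA)>\n"
        + "]>\n";

    static boolean isValid(String xmlWithDoctype) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enables DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
                // Validity errors are recoverable by default; rethrow them to stop the parse.
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xmlWithDoctype)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(DOCTYPE + "<note><to>Bob</to><body>Hi</body></note>")); // true
        System.out.println(isValid(DOCTYPE + "<note><body>Hi</body></note>"));             // false: <to> is missing
    }
}
```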

A DTD can also be stored in an external file. An XML document can reference an external DTD via the following syntax:

<!DOCTYPE root-element SYSTEM "DTD-filename">

DTD Syntax
A DTD has its own syntax, different from XML syntax, which consists of declarations (for elements, attributes and so on) such as:

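Attribute default:
o #REQUIRED: the attribute must be provided in the document.
o #IMPLIED: the attribute is optional and no default is supplied; the application decides what to do when it is absent.
o #FIXED value: the attribute, if present, must have this value; the parser supplies it when absent.
o A literal default value: used when the attribute is absent.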
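The effect of these defaults can be observed from the DOM: the parser fills in #FIXED and literal default values, while an absent #IMPLIED attribute stays absent. A small sketch, assuming a made-up note element with four attributes (the class name AttributeDefaultSketch is also hypothetical):

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AttributeDefaultSketch {

    // Hypothetical document: only the #REQUIRED attribute "id" is written out.
    static final String XML =
          "<!DOCTYPE note [\n"
        + "  <!ELEMENT note (#PCDATA)>\n"
        + "  <!ATTLIST note\n"
        + "    id       CDATA #REQUIRED\n"
        + "    priority CDATA #IMPLIED\n"
        + "    version  CDATA #FIXED \"1.0\"\n"
        + "    lang     CDATA \"en\">\n"
        + "]>\n"
        + "<note id=\"n1\">Hello</note>";

    /** Returns the value of the named attribute on the root element ("" if absent). */
    static String rootAttribute(String xml, String name) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element root = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
            return root.getAttribute(name);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("id=" + rootAttribute(XML, "id"));             // written in the document
        System.out.println("version=" + rootAttribute(XML, "version"));   // supplied from #FIXED
        System.out.println("lang=" + rootAttribute(XML, "lang"));         // supplied from the literal default
        System.out.println("priority=" + rootAttribute(XML, "priority")); // #IMPLIED and absent: empty string
    }
}
```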

Entity declaration: An “entity” is a variable that defines replacement text or special characters. An entity reference of the form &entity-name; is replaced by the value of the variable wherever it is used. Entities can be declared inline or in an external file.
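For instance, an inline entity is expanded by the parser before the application sees the text. The sketch below assumes a hypothetical entity named company and a made-up letter element; external entities are not covered here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class EntitySketch {

    // Hypothetical document declaring one inline (internal) entity.
    static final String XML =
          "<!DOCTYPE letter [\n"
        + "  <!ELEMENT letter (#PCDATA)>\n"
        + "  <!ENTITY company \"Example Corp\">\n"
        + "]>\n"
        + "<letter>Sent by &company;</letter>";

    /** Returns the text content of the root element, with entity references expanded. */
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement()
                          .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootText(XML)); // Sent by Example Corp
    }
}
```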

Usage and Limitations of DTD
A DTD defines the structure of XML documents, which facilitates document exchange between services. However, DTDs have some limitations:

A DTD has its own syntax (inherited from SGML DTDs) rather than XML syntax, so it cannot be processed with ordinary XML tools and requires dedicated processing.

DTD does not support object-oriented concepts such as hierarchies and inheritance.

DTD data types are limited to text strings; other data types such as numbers and dates are not supported.

DTD does not support namespaces.

DTD occurrence indicators are limited to zero or one (?), exactly one, zero or more (*) and one or more (+); a specific count such as 8 cannot be expressed.

3. XML Schema Definition (XSD)
XML Schema, published by the W3C as a Recommendation in May 2001, is a description language that defines the structure and content types of an XML document. It overcomes the limitations of DTD and is meant to replace DTD for checking the validity of XML documents. In brief, an XML Schema:

is a well-formed XML document, which uses XML syntax;

is object-oriented, and supports concepts like inheritance;

supports namespaces;

supports more data types;

offers more element occurrence indicators.
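To illustrate the last point, an exact count such as 8 can be expressed with minOccurs and maxOccurs and checked through the javax.xml.validation API. This is a sketch: the team/player vocabulary and the class name OccurrenceSketch are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class OccurrenceSketch {

    // Hypothetical schema: a <team> holds exactly 8 <player> elements.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='team'><xs:complexType><xs:sequence>"
        + "<xs:element name='player' type='xs:string' minOccurs='8' maxOccurs='8'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Builds a <team> with the given number of players and validates it against XSD. */
    static boolean isValidTeam(int players) {
        StringBuilder xml = new StringBuilder("<team>");
        for (int i = 0; i < players; i++) {
            xml.append("<player>p").append(i).append("</player>");
        }
        xml.append("</team>");

        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                  .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            // Without an ErrorHandler, the validator throws on the first validity error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml.toString())));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidTeam(8)); // true
        System.out.println(isValidTeam(7)); // false
    }
}
```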

Note: XSD 1.1, the current version (as of September 2012), became a W3C Recommendation in April 2012.

The purpose of an XML Schema, like that of a DTD, is to define the legal building blocks of an XML document. Steps to follow in order to write an XSD document:

Order indicators:
o xs:all: sub-elements appear in any order;
o xs:choice: exactly one of the sub-elements may appear;
o xs:sequence: the sub-elements must appear in the specified order;
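The difference between xs:sequence and xs:all can be seen by validating the same document against two otherwise identical schemas. The sketch below uses a made-up person element; the class and constant names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class OrderIndicatorSketch {

    /** Builds a hypothetical <person> schema using the given order indicator (sequence or all). */
    static String personSchema(String indicator) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
             + "<xs:element name='person'><xs:complexType><xs:" + indicator + ">"
             + "<xs:element name='firstName' type='xs:string'/>"
             + "<xs:element name='lastName' type='xs:string'/>"
             + "</xs:" + indicator + "></xs:complexType></xs:element>"
             + "</xs:schema>";
    }

    static final String IN_ORDER =
        "<person><firstName>Ada</firstName><lastName>Lovelace</lastName></person>";
    static final String SWAPPED =
        "<person><lastName>Lovelace</lastName><firstName>Ada</firstName></person>";

    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                  .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(personSchema("sequence"), IN_ORDER)); // true
        System.out.println(isValid(personSchema("sequence"), SWAPPED));  // false: order matters
        System.out.println(isValid(personSchema("all"), SWAPPED));       // true: any order
    }
}
```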

Group indicators:
o xs:group: groups elements logically;
o xs:attributeGroup: groups attributes logically;

Extension

The xs:any element allows any element to be added after those explicitly defined:

<xs:any minOccurs="0"/>

The xs:anyAttribute element allows attributes not specified in the schema to be added;

The substitutionGroup attribute allows a schema to apply to XML documents in which equivalent tags do not carry the same name, for example:

<name /> <nom />

It is also possible to block the substitution.
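A possible reading of the name/nom example is sketched below: nom is declared as a substitute for name, so either tag is accepted where name is expected. The customer wrapper element and the class name are assumptions made for the illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SubstitutionGroupSketch {

    // Hypothetical schema: <nom> may stand in for <name> inside <customer>.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='nom' substitutionGroup='name'/>"
        + "<xs:element name='customer'><xs:complexType><xs:sequence>"
        + "<xs:element ref='name'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                  .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<customer><name>Smith</name></customer>"));  // true
        System.out.println(isValid("<customer><nom>Dupont</nom></customer>"));   // true: substituted
        System.out.println(isValid("<customer><nombre>X</nombre></customer>"));  // false: not in the group
    }
}
```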

Usage and Limitations of XSD
XSD is a description language that defines the structure and content types of an XML document. It overcomes the limitations of DTD and is meant to replace DTD for checking the validity of XML documents.

XSD allows the creation of standards (Internet languages such as XHTML, RSS, WSDL, etc.), helps enforce data integrity, and allows much more precise validation than DTD. However, XSD documents can be long to write for complex structures.

4. XSD vs DTD

The “old” DTD (Document Type Definition) defines a structure for an XML file; however, the XML Schema standard is intended to replace it for several reasons:

XML Schemas are XML documents; hence all the tools (validators, parsers, processors, …) as well as the scripts and languages (XSLT, XPath, …) that work on XML documents can be used on XSD documents.

XML Schemas allow much finer control of document structure: order (or absence of order) of sub-elements, number of occurrences of an element, very precise control of the data types of elements and attributes (regular expressions can be applied to the data, and types with high semantic value such as dates are available), …

It is very easy to use XSD documents together and to make them interact with one another.

Below are some examples where XSD is more useful and better suited than DTD:

Example n°1: We need to communicate a date via an XML stream between two different systems, such as SAP (mm.dd.yyyy) and an RMI server (dd/mm/yyyy). These systems have incompatible date formats; with XSD, however, we can use the xs:date type, whose format is yyyy-mm-dd.
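As a sketch of this first example, a schema that types an element as xs:date rejects both system-specific formats and accepts only the yyyy-mm-dd form. The element name shipDate and the class name are made up for the illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class DateTypeSketch {

    // Hypothetical schema: a single element whose content must be an xs:date.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='shipDate' type='xs:date'/>"
        + "</xs:schema>";

    static boolean isValidDate(String text) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                  .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        String xml = "<shipDate>" + text + "</shipDate>";
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidDate("2012-04-05")); // true
        System.out.println(isValidDate("04.05.2012")); // false
        System.out.println(isValidDate("05/04/2012")); // false
    }
}
```

Each system then only has to convert between its own format and the common xs:date representation.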

Example n°2: We need to validate an email address, but there is no standard type for the format of an email address. No panic: with XSD, we can define a new type “EmailAddress”. We could define our own format, but there are collections of universally useful data types defined in the W3C XML Schema language, such as the XML Schema Standard Type Library (XSSTL) at http://www.codesynthesis.com/projects/xsstl/.
The following type allows the validation of an email address:
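One way such a type might look is sketched here, embedded in a small Java check. The pattern is an assumption chosen for brevity (something, an @, something, a dot, something, with no whitespace); it is deliberately loose and is not the definition used by XSSTL.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class EmailTypeSketch {

    // Hypothetical EmailAddress type: a simplified pattern, not the XSSTL definition.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:simpleType name='EmailAddress'>"
        + "<xs:restriction base='xs:string'>"
        + "<xs:pattern value='[^@\\s]+@[^@\\s]+\\.[^@\\s]+'/>"
        + "</xs:restriction>"
        + "</xs:simpleType>"
        + "<xs:element name='email' type='EmailAddress'/>"
        + "</xs:schema>";

    static boolean isValidEmail(String text) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                  .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        String xml = "<email>" + text + "</email>";
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidEmail("alice@example.org")); // true
        System.out.println(isValidEmail("alice.example.org")); // false: no @
        System.out.println(isValidEmail("alice@example"));     // false: no dot after the @
    }
}
```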

Many Java XML APIs provide mechanisms to validate XML documents. JAXP can be used with most of these APIs, but configuration differences exist. This article shows some ways to configure different Java SAX and DOM APIs using JAXP for checking and validating XML with DTD and XSD.

Error Handler
To report errors, it is necessary to provide an ErrorHandler to the underlying implementation.
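A minimal sketch of such a handler, plugged into a validating SAX parser, could look as follows. The class name CollectingErrorHandler and the sample DTD are made up for the illustration; the handler collects warnings and errors and lets the parse stop on fatal errors.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical handler: records every problem instead of printing or ignoring it.
class CollectingErrorHandler implements ErrorHandler {

    final List<String> problems = new ArrayList<>();

    private void record(String level, SAXParseException e) {
        problems.add(level + " at line " + e.getLineNumber() + ": " + e.getMessage());
    }

    @Override public void warning(SAXParseException e) { record("warning", e); }

    // Validity errors are recoverable: record them and let the parse continue.
    @Override public void error(SAXParseException e) { record("error", e); }

    // Well-formedness errors are not recoverable: record, then stop the parse.
    @Override public void fatalError(SAXParseException e) throws SAXException {
        record("fatal", e);
        throw e;
    }

    /** Parses a document carrying its own DTD and returns the problems found. */
    static List<String> validate(String xmlWithDoctype) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // DTD validation
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xmlWithDoctype)));
        } catch (SAXParseException e) {
            // already recorded by fatalError
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.problems;
    }

    public static void main(String[] args) {
        String doctype = "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>\n";
        validate(doctype + "<note><from>Bob</from></note>").forEach(System.out::println);
    }
}
```

Collecting the messages rather than throwing on the first one lets the caller report all validity problems of a document in a single pass.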