Document Validation in XML.NET

Is your XML valid? The .NET Framework's XmlValidatingReader class gives you the means to verify the validity of your XML documents and fragments. Wintellect's Dino Esposito explains the details.

The XmlValidatingReader class is an implementation of the XmlReader class that provides support for several types of XML validation: Document Type Definitions (DTD), XML-Data Reduced (XDR) schemas, and XML Schemas (XSD). You can use the XmlValidatingReader class to validate entire XML documents as well as XML fragments.

The XmlValidatingReader Class

The class works on top of an XML reader, typically an instance of the XmlTextReader class. The text reader walks through the nodes of the document, whereas the validating reader examines each node and validates it according to the requested validation type. Although the XmlValidatingReader class inherits from the base class XmlReader, it implements only a small part of the required functionality itself. Because the class works on top of an existing XML reader, many of its methods and properties simply mirror those of the underlying reader.

The dependency of validating readers on an existing text reader is particularly evident if you look at the class constructors. An XML validating reader, in fact, cannot be directly initialized from a file. The following is the most commonly used constructor.

public XmlValidatingReader(XmlReader);
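
As a minimal sketch of this two-step construction, the snippet below wraps an XmlTextReader in a validating reader. The in-memory document and the class name are illustrative assumptions; in practice the text reader would more likely be opened on a file name or URL. Later versions of the framework mark XmlValidatingReader as obsolete, hence the pragma.

```csharp
using System;
using System.IO;
using System.Xml;

#pragma warning disable 618 // XmlValidatingReader is flagged obsolete on later framework versions

class CreateValidatingReader
{
    static void Main()
    {
        // Hypothetical sample document; a file path or URL can be passed
        // to the XmlTextReader constructor instead.
        string xml = "<order><id>42</id></order>";

        // Step 1: the text reader that actually parses the markup.
        XmlTextReader textReader = new XmlTextReader(new StringReader(xml));

        // Step 2: the validating reader layered on top of the text reader.
        XmlValidatingReader reader = new XmlValidatingReader(textReader);

        // Choose the validation type before the first call to Read.
        reader.ValidationType = ValidationType.Schema;

        Console.WriteLine("Validation type: {0}", reader.ValidationType);

        // Closing the validating reader also closes the underlying reader.
        reader.Close();
    }
}
```

Nothing is validated yet at this point; validation only happens once the application starts calling Read.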

The programming interface of the XmlValidatingReader class does not provide a single method that validates the whole content of a document. The validating reader works incrementally, node by node, as the underlying reader proceeds. Each validation error found along the way raises the ValidationEventHandler event on the reader. Hence, to track messages and detect errors, the application must register its own event handler. The handler must match the ValidationEventHandler delegate: it returns void and takes the sender object plus a ValidationEventArgs argument.

The ValidationEventArgs argument exposes three properties. The Message property returns the description of the error. The Exception property returns an XmlSchemaException object with details about what happened, including the line that originated the error, the source file and, if available, the schema object that generated the error. The Severity property indicates the severity of the event.

The XmlSeverityType enumeration defines two levels of severity: Error and Warning. Error indicates that a serious validation error occurred when processing the document against a DTD, an XDR schema, or an XSD schema. If the current instance of the XmlValidatingReader class has no validation event handler set, an error causes an exception to be thrown. A warning is typically raised when there is no DTD, XDR, or XSD schema to validate a particular element or attribute against. Unlike errors, warnings do not throw an exception if no validation event handler has been set.

To validate an entire XML document, you just loop through its content by calling Read until it returns false.
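
The loop and the event handler can be sketched together as follows. This is only an illustration: the sample document with its internal DTD, the class name, and the handler name are assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

#pragma warning disable 618 // XmlValidatingReader is flagged obsolete on later framework versions

class ValidateWholeDocument
{
    static int errorCount;

    // Matches the ValidationEventHandler delegate (sender plus ValidationEventArgs).
    static void OnValidationEvent(object sender, ValidationEventArgs e)
    {
        if (e.Severity == XmlSeverityType.Error)
            errorCount++;

        // The exception object carries the position of the offending node.
        int line = (e.Exception != null) ? e.Exception.LineNumber : 0;
        Console.WriteLine("{0} (line {1}): {2}", e.Severity, line, e.Message);
    }

    static void Main()
    {
        // Hypothetical document: the DTD allows only <to> inside <note>,
        // but the content uses an undeclared <from> element.
        string xml =
            "<!DOCTYPE note [\n" +
            "  <!ELEMENT note (to)>\n" +
            "  <!ELEMENT to (#PCDATA)>\n" +
            "]>\n" +
            "<note><from>Dino</from></note>";

        XmlTextReader textReader = new XmlTextReader(new StringReader(xml));
        XmlValidatingReader reader = new XmlValidatingReader(textReader);
        reader.ValidationType = ValidationType.DTD;
        reader.ValidationEventHandler += new ValidationEventHandler(OnValidationEvent);

        // Validation happens as a side effect of reading each node.
        while (reader.Read()) { }
        reader.Close();

        Console.WriteLine(errorCount == 0 ? "Document is valid." : "Document is not valid.");
    }
}
```

Because the handler is registered, validation errors are reported through the event rather than thrown, so the loop runs to the end of the document and collects every error it meets.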

Notice that the reader's internal mechanisms for checking the document's well-formedness and its schema compliance are distinct. So if a validating reader happens to work on a malformed XML document, no validation event is fired; an XmlException is thrown instead.
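
A short sketch can make the distinction concrete. The malformed sample document and all names below are assumptions for illustration; the point is only that the failure surfaces in the catch block rather than in the validation handler.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

#pragma warning disable 618 // XmlValidatingReader is flagged obsolete on later framework versions

class WellFormednessVersusValidity
{
    static int validationEvents;

    static void OnValidationEvent(object sender, ValidationEventArgs e)
    {
        validationEvents++;
    }

    static void Main()
    {
        // Hypothetical malformed document: <id> is never closed.
        string xml = "<order><id>42</order>";

        XmlValidatingReader reader =
            new XmlValidatingReader(new XmlTextReader(new StringReader(xml)));
        reader.ValidationEventHandler += new ValidationEventHandler(OnValidationEvent);

        try
        {
            while (reader.Read()) { }
            Console.WriteLine("Document is well-formed.");
        }
        catch (XmlException ex)
        {
            // Parsing problems are reported by the underlying parser as exceptions.
            Console.WriteLine("Not well-formed (line {0}, position {1}): {2}",
                ex.LineNumber, ex.LinePosition, ex.Message);
        }
        finally
        {
            reader.Close();
        }

        Console.WriteLine("Validation events raised: {0}", validationEvents);
    }
}
```

In practice this means validation code needs both an event handler for schema problems and a try/catch block for parsing problems.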

How Validation Takes Place

The validation takes place as the user moves the pointer forward using the method Read. Once the node has been parsed and read, it gets passed on to the internal validator object for further processing. The validator operates based on the type of the node and the type of validation required. It makes sure that the node has all the attributes and the children it is expected to have.

Internally, the validator object relies on two kinds of helper objects: the DTD parser and the schema builder. The former processes the content of the current node and its subtree against the DTD. The latter builds up a schema object model (SOM) for the current node based on the XDR or XSD schema source code. The schema builder class is actually the base class for more specialized XDR and XSD schema builders. What matters, though, is that XDR and XSD schemas are treated in much the same way and with no difference in performance.

If a node has children, another temporary reader is used to read its XML subtree so that the schema information for the node can be fully investigated.

On the validating reader class, the Schemas property represents a collection (an instance of the XmlSchemaCollection class) in which you can store one or more schemas that you plan to use later for validation. Using the schema collection improves overall performance because the schemas are held in memory and don't need to be reloaded every time validation occurs. You can add as many XSD and XDR schemas as you want, but bear in mind that the collection must be complete before the first call to Read is made.
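
The following sketch shows one way to take advantage of the schema collection: a schema is loaded once into a standalone XmlSchemaCollection and then copied into the Schemas property of each reader before reading starts. The namespace, the schema, the sample documents, and the helper names are all assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

#pragma warning disable 618 // these types are flagged obsolete on later framework versions

class SchemaCacheDemo
{
    // Hypothetical namespace and schema, used only for illustration.
    const string OrdersNs = "urn:example:orders";
    const string OrdersXsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'" +
        " targetNamespace='urn:example:orders' elementFormDefault='qualified'>" +
        "<xs:element name='order'><xs:complexType><xs:sequence>" +
        "<xs:element name='id' type='xs:int'/>" +
        "</xs:sequence></xs:complexType></xs:element>" +
        "</xs:schema>";

    static int errorCount;

    static void OnValidationEvent(object sender, ValidationEventArgs e)
    {
        if (e.Severity == XmlSeverityType.Error)
            errorCount++;
    }

    // Validates one document against the schemas held in the shared cache.
    static bool IsValid(string xml, XmlSchemaCollection cache)
    {
        errorCount = 0;
        XmlValidatingReader reader =
            new XmlValidatingReader(new XmlTextReader(new StringReader(xml)));
        reader.ValidationType = ValidationType.Schema;
        reader.Schemas.Add(cache); // the collection must be filled before the first Read
        reader.ValidationEventHandler += new ValidationEventHandler(OnValidationEvent);

        while (reader.Read()) { }
        reader.Close();
        return errorCount == 0;
    }

    static void Main()
    {
        // Load the schema once and keep it in memory.
        XmlSchemaCollection cache = new XmlSchemaCollection();
        cache.Add(OrdersNs, new XmlTextReader(new StringReader(OrdersXsd)));

        string first = "<order xmlns='urn:example:orders'><id>42</id></order>";
        string second = "<order xmlns='urn:example:orders'><id>forty-two</id></order>";

        Console.WriteLine("First document valid:  {0}", IsValid(first, cache));
        Console.WriteLine("Second document valid: {0}", IsValid(second, cache));
    }
}
```

The schema text is parsed only once, no matter how many documents are validated against it afterwards.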

Validating Inline Schema

An interesting phenomenon takes place when the XML schema is embedded in the same XML document being validated. In this case, the schema appears as a constituent part of the source document. In particular, it is a direct child of the document root element.

The schema is an XML subtree that is logically placed at the same level as the document to validate. A well-formed XML document, though, cannot have two roots. Thus an all-encompassing root node must be created with two children: the schema and the document.

As a result, the root element cannot be successfully validated because there is no schema information about it. When the ValidationType property is set to Schema, the XmlValidatingReader class raises a warning for the root element if an inline schema is detected. Be aware of this when you set up your validation code. A filter that treats every validation event as an error could reject a perfectly legal XML document simply because its XSD code is embedded.
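
One way to keep the filter from being too strict is to tally warnings and errors separately and reject the document only on errors. The sketch below assumes a hypothetical document with an inline schema; the exact events raised for it may vary with the framework version, so no particular output is claimed.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

#pragma warning disable 618 // XmlValidatingReader is flagged obsolete on later framework versions

class InlineSchemaFilter
{
    static int errors;
    static int warnings;

    // Separates warnings from errors instead of treating every event as fatal.
    static void OnValidationEvent(object sender, ValidationEventArgs e)
    {
        if (e.Severity == XmlSeverityType.Warning)
            warnings++;
        else
            errors++;

        Console.WriteLine("{0}: {1}", e.Severity, e.Message);
    }

    static void Main()
    {
        // Hypothetical document: an all-encompassing root with two children,
        // the inline schema and the data it describes.
        string xml =
            "<root>" +
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
            "<xs:element name='item' type='xs:string'/>" +
            "</xs:schema>" +
            "<item>hello</item>" +
            "</root>";

        XmlValidatingReader reader =
            new XmlValidatingReader(new XmlTextReader(new StringReader(xml)));
        reader.ValidationType = ValidationType.Schema;
        reader.ValidationEventHandler += new ValidationEventHandler(OnValidationEvent);

        while (reader.Read()) { }
        reader.Close();

        Console.WriteLine("Errors: {0}, warnings: {1}", errors, warnings);
        Console.WriteLine(errors == 0 ? "Accepted." : "Rejected.");
    }
}
```

With this handler, a warning about the unvalidated root element is logged but does not cause the document to be rejected.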

The warning you get from XmlValidatingReader is only the tip of the iceberg. Although XML Schema as a format is definitely a widely accepted specification, the same cannot be said for inline schema. An illustrious victim of this situation is the XML code you obtain from the WriteXml method of the DataSet object when the XmlWriteMode.WriteSchema option is set. The file you get has the XML schema inline but if you try to validate it using XmlValidatingReader it does not work!

In general, the guideline is to avoid inline XML schemas whenever possible. This saves bandwidth (the schema is transferred at most once) and shields you from unpleasant surprises. As for the DataSet, if you move the schema to a separate file and reference it from within the serialized DataSet output, everything works just fine. Alternatively, with the XmlValidatingReader object you can preload the schema into the schema cache and then proceed with parsing the source.
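
As a rough sketch of the second option, the code below serializes a DataSet's schema and data separately, preloads the schema into the reader's schema collection, and then validates the data. The DataSet contents and all names are hypothetical, and in-memory strings stand in for what would normally be separate files.

```csharp
using System;
using System.Data;
using System.IO;
using System.Xml;
using System.Xml.Schema;

#pragma warning disable 618 // XmlValidatingReader is flagged obsolete on later framework versions

class DataSetSeparateSchema
{
    static int errorCount;

    static void OnValidationEvent(object sender, ValidationEventArgs e)
    {
        if (e.Severity == XmlSeverityType.Error)
            errorCount++;
        Console.WriteLine("{0}: {1}", e.Severity, e.Message);
    }

    static void Main()
    {
        // Hypothetical DataSet with one table and one row.
        DataSet data = new DataSet("Orders");
        DataTable table = data.Tables.Add("Order");
        table.Columns.Add("Id", typeof(int));
        table.Rows.Add(new object[] { 42 });

        // Write the schema and the data separately instead of inline.
        StringWriter schemaText = new StringWriter();
        data.WriteXmlSchema(schemaText);
        StringWriter dataText = new StringWriter();
        data.WriteXml(dataText, XmlWriteMode.IgnoreSchema);

        // Read the schema into an XmlSchema object so it can be preloaded.
        XmlSchema schema = XmlSchema.Read(
            new XmlTextReader(new StringReader(schemaText.ToString())),
            new ValidationEventHandler(OnValidationEvent));

        XmlValidatingReader reader = new XmlValidatingReader(
            new XmlTextReader(new StringReader(dataText.ToString())));
        reader.ValidationType = ValidationType.Schema;
        reader.Schemas.Add(schema); // preload before the first call to Read
        reader.ValidationEventHandler += new ValidationEventHandler(OnValidationEvent);

        while (reader.Read()) { }
        reader.Close();

        Console.WriteLine("Validation errors: {0}", errorCount);
    }
}
```

Keeping the schema out of the data stream also means the same schema object can be reused for every DataSet document of that shape.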

About the Author...

Dino Esposito is Wintellect's ADO.NET expert and a trainer and consultant based in Rome, Italy. Dino is a full-time author, a full-time consultant and a full-time trainer. Prior to this, Dino was a full-time employee and worked day and night for Andersen Consulting focusing on the very first real-world implementations of DNA systems. Dino also has extensive experience developing commercial Windows-based software, especially for the photography world, and was part of the team who designed and realized one of the first European image online databanks. You can also check out Dino's newest book, Building Web Solutions with ASP.NET and ADO.NET.