Summary

This article introduces a new C#-XML data binding technique based entirely on VTD-XML and XPath. Unlike traditional C# XML data binding tools, it does not require a schema, takes advantage of XML's inherently loose encoding, and is more flexible and efficient.

Limitations of Schema-based XML Data Binding

XML data binding APIs are a class of XML processing tools that automatically map XML data into custom, strongly typed objects or data structures, relieving XML developers of the drudgery of DOM or SAX parsing. For traditional, static XML data binding tools (e.g., various binding frameworks in .NET, Liquid Technology, Castor.NET) to work, the XML schema of the document (or its equivalent) must be available. As a first step, most XML data binders compile XML schemas into a set of class files, which the calling applications then include to perform the corresponding "unmarshalling."

However, developers dealing with XML documents don't always have their schemas on hand. And even when the XML schemas are available, slight changes to them (often due to evolving business requirements) require the class files to be generated anew. Also, XML data binding is most effective when processing shallow, regularly shaped XML data. When the underlying structures of XML documents are complex, the typed hierarchical trees must still be navigated manually, which can require significant coding.

Most limitations of XML data binding can be attributed to its rigid dependency on XML schema. Unlike many binary data formats, XML is intended primarily as a schema-less data format flexible enough to represent virtually any kind of information. For advanced uses, XML also is loosely encoded: applications may use a portion of the XML document that they need, and skip the parts that they don't care about. Because of XML's loose encoding, Web Services and SOA applications are far less likely to break in the face of changes.

Interestingly, the schema-less nature of XML has subtle performance implications in XML data binding. In many cases, only a small data subset in an XML document (as opposed to the whole data set) is needed to drive the application logic. Yet, the traditional approach indiscriminately converts the entire data set into objects, producing unnecessary memory and processing overhead.

Binding XML with VTD-XML and XPath

Motivation

While the concept of XML data binding has essentially remained unchanged since the early days of XML, the landscape of XML processing has evolved considerably. It is worth noting that the primary purpose of XML data binding APIs is to map XML to objects, and the presence of XML schemas merely helps lighten the coding effort of XML processing. In other words, if mapping XML to objects is made sufficiently simple, you not only don't need schemas, but you have strong incentive to avoid schemas because of all the issues they introduce.

As you probably have guessed by looking at the title of this section, the combination of VTD-XML and XPath is ideally suited for schema-less data binding.

Why XPath and VTD-XML?

There are three main reasons why XPath lends itself to our new approach. First, when properly written, your data binding code only needs approximate knowledge (e.g., topology, tag names, etc.) of the XML tree structure, which you can determine by looking at the XML data. XML schemas are no longer mandatory. Second, XPath allows your application to bind the relevant data items and filter out everything else, avoiding wasteful object creation. Finally, XPath-based code is easy to understand, simple to write and debug, and generally quite maintainable.

But XPath still needs the parsed tree of XML to work. Superior to both DOM and SAX, VTD-XML offers a long list of features and benefits relevant to data binding, some of which are highlighted below.

High performance, low memory usage, and ease of use: When choosing an XML processing API, you need to consider those three factors (performance, memory consumption, and usability) as a whole. The SAX parser claims constant memory usage regardless of document size. Yet, because it does not expose the hierarchical structure of XML, it is difficult to use and doesn't even support XPath. The DOM parser builds an in-memory tree, is easier to use, and supports XPath. But it is also very slow and incurs exorbitant memory usage. As a next-generation XML processing API based on a "non-extractive" tokenization technique, VTD-XML pushes the XML processing envelope to a level previously thought impossible to achieve. Like DOM, VTD-XML builds an in-memory tree and is capable of random access. But it consumes only 1/5 the memory of DOM. Performance-wise, VTD-XML not only outperforms DOM by 5x to 12x, but also is typically twice as fast as SAX with a null content handler (SAX's maximum performance).

Non-blocking XPath implementation: VTD-XML also pioneers incremental, non-blocking XPath evaluation. Unlike traditional XPath engines that return the entire evaluated node set all at once, VTD-XML's AutoPilot-based evaluation returns a qualified node as soon as it is evaluated, resulting in unsurpassed performance and flexibility.

Native XML indexing: VTD-XML also is a native XML indexer that allows your applications to run XPath queries without parsing.

Incremental update: VTD-XML is the only XML processing API that allows you to update XML content without touching irrelevant parts of the XML document (see "Improve XPath Efficiency with VTD-XML"), improving performance and efficiency from a different angle.
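To make the non-blocking evaluation concrete, here is a minimal sketch that stops at the first matching node instead of collecting the whole node set. It assumes the VTD-XML C# assembly (namespace com.ximpleware) is referenced; the class name FirstMatchDemo and the inline document are made up for illustration.

```csharp
using System;
using System.Text;
using com.ximpleware;   // assumption: the VTD-XML C# assembly is referenced

public static class FirstMatchDemo
{
    // Returns the text of the first TITLE element, or null if there is none.
    public static string FirstTitle(string xml)
    {
        VTDGen vg = new VTDGen();
        vg.setDoc(Encoding.UTF8.GetBytes(xml));
        vg.parse(false);                       // false: namespace awareness off
        VTDNav vn = vg.getNav();

        AutoPilot ap = new AutoPilot(vn);
        ap.selectXPath("/CATALOG/CD/TITLE");

        // Each evalXPath() call moves to the next matching node and returns
        // its VTD index, or -1 when no more nodes match. Nothing forces us
        // to keep going after the first hit.
        if (ap.evalXPath() != -1)
        {
            int t = vn.getText();              // index of the text node, -1 if absent
            return t != -1 ? vn.toString(t) : null;
        }
        return null;
    }

    public static void Main()
    {
        // Hypothetical two-record document, inlined to keep the sketch self-contained
        string xml = "<CATALOG><CD><TITLE>First</TITLE></CD>"
                   + "<CD><TITLE>Second</TITLE></CD></CATALOG>";
        Console.WriteLine(FirstTitle(xml));
    }
}
```

Because the loop is driven by the caller, the same pattern works for "first N matches" or "stop when a condition is met" without evaluating the rest of the document.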

Process

The process for our new schema-less XML data binding roughly consists of the following steps.

1. Observe the XML document and write down the XPath expressions corresponding to the data fields of interest.

2. Define the class file and member variables to which those data fields are mapped.

3. Refactor the XPath expressions in step (1) to reduce navigation cost.

4. Write the XPath-based data binding routine that does the object mapping. An XPath 1.0 expression evaluates to one of four data types: string, boolean, number (a double), or node set. The string type can be further converted to additional data types.

5. If the XML processing requires both read and write, use VTD-XML's XMLModifier to update the content of XML. Notice that you may need to record more information to take advantage of VTD-XML's incremental update capability.
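The four XPath 1.0 result types mentioned in the binding step can be sketched as follows. This is an illustration under assumptions, not the article's original listing: it assumes the VTD-XML C# assembly is referenced, and the class name XPathTypesDemo and the one-record document are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text;
using com.ximpleware;   // assumption: the VTD-XML C# assembly is referenced

public static class XPathTypesDemo
{
    // Evaluates one expression per XPath 1.0 result type and joins the results.
    public static string Describe(string xml)
    {
        VTDGen vg = new VTDGen();
        vg.setDoc(Encoding.UTF8.GetBytes(xml));
        vg.parse(false);                                  // namespace awareness off
        AutoPilot ap = new AutoPilot(vg.getNav());

        ap.selectXPath("/CATALOG/CD/TITLE");
        string title = ap.evalXPathToString();            // string

        ap.selectXPath("/CATALOG/CD/PRICE");
        double price = ap.evalXPathToNumber();            // number (double)

        ap.selectXPath("/CATALOG/CD/YEAR > 1982");
        bool after1982 = ap.evalXPathToBoolean();         // boolean

        // A string result can be converted further, here to an int
        ap.selectXPath("/CATALOG/CD/YEAR");
        int year = int.Parse(ap.evalXPathToString(), CultureInfo.InvariantCulture);

        ap.selectXPath("/CATALOG/CD");                    // node set
        int count = 0;
        while (ap.evalXPath() != -1) count++;             // one node per call

        return string.Join("|", title,
            price.ToString(CultureInfo.InvariantCulture), after1982, year, count);
    }

    public static void Main()
    {
        // Hypothetical sample record
        string xml = "<CATALOG><CD><TITLE>Sample</TITLE>"
                   + "<PRICE>10.50</PRICE><YEAR>1985</YEAR></CD></CATALOG>";
        Console.WriteLine(Describe(xml));
    }
}
```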

A Sample Project

Let me show you how to put this new XML binding in action. This project, written in C#, follows the steps outlined above to create simple data binding routines. The first part of this project creates read-only objects that are not modified by application logic. The second part extracts more information that allows the XML document to be updated incrementally. The last part adds VTD+XML indexing to the mix. The XML document used in this example, "catalog.xml", is a CD catalog: a CATALOG root element containing a list of CD records.
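A minimal document with the shape that the XPath expressions in this article assume could look like the following sketch. The element names (CATALOG, CD, TITLE, ARTIST, PRICE, YEAR) come from those expressions; the record values are placeholders invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sample data; only the element names are taken from the article -->
<CATALOG>
  <CD>
    <TITLE>Placeholder Title A</TITLE>
    <ARTIST>Placeholder Artist A</ARTIST>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Placeholder Title B</TITLE>
    <ARTIST>Placeholder Artist B</ARTIST>
    <PRICE>9.90</PRICE>
    <YEAR>1992</YEAR>
  </CD>
</CATALOG>
```

With this data, only the first record falls inside the 1982 to 1990 window used in the sections that follow.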

Read Only

Assume the application logic is driven by CD record objects released between 1982 and 1990 (exclusive), corresponding to the XPath expression "/CATALOG/CD[ YEAR < 1990 and YEAR>1982]". The class definition contains four fields, corresponding to the title, artist, price, and year of a CD.

The mapping between the object member and its corresponding XPath expression is as follows:

The "title" field corresponds to "/CATALOG/CD[ YEAR < 1990 and YEAR>1982]/TITLE"

The "artist" field corresponds to "/CATALOG/CD[ YEAR < 1990 and YEAR>1982]/ARTIST"

The "price" field corresponds to "/CATALOG/CD[ YEAR < 1990 and YEAR>1982]/PRICE"

The "year" field corresponds to "/CATALOG/CD[ YEAR < 1990 and YEAR>1982]/YEAR"

The XPath expressions can be further refactored (for efficiency reasons) as follows:

Use "/CATALOG/CD[ YEAR < 1990 and YEAR>1982]" to navigate to the "CD" node.

Use "TITLE" to extract the "title" field (a string).

Use "ARTIST" to extract the "artist" field (a string).

Use "PRICE" to extract the "price" field (a double).

Use "YEAR" to extract the "year" field (an integer).

The code has two methods. The "bind()" method accepts the XML file name as input, performs the data binding by plugging in the above XPath expressions, and returns an array list containing object references. The "main()" method invokes the binding routine and prints out the contents of the objects.
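A sketch of what such a class and binding routine might look like is given below. It is not the original listing: it assumes the VTD-XML C# assembly (namespace com.ximpleware) is referenced and that "catalog.xml" sits in the working directory, and the names CDRecord and XmlBinder are chosen for illustration.

```csharp
using System;
using System.Collections;
using com.ximpleware;   // assumption: the VTD-XML C# assembly is referenced

// Plain data holder; one field per bound XPath expression
public class CDRecord
{
    public string title;
    public string artist;
    public double price;
    public int year;
}

public static class XmlBinder
{
    public static ArrayList bind(string fileName)
    {
        ArrayList records = new ArrayList();
        VTDGen vg = new VTDGen();
        if (!vg.parseFile(fileName, false))    // false: namespace awareness off
            return records;                    // parsing failed: nothing to bind

        VTDNav vn = vg.getNav();
        // One AutoPilot navigates to each CD; the others evaluate
        // relative expressions from wherever the cursor currently is.
        AutoPilot apCd = new AutoPilot(vn);
        AutoPilot apTitle = new AutoPilot(vn);
        AutoPilot apArtist = new AutoPilot(vn);
        AutoPilot apPrice = new AutoPilot(vn);
        AutoPilot apYear = new AutoPilot(vn);
        apCd.selectXPath("/CATALOG/CD[YEAR < 1990 and YEAR > 1982]");
        apTitle.selectXPath("TITLE");
        apArtist.selectXPath("ARTIST");
        apPrice.selectXPath("PRICE");
        apYear.selectXPath("YEAR");

        while (apCd.evalXPath() != -1)         // cursor now sits on a matching CD
        {
            CDRecord cd = new CDRecord();
            cd.title = apTitle.evalXPathToString();
            cd.artist = apArtist.evalXPathToString();
            cd.price = apPrice.evalXPathToNumber();
            cd.year = (int)apYear.evalXPathToNumber();
            records.Add(cd);
        }
        return records;
    }

    public static void Main(string[] args)
    {
        foreach (CDRecord cd in bind("catalog.xml"))
            Console.WriteLine("{0} | {1} | {2} | {3}",
                cd.title, cd.artist, cd.price, cd.year);
    }
}
```

Only CD elements that satisfy the predicate are ever turned into objects; everything else in the document is skipped without allocation.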

Read and Write

This part of the project deals with the same XML document, but the application logic lowers the "price" of each CD by 1. To that end, two changes are necessary. First, the CD record class now has a new field (named priceIndex) containing the VTD index for the text node of "PRICE". Second, the binding routine records that index while binding, so that the new price can be written back through XMLModifier.
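A sketch of the read-and-write variant follows, under the same assumptions as before (VTD-XML C# assembly referenced). The names XmlUpdater and LowerPrices, the output file name, and the two-decimal price format are illustrative choices rather than the article's; the sketch keeps the same year filter as the read-only part and binds only the price to stay short.

```csharp
using System;
using System.Collections;
using System.Globalization;
using com.ximpleware;   // assumption: the VTD-XML C# assembly is referenced

public class CDRecord
{
    public double price;
    public int priceIndex = -1;   // VTD index of PRICE's text node, -1 if none
}

public static class XmlUpdater
{
    public static void LowerPrices(string inFile, string outFile)
    {
        VTDGen vg = new VTDGen();
        if (!vg.parseFile(inFile, false))
            throw new InvalidOperationException("cannot parse " + inFile);
        VTDNav vn = vg.getNav();

        AutoPilot apCd = new AutoPilot(vn);
        AutoPilot apPrice = new AutoPilot(vn);
        apCd.selectXPath("/CATALOG/CD[YEAR < 1990 and YEAR > 1982]");
        apPrice.selectXPath("PRICE");

        ArrayList records = new ArrayList();
        while (apCd.evalXPath() != -1)
        {
            CDRecord cd = new CDRecord();
            vn.push();                          // remember the CD position
            if (apPrice.evalXPath() != -1)      // cursor moves onto PRICE
            {
                cd.priceIndex = vn.getText();
                if (cd.priceIndex != -1)
                    cd.price = vn.parseDouble(cd.priceIndex);
            }
            apPrice.resetXPath();               // make apPrice reusable for the next CD
            vn.pop();                           // back to the CD element
            records.Add(cd);
        }

        // Only the recorded text tokens are rewritten; the rest of the
        // document is copied to the output as-is.
        XMLModifier xm = new XMLModifier(vn);
        foreach (CDRecord cd in records)
        {
            if (cd.priceIndex == -1) continue;
            string newPrice = (cd.price - 1).ToString("F2", CultureInfo.InvariantCulture);
            xm.updateToken(cd.priceIndex, newPrice);
        }
        xm.output(outFile);
    }

    public static void Main(string[] args)
    {
        LowerPrices("catalog.xml", "catalog_updated.xml");   // hypothetical file names
    }
}
```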

Read, Write, and Indexing

With the native XML indexing feature introduced in VTD-XML 2.0, you don't even have to parse the XML. If "catalog.xml" is pre-indexed, just load the index, and the binding routine can go to work immediately. The main() method can be quickly rewritten to bypass parsing entirely.
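The indexing workflow can be sketched in two steps: write the index once, then load it on later runs instead of parsing. As before, this assumes the VTD-XML C# assembly is referenced; the class name IndexDemo, the method names, and the "catalog.vxl" file name are illustrative.

```csharp
using System;
using com.ximpleware;   // assumption: the VTD-XML C# assembly is referenced

public static class IndexDemo
{
    // One-time step: parse the XML and persist it as a VTD+XML index file.
    public static void BuildIndex(string xmlFile, string indexFile)
    {
        VTDGen vg = new VTDGen();
        if (!vg.parseFile(xmlFile, false))
            throw new InvalidOperationException("cannot parse " + xmlFile);
        vg.writeIndex(indexFile);
    }

    // Later runs: load the index and query right away, with no parsing step.
    public static int CountMatchingCds(string indexFile)
    {
        VTDGen vg = new VTDGen();
        VTDNav vn = vg.loadIndex(indexFile);

        AutoPilot ap = new AutoPilot(vn);
        ap.selectXPath("/CATALOG/CD[YEAR < 1990 and YEAR > 1982]");
        int count = 0;
        while (ap.evalXPath() != -1) count++;
        return count;
    }

    public static void Main(string[] args)
    {
        BuildIndex("catalog.xml", "catalog.vxl");          // hypothetical file names
        Console.WriteLine(CountMatchingCds("catalog.vxl"));
    }
}
```

The VTDNav returned by loadIndex() can be handed to the same AutoPilot-based binding code used in the earlier parts, so only the way the navigator is obtained changes.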

Benefits

Adopting this new XML binding instantly turbo-charges your XML applications. Whether it is parsing, indexing, incremental update, non-blocking XPath, or avoiding needless object creation, VTD-XML not only does all of these things well; in many situations, it is the only tool that meets the performance requirements of your next SOA project.

You also wave goodbye to all the schema-related drawbacks. Your XML/SOA applications become far less likely to break, and much more resilient to change.

In other words, it is official: the performance issue of XML is no more. Welcome to the age of SOA!

Conclusion

I hope this article has helped you understand the process and benefits of the new XML data binding. Fundamentally speaking, XML schema is a means to an end, not an end in itself. As you have probably seen, when XML processing is made simple enough, XML schemas are mostly a bad thing for data binding. Why create all those objects that are never used? Finding a good solution once again requires that we change the problem first. Your SOA success starts with the first step, at the foundation. By combining XPath with VTD-XML, your applications can now break free of XML schema and achieve unrivaled efficiency and agility.

About the Author

Jimmy Zhang is a cofounder of XimpleWare, a provider of high performance XML processing solutions. He has working experience in the fields of electronic design automation and Voice over IP for a number of Silicon Valley high-tech companies. He holds both a BS and MS from the department of EECS from U.C. Berkeley.