Index XML Documents with VTD-XML

Traditionally, DOM- or SAX-based enterprise applications have to repeat CPU-intensive XML parsing every time they access the same documents. VTD-XML 2.0 introduces a simple, general-purpose XML index called VTD+XML (http://vtd-xml.sourceforge.net/persistence.html) that removes the need for those applications to parse repeatedly.

This article combines examples with the latest benchmark reports to show you how to get started with this indexing. It also discusses scenarios and use cases where you may find VTD+XML useful.

Avoid Repetitive XML Parsing with VTD-XML

As discussed in "Simplify XML processing with VTD-XML," one of the underlying assumptions in XML application development to date is that an XML document must be parsed before anything else can be done with it. In other words, the processing logic of an XML application can't start until parsing is done. Frequently considered a threat to database performance, XML parsing is usually many times slower than other XML operations such as XPath evaluation. When those applications perform multiple read-only accesses to XML data that doesn't change very often, wouldn't it be nice to be able to eliminate the overhead of the associated repetitive parsing?
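To make that overhead concrete, here is a minimal sketch of the traditional pattern using only the JDK's DOM and XPath APIs, not VTD-XML. The class name, the parse counter, and the tiny purchase-order string are hypothetical stand-ins chosen for illustration. Every lookup has to parse the document before XPath can run, even though the document never changes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: a DOM-based lookup that re-parses the document on every call.
final class RepeatedParseDemo {

    // Hypothetical stand-in document, invented for this sketch.
    static final String SAMPLE_PO =
        "<purchaseOrder><items>"
        + "<item partNum=\"A\"><USPrice>148.95</USPrice></item>"
        + "<item partNum=\"B\"><USPrice>39.98</USPrice></item>"
        + "</items></purchaseOrder>";

    // Counts how many times the same document has been parsed.
    static int parseCount = 0;

    static List<String> usPrices(String xml) {
        try {
            // Step 1: parse. This is repeated on every call.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            parseCount++;

            // Step 2: only now can the XPath query run against the parsed tree.
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                "/purchaseOrder/items/item/USPrice/text()",
                doc, XPathConstants.NODESET);

            List<String> prices = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                prices.add(nodes.item(i).getNodeValue());
            }
            return prices;
        } catch (Exception e) {
            throw new IllegalStateException("lookup failed", e);
        }
    }

    public static void main(String[] args) {
        // Three read-only accesses to unchanged data cost three full parses.
        for (int i = 0; i < 3; i++) {
            System.out.println(usPrices(SAMPLE_PO));
        }
        System.out.println("parses performed: " + parseCount);
    }
}
```

The query result is the same on every call, yet each call pays for a full parse again. Removing that repeated step is the goal.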

With the native XML indexing feature introduced in version 2.0 of VTD-XML, you can do precisely that. VTDGen, the class encapsulating various parsing routines, now adds "readIndex(...)" and "writeIndex(...)." VTD-XML 2.0 also introduces two new exceptions: IndexWriteException and IndexReadException.

Let me put those new methods into action and show you how to turn on the indexing capability in your application. Consider the following XML document:

Below is a simple pre-2.0 VTD-XML program named "printPrice.java" that prints out the content of the element "USPrice." Notice that it parses the XML file and then uses XPath to filter out the target nodes.

A few changes are needed to add VTD-XML's new indexing capability to the Java code above. First, you need to read in the XML document, parse it, and then write out the indexed version of the same XML document. From that point onward, your application can run XPath queries or other processing logic directly on top of the index, saving the CPU cycles of parsing the XML document again. The following code snippets (named "genIndex.java" and "accessIndex.java" respectively) show you how to generate and access the index. Notice that, when executed sequentially, the two applications produce the same output as "printPrice.java." The first application (genIndex.java) reads in "po.xml" and produces "po.vxl."
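As a rough sketch of the two steps just described, the code below folds the roles of "genIndex.java" and "accessIndex.java" into one hypothetical class, IndexSketch. It assumes the VTD-XML jar (package com.ximpleware) is on the classpath. The XPath expression is a guess at where "USPrice" sits in the document. The loader call is written as loadIndex, the method name found in VTD-XML releases, whereas the text above calls it readIndex, so check the Javadoc of the version you use.

```java
import com.ximpleware.AutoPilot;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;
import java.util.ArrayList;
import java.util.List;

// Sketch only: needs the VTD-XML library; not the article's original listings.
final class IndexSketch {

    // Hypothetical XPath; adjust it to the structure of your own document.
    static final String PRICE_XPATH = "//USPrice/text()";

    // Role of genIndex.java: parse once, then write the index to disk.
    static void generateIndex(String xmlFile, String indexFile) throws Exception {
        VTDGen vg = new VTDGen();
        if (!vg.parseFile(xmlFile, true)) {   // true = namespace-aware parsing
            throw new IllegalStateException("could not parse " + xmlFile);
        }
        vg.writeIndex(indexFile);
    }

    // Role of accessIndex.java: load the index without parsing, then run XPath.
    static List<String> readPrices(String indexFile) throws Exception {
        VTDGen vg = new VTDGen();
        // Assumption: loadIndex is the loader's name in your VTD-XML version.
        VTDNav vn = vg.loadIndex(indexFile);

        AutoPilot ap = new AutoPilot(vn);
        ap.selectXPath(PRICE_XPATH);

        List<String> prices = new ArrayList<>();
        int i;
        // evalXPath returns the index of the next matching token, or -1 when done.
        while ((i = ap.evalXPath()) != -1) {
            prices.add(vn.toString(i));
        }
        return prices;
    }

    public static void main(String[] args) throws Exception {
        generateIndex("po.xml", "po.vxl");         // run once per document version
        System.out.println(readPrices("po.vxl"));  // repeat as often as needed
    }
}
```

In a real deployment the two methods would typically run at different times: the index is generated whenever the document changes, and readers load it on each access instead of parsing.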

Jimmy Zhang is a cofounder of XimpleWare, a provider of high performance XML processing solutions. He has working experience in the fields of electronic design automation and Voice over IP for a number of Silicon Valley high-tech companies. He holds both a BS and MS from the department of EECS from U.C. Berkeley.
