SOA Product Review: Intel XML Software Suite 1.1

The one thing that unifies the distributed computing style known as SOA, in most of its manifestations, is self-describing data via the Extensible Markup Language (XML). The benefits of XML over opaque message formats in data interchange are well established. Whether your focus is SOAP, REST, POX, or syndication with RSS or Atom, your applications will revolve around XML processing. The bane of XML has always been the memory and CPU overhead of processing it: parsing documents, performing XML Schema validation, searching for elements with XPath, and especially executing transforms. Intel's Software and Services Group has met this problem head-on with the release of the Intel® XML Software Suite. That Intel has a software development group dedicated to creating tools optimized for Intel hardware platforms will not surprise anyone developing software for multi-core systems. What is surprising is the level of optimization that has been achieved in this XML toolkit.

The Basics

The Intel XML Software Suite includes both Java and C/C++ libraries for Windows (Vista, XP, Server 2003, Server 2008) and Linux (Red Hat AS/ES, SuSE Server 9/10) on IA-32 and 64-bit Intel processors. A recent update of the Intel XML Software Suite also supports the Intel Itanium® platforms on HP-UX. Performance is optimized for use on multi-core Xeon processors. Compatible Java JDKs include Sun, JRockit, and IBM. The product is not free - you will need a license to use it, but you can get started with a free evaluation license.

The Intel XML Software Suite comprises four separate XML processing functions bundled as a single product:

XML Parsing (DOM and SAX)

XML Schema Validation

XPath (XML navigation and expression handling)

XSL Transformation (XSLT)

The Technology

Underpinning each XML processing function of the product is a custom, highly optimized XML pull parser. A word about XML parsers: XML parsing has traditionally followed one of two approaches. The first builds a DOM tree in memory from the XML document; the second is event processing, where the XML document is translated into a series of events that are fed to a consuming application. DOM suffers from high memory use that limits its ability to handle large documents, while event generation adds processing overhead when parsers need to make more than a single pass through the document. SAX parsers follow a blind event generation paradigm in which the consuming application has no control over the event sequence, while pull parsers let the consuming application control the traversal of the document and request events as it needs them. The Intel XML Software Suite is built around the custom pull parser and shared data structures - the DOM and SAX parsers are actually applications built on top of the pull parser.
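
To make the pull model concrete, here is a minimal sketch using StAX (javax.xml.stream), the pull API that ships with the JDK. It is not Intel's internal pull parser, whose interface this review does not describe; the class and method names are made up for illustration. The point is that the application asks for the next event and can simply stop asking once it has what it needs.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; StAX stands in for a generic pull parser.
class PullParseSketch {

    /** Returns the text of the first element with the given name, then stops reading. */
    static String firstElementText(String xml, String name) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            // The application drives the loop: each call to next() pulls one event.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && name.equals(reader.getLocalName())) {
                    // Found it: return immediately, leaving the rest of the document unread.
                    return reader.getElementText();
                }
            }
            return null; // no such element
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<order><id>42</id><item>widget</item><item>gadget</item></order>";
        System.out.println(firstElementText(xml, "item")); // prints "widget"
    }
}
```

A SAX handler given the same document would receive every event through to the closing tag, whether or not it still cared about them.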

But the Intel XML Software Suite derives its true power from its deep integration with the Intel processor architectures for which it is optimized. It makes great sense if you think about it - who could be more qualified to take on common, compute-intensive processing problems for Intel processor architectures than Intel? What they did is build a set of native libraries optimized for dual-core Xeon processors that can either be linked with C/C++ programs or surfaced in Java as a native library. Anyone familiar with native code optimization will understand the value of a good compiler. To this end, the product is built with the Intel Compiler. Since XML processing is conducive to multithreading, Intel applied its multithreading performance analysis kit to achieve further optimization. The user can configure the number of threads at runtime. Not surprisingly, Intel reports that the best performance in their testing occurred when the number of threads equaled the number of processor cores.

The Intel XML Software Suite is also poised to incorporate the upcoming Intel Streaming SIMD Extensions 4 (SSE4) instructions to further boost XML processing performance. Because the SSE4.2 instructions will be used inside the Intel XML Software Suite itself, developers will be able to take advantage of them without changing their application code; they simply need to be using the latest XML library from Intel.
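
The "one thread per core" observation can be applied at the application level as well. The sketch below sizes a worker pool to the number of available cores and parses a batch of documents through the standard JAXP SAX API. It does not show the suite's own thread setting, which lives in its configuration file and is not documented in this review; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class, not part of any Intel API.
class CoreSizedParsing {

    /** Counts start-element events in one document using a SAX parser. */
    static int countElements(String xml) throws Exception {
        // A fresh parser per task: SAX parser instances are not thread-safe.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final int[] count = {0};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        return count[0];
    }

    /** Parses all documents on a pool with one worker thread per available core. */
    static int countAll(List<String> docs) throws Exception {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String doc : docs) {
                Callable<Integer> task = () -> countElements(doc);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // rethrows any parse failure from the worker
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = Arrays.asList("<a><b/><b/></a>", "<c/>");
        System.out.println("elements parsed: " + countAll(docs));
    }
}
```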

Standards Compliance

XML is at the center of a standards ecosystem that underpins much of modern data interchange. Understanding this, Intel has baked deep and wide standards compliance into the Intel XML Software Suite (XSS). First, there is support for 45 character sets, including UTF-8, UTF-16, EBCDIC, and other encodings that support internationalization. Intel claims 98.0% conformance with W3C and OASIS Test Suites across key industry standards including:

W3C XML 1.0

W3C XML Schema 1.0

W3C XPath 1.0

W3C XSLT 1.0

W3C Namespaces in XML 1.0 Recommendation

W3C DOM level 2 core and partial level 3

SAX 2.0 core and extensions

JAXP 1.3 and 1.4
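
Since JAXP is on that list, code written against the standard javax.xml interfaces should in principle be independent of which implementation sits underneath. As a hedged illustration, the sketch below validates documents against a tiny XML Schema using the JAXP 1.3 validation API; it runs on the JDK's built-in implementation, and how the Intel implementation is selected as the provider is not covered in this review. The class name and the sample schema are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical illustration class using only standard JAXP calls.
class SchemaValidationSketch {

    // A made-up one-element schema: <qty> must hold a positive integer.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='qty' type='xs:positiveInteger'/>"
      + "</xs:schema>";

    /** Compiles the schema once; a Schema object can be shared across threads. */
    static Schema compile(String xsd) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    /** Returns true if the document validates against the compiled schema. */
    static boolean isValid(Schema schema, String xml) throws IOException {
        Validator validator = schema.newValidator(); // validators are per-thread objects
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // With no custom ErrorHandler set, validation errors surface as exceptions.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = compile(XSD);
        System.out.println(isValid(schema, "<qty>3</qty>"));  // prints "true"
        System.out.println(isValid(schema, "<qty>-1</qty>")); // prints "false"
    }
}
```

Compiling the schema once and creating a validator per use mirrors the pre-compilation step described in the performance methodology later in this review.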

Usage

Intel has made using the product a snap. You do need to be a developer, but the process of using the Intel XML Software Suite in your Java and/or C++ projects is all laid out for you. It's simple to install the product, set up your environment, and then use it inside an IDE. <oXygen/>® XML Editor, an XML development environment, has integrated the Intel XML Software Suite and fully supports it within its IDE. The Intel XML Software Suite user's guides include instructions for using the tools inside Eclipse and application servers.

The Intel XML Software Suite is configured via (you guessed it) an XML configuration file that is read once at startup. Configurable parameters include the maximum memory consumed across the whole product at runtime and the number of threads to be used in processing XSLT transforms. One interesting aspect for Java developers is the Intel XML Software Suite's use of native memory instead of JVM memory. This should help reduce the garbage collection overhead that can be experienced with a pure Java XML toolkit like Apache Xalan/Xerces. At the same time, as a native library, the Intel XML Software Suite is capable of causing the JVM to core dump, so pay close attention to the Java JDK compatibility section of the Java User's Guide and architect your application for this possibility.

Code samples are provided for each of the four tools, giving good coverage of the core functionality of each. For the XSLT Accelerator there is a detailed section on XSLT extension functions and elements. The product provides extensions as defined by the EXSLT Community Project, including date, math, and node set manipulation functions, among others. There is also support for custom extension functions.
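
For a sense of what application code looks like, here is a minimal XSLT sketch written purely against the standard JAXP transform API, which the suite is stated to support. It is not taken from Intel's samples, the class name and stylesheet are invented, and it runs as-is on the JDK's bundled processor. The stylesheet is compiled once into a Templates object and reused, which is the usual pattern when the same transform is applied many times.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration class using only standard JAXP calls.
class XsltSketch {

    // A made-up XSLT 1.0 stylesheet that outputs the number of <item> children of <order>.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/order'>"
      + "<xsl:value-of select='count(item)'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Compiles the stylesheet; the resulting Templates object is safe to share across threads. */
    static Templates compile(String xsl) throws TransformerException {
        return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
    }

    /** Applies the compiled stylesheet to one document and returns the output as a string. */
    static String transform(Templates templates, String xml) throws TransformerException {
        Transformer transformer = templates.newTransformer(); // one Transformer per use
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        Templates templates = compile(XSL);
        System.out.println(transform(templates, "<order><item/><item/></order>")); // prints "2"
    }
}
```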

Performance

Getting to the meat of the matter, Intel has published some rather compelling performance metrics for their product across a set of common XML processing scenarios. The performance tests compared the Intel XML Software Suite for Java against XercesJ 2.9.0 (parsing and schema validation) and Xalan 2.7.0 (XPath and XSLT). The test platform had two quad-core Xeon processors running at 2.66 GHz and 16 GB RAM, and ran Sun JDK 1.6.0_02 on SuSE Linux Enterprise Server. The test methodology centered on having a number of threads iterate over a shared pool of workload tasks after a warm-up period in which each thread is allowed to set up the structures needed for the test, e.g., load documents and classes, and pre-compile schemas and expressions. The workload metric varies across the test scenarios, e.g., throughput in MB/sec, XPath expressions/sec, or transformations/sec. Intel has also recently released a free XML Benchmark Tool to help developers measure and compare different XML processing engines. The performance results that follow are those published by Intel.
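
The methodology described above can be sketched in a few lines of Java. This is only an illustration of the general shape (per-thread warm-up, then threads draining a shared pool of tasks while the clock runs); it is not Intel's benchmark code or its XML Benchmark Tool, and every name in it is hypothetical. It also assumes the tasks themselves do not throw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical harness illustrating warm-up followed by a shared task pool.
class ThroughputHarnessSketch {

    /** Outcome of one measured run. */
    static final class Result {
        final long tasksCompleted;
        final long elapsedNanos;

        Result(long tasksCompleted, long elapsedNanos) {
            this.tasksCompleted = tasksCompleted;
            this.elapsedNanos = elapsedNanos;
        }

        double tasksPerSecond() {
            return tasksCompleted / (elapsedNanos / 1e9);
        }
    }

    /** Runs totalTasks tasks, drawn round-robin from the workload, across the given threads. */
    static Result run(List<Runnable> workload, int threads, int totalTasks) throws Exception {
        AtomicInteger next = new AtomicInteger();      // shared cursor into the task pool
        AtomicLong completed = new AtomicLong();
        CyclicBarrier warmedUp = new CyclicBarrier(threads + 1); // workers plus this thread
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> {
                try {
                    // Warm-up (not measured): each thread runs every task once.
                    for (Runnable task : workload) {
                        task.run();
                    }
                    warmedUp.await();
                    // Measured phase: claim the next task index until the pool is drained.
                    int n;
                    while ((n = next.getAndIncrement()) < totalTasks) {
                        workload.get(n % workload.size()).run();
                        completed.incrementAndGet();
                    }
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            });
            workers.add(worker);
            worker.start();
        }
        warmedUp.await();                 // released once every worker has finished warm-up
        long start = System.nanoTime();
        for (Thread worker : workers) {
            worker.join();
        }
        return new Result(completed.get(), System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        AtomicLong sink = new AtomicLong();
        List<Runnable> workload = new ArrayList<>();
        workload.add(() -> sink.addAndGet("<a><b/></a>".length())); // stand-in for a parse task
        Result result = run(workload, Runtime.getRuntime().availableProcessors(), 100_000);
        System.out.printf("%d tasks, %.0f tasks/s%n", result.tasksCompleted, result.tasksPerSecond());
    }
}
```

In a real measurement each task would parse, validate, query, or transform a document, and the result would be normalized to the metric for that scenario, such as MB/s.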

SAX Parsing

The SAX parsing test was run against 22 XML documents of varying sizes, some as large as 200 KB, drawn from a wide variety of sources so as to be as representative as possible.

# of threads    Intel XSS throughput (MB/s)    Xerces 2.9.0 throughput (MB/s)
1               89                             40
2               156                            90
4               300                            158
8               553                            225

Schema Validation

The schema validation test was run against a set of six XML Schemas and associated sample XML documents drawn from different public sources so as to be representative of common schema validation.

# of threads    Intel XSS throughput (MB/s)    Xerces 2.9.0 throughput (MB/s)
1               69                             17
2               141                            25
4               276                            36
8               507                            54

XPath

The XPath test was run against the XPathMark test suite, which encompasses a wide variety of queries resulting in a wide range of returned nodes.

# of threads    Intel XSS eval time (ms)    Xalan 2.7.0 eval time (ms)
1               767                         3127
2               429                         1640
4               244                         947
8               151                         698

XSLT

A wide variety of publicly available stylesheets were used, which represent a common set of transform scenarios, including XML-to-HTML.

# of threads    Intel XSS throughput (trans/s)    Xalan 2.7.0 throughput (trans/s)
1               1628                              278
2               2790                              510
4               4794                              732
8               6961                              724
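
The tables are easier to compare once reduced to ratios. The short program below does that arithmetic on Intel's published figures for the SAX parsing and XPath tests; the class and method names are hypothetical, and the ratios are derived here rather than published by Intel. Note that the XPath figures are times, so the ratio is inverted. At eight threads this works out to roughly 2.5x for SAX parsing and 4.6x for XPath; the same calculation on the other two tables gives about 9.4x for schema validation and 9.6x for XSLT.

```java
// Hypothetical helper for turning the published figures into speedup ratios.
class SpeedupSketch {

    /** Speedup for "higher is better" metrics such as MB/s or transforms/s. */
    static double throughputSpeedup(double candidate, double baseline) {
        return candidate / baseline;
    }

    /** Speedup for "lower is better" metrics such as evaluation time in ms. */
    static double timeSpeedup(double candidateMs, double baselineMs) {
        return baselineMs / candidateMs;
    }

    public static void main(String[] args) {
        int[] threads = {1, 2, 4, 8};
        // Figures copied from the SAX parsing and XPath tables above.
        double[] intelSaxMbs = {89, 156, 300, 553};
        double[] xercesSaxMbs = {40, 90, 158, 225};
        double[] intelXPathMs = {767, 429, 244, 151};
        double[] xalanXPathMs = {3127, 1640, 947, 698};

        for (int i = 0; i < threads.length; i++) {
            System.out.printf("%d thread(s): SAX %.2fx, XPath %.2fx%n",
                    threads[i],
                    throughputSpeedup(intelSaxMbs[i], xercesSaxMbs[i]),
                    timeSpeedup(intelXPathMs[i], xalanXPathMs[i]));
        }
    }
}
```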

Conclusion

If you are an SOA architect who is concerned about the effect of XML processing on the performance of your systems, take a look at the Intel XML Software Suite. Based on Intel's published results, it offers superior performance for XML processing in Java and C/C++, especially if you can run it on dual-core or multi-core Intel Xeon processors. You will find it has excellent standards compliance, excellent documentation and examples, and even a burgeoning user community. With the importance of XML and XML processing to data interchange, it is great to see the Software and Services Group at Intel address this issue head-on.