XML Parser for C++ Specifications

Oracle provides a set of XML parsers for Java, C, C++, and PL/SQL. Each of these parsers is a stand-alone XML component that parses an XML document (or a standalone DTD) so that it can be processed by an application. Library and command-line versions are provided supporting the following standards and features:

DOM (Document Object Model) support is provided compliant with the W3C DOM 1.0 Recommendation. These APIs permit applications to access and manipulate an XML document as a tree structure in memory. This interface is used by such applications as editors.

SAX (Simple API for XML) support is also provided compliant with the SAX 1.0 specification. These APIs permit an application to process XML documents using an event-driven model.

Support is also included for the W3C Recommendation for XML Namespaces 1.0, which avoids name collisions, increases reusability, and eases application integration.

Supports validation and non-validation modes

Supports W3C XML 1.0 Recommendation

Integrated support for W3C XSLT 1.0 Recommendation

Validating and Non-Validating Mode Support

The XML Parser for C++ can parse XML in validating or non-validating modes.

In non-validating mode, the parser verifies that the XML is well-formed and parses the data into a tree of objects that can be manipulated by the DOM API.

In validating mode, the parser verifies that the XML is well-formed and validates the XML data against the DTD (if any).

Validation involves checking whether or not the attribute names and element tags are legal, whether nested elements belong where they are, and so on.
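The difference between the two modes can be illustrated with a toy checker. This is a minimal sketch and not the Oracle parser: the names CheckResult, ContentModel, and check are hypothetical, the "DTD" is reduced to a map of allowed child elements, and the input is assumed to contain only plain start, end, and empty-element tags without attributes.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: handles <name>, </name>, and <name/> tags with no
// attributes, comments, CDATA sections, or processing instructions.
struct CheckResult {
    bool wellFormed;  // every start tag has a properly nested end tag
    bool valid;       // every element is declared and allowed by its parent
};

// Hypothetical stand-in for a DTD: element name -> allowed child names.
using ContentModel = std::map<std::string, std::set<std::string>>;

CheckResult check(const std::string& xml, const ContentModel& model) {
    CheckResult result{true, true};
    std::vector<std::string> open;  // stack of currently open elements
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        std::size_t end = xml.find('>', pos);
        if (end == std::string::npos || end == pos + 1) {
            result.wellFormed = false;  // unterminated or empty tag
            break;
        }
        std::string tag = xml.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag[0] == '/') {
            // End tag: must match the most recently opened element.
            if (open.empty() || open.back() != tag.substr(1)) {
                result.wellFormed = false;
                break;
            }
            open.pop_back();
            continue;
        }
        bool selfClosing = tag.back() == '/';
        if (selfClosing) tag.pop_back();
        // Validity: the element must be declared, and its parent (if any)
        // must list it as an allowed child.
        if (model.count(tag) == 0) result.valid = false;
        if (!open.empty()) {
            auto rule = model.find(open.back());
            if (rule == model.end() || rule->second.count(tag) == 0)
                result.valid = false;
        }
        if (!selfClosing) open.push_back(tag);
    }
    if (!open.empty()) result.wellFormed = false;   // unclosed elements
    if (!result.wellFormed) result.valid = false;   // validity needs well-formedness
    return result;
}
```

A non-validating run corresponds to looking only at wellFormed; a validating run also consults valid. A real parser checks far more than this sketch does (single root element, attribute declarations, content order, entities).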

Example Code

Online Documentation

Documentation for Oracle XML Parser for C++ is located in the $ORACLE_HOME/xdk/cpp/parser/doc directory.

Release Specific Notes

The readme.html file in the root directory of the archive contains release specific information including bug fixes, API additions, and so on.

The Oracle XML parser for C++ is written in C with C++ wrappers. It will check if an XML document is well-formed, and optionally validate it against a DTD. The parser will construct an object tree which can be accessed via a DOM interface or operate serially via a SAX interface.

Default:

The default encoding is UTF-8. If you use only single-byte character sets (such as US-ASCII or any of the ISO-8859 character sets), set the default encoding explicitly: parsing single-byte data can be up to 25% faster than parsing multibyte character sets such as UTF-8.

Validation warnings: Validity Constraint (VC) errors have been changed to warnings and do not terminate parsing. For compatibility with the old behavior (halt on warnings as well as errors), a new flag XML_FLAG_STOP_ON_WARNING (or '-W' to the xml program) has been added.

HTTP support: HTTP URIs are now supported; look for FTP in the next release. For other access methods, the user may define their own callbacks with the new xmlaccess() API.

Oracle XML Parser 2.0.2.0.0 (C++)

XSLT improvements: Various bugs fixed in the XSLT processor; error messages are improved; xsl:number, xsl:sort, xsl:namespace-alias, xsl:decimal-format, forwards-compatible processing with xsl:version, and literal result element as stylesheet are now available; the following XSLT-specific additions to the core XPath library are now available: current(), format-number(), generate-id(), and system-property().

XML parser bug fixes: Some problems with validation and matching of start and end tags with SAX were fixed. Also, a bug with parameter entity processing in external entities was fixed.

Oracle XML Parser 2.0.1.0.0 (C++)

Performance improvements: Major performance improvement over the last release: about two and a half times faster for UTF-8 parsing and about four times faster for ASCII parsing. Comparison timing against the previous version for parsing (DOM) and validating various standalone files (SPARC Ultra 1 CPU time):

File size   Old UTF-8   New UTF-8   Speedup   Old ASCII   New ASCII   Speedup
42K         180ms       70ms        2.6       120ms       40ms        3.0
134K        510ms       210ms       2.4       450ms       100ms       4.5
247K        980ms       400ms       2.5       690ms       180ms       3.8
1M          2860ms      1130ms      2.5       1820ms      380ms       4.8
2.7M        10550ms     4100ms      2.6       7450ms      1930ms      3.9
10.5M       42250ms     16400ms     2.6       29900ms     7800ms      3.8

Lists, not arrays: Internal parser data structures are now uniformly lists; arrays have been dropped. Therefore, access is now better suited to a firstChild/nextSibling style loop instead of numChildNodes/item.
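The preference for a firstChild/nextSibling loop follows from how linked lists behave. The sketch below uses a hypothetical SketchNode type (not the Oracle Node class) to contrast index-style access, which must walk the list from its head on every call, with a single pass along the sibling links.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a parser node stored in a linked list.
struct SketchNode {
    std::string name;
    SketchNode* firstChild = nullptr;
    SketchNode* nextSibling = nullptr;
};

// Index-style access: each call walks from the head of the list, so
// visiting all n children this way costs on the order of n*n link steps.
const SketchNode* childAt(const SketchNode& parent, std::size_t index) {
    const SketchNode* child = parent.firstChild;
    while (child != nullptr && index > 0) {
        child = child->nextSibling;
        --index;
    }
    return child;  // nullptr when index is past the last child
}

// Sibling-style access: one pass over the list, n link steps in total.
std::vector<std::string> childNames(const SketchNode& parent) {
    std::vector<std::string> names;
    for (const SketchNode* c = parent.firstChild; c != nullptr; c = c->nextSibling)
        names.push_back(c->name);
    return names;
}
```

Both functions return the same children; only the cost differs. Whether the real parser's indexed calls behave exactly like childAt is an assumption of this sketch.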

DTD parsing: A new method, XMLParser::xmlparseDTD(), parses an external DTD directly, without needing an enclosing document. It is used mainly by the Class Generator.

Error reporting: Error messages are improved and more specific, with nearly twice as many as before. Error location is now described by a stack of line number/entity pairs, showing the final location of the error and intermediate inclusions (e.g. line X of file, line Y of entity).

NOTE: Use the new error message file (lpxus.msb) provided with this release; the error message file provided with earlier releases is incompatible. See below.
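As an illustration of the error-location stack described above, the following sketch formats a list of line number/entity pairs in the "line X of file, line Y of entity" style. The LocationFrame type and describeLocation function are hypothetical; the parser's actual message format comes from its message file and may differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: one frame per level of inclusion.
struct LocationFrame {
    unsigned line;       // line number within the file or entity
    std::string entity;  // name of the file or entity
};

// Frames are ordered outermost first; the last frame is the final
// location of the error, the earlier ones are intermediate inclusions.
std::string describeLocation(const std::vector<LocationFrame>& stack) {
    std::string text;
    for (std::size_t i = 0; i < stack.size(); ++i) {
        if (i > 0) text += ", ";
        text += "line " + std::to_string(stack[i].line) + " of " + stack[i].entity;
    }
    return text;
}
```

For example, an error on line 3 of an entity named chapter that is included from line 12 of doc.xml would be described by two frames, the file first and the entity second.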

XSL improvements: Various bugs fixed in the XSLT processor; xsl:call-template is now fully supported.

Oracle XML Parser 2.0.0.0.0 (C++)

The Oracle XML v2 parser is a beta release and is written in C, with a C++ wrapper. The main difference from the Oracle XML v1 parser is the ability to format the XML document according to a stylesheet via an integrated XSLT processor. The XML parser will check if an XML document is well-formed, and optionally validate it against a DTD. The parser will construct an object tree which can be accessed via a DOM interface or operate serially via a SAX interface.

XML Parser for C++: XMLParser() API

Table F-2 lists the main methods of the XML Parser for C++ class XMLParser(), with a brief description of each. The XMLParser() class contains top-level methods that do the following:

XML Parser for C++: DOM API

Table F-3 XML Parser for C++: DOM API Classes (SubClasses)

Attr (Node)

This class contains methods for accessing the name and value of a single document node attribute.

getName

Return name of attribute

getValue

Return "value" (definition) of attribute

getSpecified

Return attribute's "specified" flag value

setValue

Set an attribute's value

CDATASection (Text)

This class implements the CDATA node type, a subclass of Text. There are no methods.

CharacterData (Node)

This class contains methods for accessing and modifying the data associated with text nodes.

appendData

Append a string to this node's data

deleteData

Remove a substring from this node's data

getData

Get data (value) of a text node

getLength

Return length of a text node's data

insertData

Insert a string into this node's data

replaceData

Replace a substring in this node's data

substringData

Fetch a substring of this node's data

Comment (CharacterData)

This class implements the COMMENT node type, a subclass of CharacterData. There are no methods.

Document (Node)

This class contains methods for creating and retrieving nodes.

createAttribute

Create an ATTRIBUTE node

createCDATASection

Create a CDATA node

createComment

Create a COMMENT node

createDocumentFragment

Create a DOCUMENT_FRAGMENT node

createElement

Create an ELEMENT node

createEntityReference

Create an ENTITY_REFERENCE node

createProcessingInstruction

Create a PROCESSING_INSTRUCTION node

createTextNode

Create a TEXT node

getElementsByTagName

Select nodes based on tag name

getImplementation

Return the DOM implementation for the document

DocumentFragment (Node)

This class implements the DOCUMENT_FRAGMENT node type, a subclass of Node.

DocumentType (Node)

This class contains methods for accessing information about the Document Type Definition (DTD) of a document.

getName

Return name of DTD

getEntities

Return NamedNodeMap of DTD's (general) entities

getNotations

Return NamedNodeMap of DTD's notations

DOMImplementation

This class contains methods relating to the specific DOM implementation supported by the parser.

hasFeature

Detect if the named feature is supported

Element (Node)

This class contains methods pertaining to element nodes.

getTagName

Return the node's tag name

getAttribute

Select an attribute given its name

setAttribute

Create a new attribute given its name and value

removeAttribute

Remove an attribute given its name

getAttributeNode

Return an attribute node given its name

setAttributeNode

Add a new attribute node

removeAttributeNode

Remove an attribute node

getElementsByTagName

Return a list of element nodes with the given tag name

normalize

"Normalize" an element (merge adjacent text nodes)

Entity (Node)

This class implements the ENTITY node type, a subclass of Node.

getNotationName

Return entity's NDATA (notation name)

getPublicId

Return entity's public ID

getSystemId

Return entity's system ID

EntityReference (Node)

This class implements the ENTITY_REFERENCE node type, a subclass of Node.

NamedNodeMap

This class contains methods for accessing the number of nodes in a node map and fetching individual nodes.

item

Return nth node in map

getLength

Return number of nodes in map

getNamedItem

Select a node by name

setNamedItem

Set a node into the map

removeNamedItem

Remove the named node from map

Node

This class contains methods for accessing details about a document node.

appendChild

Append a new child to the end of the current node's list of children

cloneNode

Clone an existing node and optionally all its children

getAttributes

Return structure containing all defined node attributes

getChildNode

Return specific indexed child of given node

getChildNodes

Return structure containing all child nodes of given node

getFirstChild

Return first child of given node

getLastChild

Return last child of given node

getLocal

Returns the local name of the node

getNamespace

Return a node's namespace

getNextSibling

Return a node's next sibling

getName

Return name of node

getType

Return numeric type-code of node

getValue

Return "value" (data) of node

getOwnerDocument

Return document node which contains a node

getParentNode

Return parent node of given node

getPrefix

Returns the namespace prefix for the node

getPreviousSibling

Returns the previous sibling of the current node

getQualifiedName

Return namespace-qualified name of given node

hasAttributes

Determine if node has any defined attributes

hasChildNodes

Determine if node has children

insertBefore

Insert new child node into a node's list of children

numChildNodes

Return count of number of child nodes of given node

removeChild

Remove a node from the current node's list of children

replaceChild

Replace a child node with another

setValue

Sets a node's value (data)

NodeList

This class contains methods for extracting nodes from a NodeList.

item

Return nth node in list

getLength

Return number of nodes in list

Notation (Node)

This class implements the NOTATION node type, a subclass of Node.

getData

Return notation's data

getTarget

Return notation's target

setData

Set notation's data

ProcessingInstruction (Node)

This class implements the PROCESSING_INSTRUCTION node type, a subclass of Node.

getData

Return the PI's data

getTarget

Return the PI's target

setData

Set the PI's data

Text (CharacterData)

This class contains methods for accessing and modifying the data associated with text nodes (subclasses CharacterData).

splitText

Split a text node into two text nodes at the given offset
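The CharacterData and Text operations listed in the table can be pictured with an ordinary string. The following is a minimal sketch using a hypothetical SketchText class backed by std::string; the method names mirror the table, but the parameter types and the choice to return a new object from splitText are assumptions, not the Oracle signatures.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical text node backed by std::string; not the Oracle classes.
// Offsets past the end of the data make std::string throw std::out_of_range.
class SketchText {
public:
    explicit SketchText(std::string data) : data_(std::move(data)) {}

    const std::string& getData() const { return data_; }
    std::size_t getLength() const { return data_.size(); }

    void appendData(const std::string& s) { data_ += s; }
    void insertData(std::size_t offset, const std::string& s) { data_.insert(offset, s); }
    void deleteData(std::size_t offset, std::size_t count) { data_.erase(offset, count); }
    void replaceData(std::size_t offset, std::size_t count, const std::string& s) {
        data_.replace(offset, count, s);
    }
    std::string substringData(std::size_t offset, std::size_t count) const {
        return data_.substr(offset, count);
    }

    // Keep [0, offset) in this node and return a new node holding the rest.
    SketchText splitText(std::size_t offset) {
        SketchText tail(data_.substr(offset));
        data_.erase(offset);
        return tail;
    }

private:
    std::string data_;
};
```

In a DOM tree the node returned by splitText would also be linked in as the next sibling of the original node; the sketch omits the tree and shows only the effect on the data.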

XML Parser for C++: XSLT API

XSLT is a language for transforming XML documents into other XML documents. It is designed for use as part of XSL, which is a stylesheet language for XML. In addition to XSLT, XSL includes an XML vocabulary for specifying formatting. XSL specifies the styling of an XML document by using XSLT to describe how the document is transformed into another XML document that uses the formatting vocabulary.

XSLT is also designed to be used independently of XSL. However, XSLT is not intended as a completely general-purpose XML transformation language. Rather it is designed primarily for the kinds of transformation that are needed when XSLT is used as part of XSL.

A transformation expressed in XSLT describes rules for transforming a source tree into a result tree. The transformation is achieved by associating patterns with templates. A pattern is matched against elements in the source tree. A template is instantiated to create part of the result tree. The result tree is separate from the source tree. The structure of the result tree can be completely different from the structure of the source tree. In constructing the result tree, elements from the source tree can be filtered and reordered, and arbitrary structure can be added.

Stylesheets

A transformation expressed in XSLT is called a stylesheet. This is because, in the case when XSLT is transforming into the XSL formatting vocabulary, the transformation functions as a stylesheet.

A stylesheet contains a set of template rules. A template rule has two parts:

A pattern which is matched against nodes in the source tree

A template which can be instantiated to form part of the result tree. This allows a stylesheet to be applicable to a wide class of documents that have similar source tree structures.

How Stylesheet Templates are Processed

A template is instantiated for a particular source element to create part of the result tree. A template can contain elements that specify literal result element structure. A template can also contain elements from the XSLT namespace that are instructions for creating result tree fragments. When a template is instantiated, each instruction is executed and replaced by the result tree fragment that it creates.

Instructions can select and process descendant source elements. Processing a descendant element creates a result tree fragment by finding the applicable template rule and instantiating its template. Note that elements are only processed when they have been selected by the execution of an instruction. The result tree is constructed by finding the template rule for the root node and instantiating its template.
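The processing model above (find the applicable template rule, instantiate its template, and let instructions select descendants) can be sketched in a few lines. This is an illustration only, not the Oracle XSLT processor: SourceElement and SketchStylesheet are hypothetical, patterns are reduced to exact element names, the result tree is flattened to a string, and the fallback for unmatched elements is a simplification chosen for the sketch.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical source tree node.
struct SourceElement {
    std::string name;
    std::string text;
    std::vector<SourceElement> children;
};

// Hypothetical stylesheet: a set of template rules keyed by pattern.
class SketchStylesheet {
public:
    // A template receives the stylesheet so it can process descendants.
    using Template =
        std::function<std::string(const SketchStylesheet&, const SourceElement&)>;

    void addRule(const std::string& pattern, Template t) {
        rules_[pattern] = std::move(t);
    }

    // Find the applicable rule for the element and instantiate its template.
    std::string apply(const SourceElement& e) const {
        auto rule = rules_.find(e.name);
        if (rule != rules_.end()) return rule->second(*this, e);
        // Fallback for this sketch: copy the text and process the children.
        return e.text + applyToChildren(e);
    }

    // Plays the role of an instruction that selects the child elements.
    std::string applyToChildren(const SourceElement& e) const {
        std::string out;
        for (const SourceElement& child : e.children) out += apply(child);
        return out;
    }

private:
    std::map<std::string, Template> rules_;
};
```

Calling apply on the root element corresponds to constructing the result tree from the template rule for the root. Children are processed only when a template calls applyToChildren, which matches the statement that elements are processed only when an instruction selects them.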

A software module called an XSL processor reads XML documents and transforms them into other XML documents with different styles.

The XML Parser for C++ implementation of the XSL processor follows the XSL Transformations standard (version 1.0, November 16, 1999) and includes the required behavior of an XSL processor as specified in the XSLT specification.

XML Parser for C++: SAX API

The SAX API is based on callbacks. Instead of the entire document being parsed and turned into a data structure which may be referenced (by the DOM interface), the SAX interface is serial. As the document is processed, appropriate SAX user callback functions are invoked. Each callback function returns an error code, zero meaning success, any non-zero value meaning failure. If a non-zero code is returned, document processing is stopped.

To use SAX, an xmlsaxcb structure is initialized with function pointers and passed to the xmlinit() call. A pointer to a user-defined context structure may also be included; that context pointer will be passed to each SAX function.
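The callback model can be sketched without the parser itself. In the example below, SketchSaxCallbacks is a hypothetical table of function pointers (its fields are illustrative and are not the members of xmlsaxcb), and dispatch is a stand-in for the parser that delivers events in document order, passes the user context to every callback, and stops at the first non-zero return code.

```cpp
#include <string>
#include <vector>

// Hypothetical callback table; a null pointer means "not interested".
struct SketchSaxCallbacks {
    int (*startElement)(void* ctx, const char* name);
    int (*characters)(void* ctx, const char* text);
    int (*endElement)(void* ctx, const char* name);
};

// Hypothetical pre-tokenized event, standing in for the parser's input.
struct SketchEvent {
    enum Kind { Start, Text, End } kind;
    std::string value;
};

// Stand-in for the parser: invoke callbacks serially; zero means success,
// any non-zero code stops processing and is returned to the caller.
int dispatch(const std::vector<SketchEvent>& events,
             const SketchSaxCallbacks& cb, void* ctx) {
    for (const SketchEvent& ev : events) {
        int rc = 0;
        switch (ev.kind) {
            case SketchEvent::Start:
                if (cb.startElement) rc = cb.startElement(ctx, ev.value.c_str());
                break;
            case SketchEvent::Text:
                if (cb.characters) rc = cb.characters(ctx, ev.value.c_str());
                break;
            case SketchEvent::End:
                if (cb.endElement) rc = cb.endElement(ctx, ev.value.c_str());
                break;
        }
        if (rc != 0) return rc;
    }
    return 0;
}

// Example user context: counts elements and fails once a limit is exceeded.
struct ElementCounter {
    int elements = 0;
    int limit = 0;  // 0 means no limit
};

int countStart(void* ctx, const char* /*name*/) {
    ElementCounter* counter = static_cast<ElementCounter*>(ctx);
    ++counter->elements;
    return (counter->limit > 0 && counter->elements > counter->limit) ? 1 : 0;
}
```

Because the context travels as a void pointer, each callback casts it back to the user's own type, which is the usual way a C-style callback interface carries state without global variables.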

XML C++ Class Generator Specifications

Working in conjunction with the XML Parser for C++, the XML Class Generator generates a set of C++ source files based on an input DTD. The generated C++ source files can then be used to construct, optionally validate, and print an XML document that complies with the specified DTD. The Class Generator supports validation mode to assist debugging.

Input to the XML C++ Class Generator

Input is an XML document containing a DTD. The document body itself is ignored; only the DTD is relevant, though the dummy document must conform to the DTD. The underlying XML parser only accepts file names for the document and associated external entities. In future releases, no dummy document will be required, and URIs for additional protocols will be accepted.

Default:

The default encoding is UTF-8. If you use only single-byte character sets (such as US-ASCII or any of the ISO-8859 character sets), set the default encoding explicitly: parsing single-byte data can be up to 25% faster than parsing multibyte character sets such as UTF-8.

Output of the XML C++ Class Generator

The Class Generator output is a pair of C++ source files, .cpp and .h, named after the DTD. Constructors are provided for each class (element) that allow an object to be created in two different ways: initially empty, with the children or data added after creation, or with the full initial set of children or initial data. A method is provided for #PCDATA (and Mixed) elements to set the data and, when appropriate, to set an element's attributes.
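To make the two construction styles concrete, here is a hand-written sketch of the kind of classes a generator might emit for a DTD containing <!ELEMENT book (title+)> and <!ELEMENT title (#PCDATA)>. The class names, method names (setData, addChild, toXml), and signatures are hypothetical and are not the actual generated code.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical class for <!ELEMENT title (#PCDATA)>.
class Title {
public:
    Title() {}                                                   // initially empty
    explicit Title(std::string data) : data_(std::move(data)) {} // with initial data
    void setData(std::string data) { data_ = std::move(data); }
    // Serializes without escaping special characters; sketch only.
    std::string toXml() const { return "<title>" + data_ + "</title>"; }

private:
    std::string data_;
};

// Hypothetical class for <!ELEMENT book (title+)>.
class Book {
public:
    Book() {}                                                    // initially empty
    explicit Book(std::vector<Title> children)                   // with full set of children
        : children_(std::move(children)) {}
    void addChild(const Title& title) { children_.push_back(title); }
    std::string toXml() const {
        std::string out = "<book>";
        for (const Title& t : children_) out += t.toXml();
        return out + "</book>";
    }

private:
    std::vector<Title> children_;
};
```

Building a Book empty and then calling addChild, or passing the children to the constructor, produces the same document; the first style suits incremental construction, the second suits cases where all content is known up front.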

Standards Conformance

The XML C++ Class Generator conforms to the following standards:

The W3C recommendation for Extensible Markup Language (XML) 1.0

The W3C recommendation for Document Object Model (DOM) Level 1, version 1.0

The W3C proposed recommendation for Namespaces in XML

The Simple API for XML (SAX) 1.0

Directory Structure

The XML C++ Class Generator has the following file and directory structure: