Introduction to the XML Pipeline Processor

Prerequisites

This chapter assumes that you are familiar with the following topics:

XML Pipeline Definition Language. This XML vocabulary enables you to describe the processing relations between XML resources. If you require a more thorough introduction to the Pipeline Definition Language, consult the XML resources listed in "Related Documents" of the preface.

Standards and Specifications

The Oracle XML Pipeline processor is based on the W3C XML Pipeline Definition Language Version 1.0 Note. The W3C Note defines an XML vocabulary rather than an API. You can find the Pipeline specification at the following URL:

Multistage XML Processing

The Oracle XML Pipeline processor is built on the XML Pipeline Definition Language. The processor takes an input XML pipeline document and executes the pipeline processes in an order derived from their dependencies. A pipeline document, which is written in XML, specifies the processes to be executed in a declarative manner. You can associate Java classes with processes by using the <processdef/> element in the pipeline document.

Use the Pipeline processor for multistage processing, which occurs when you process XML components sequentially or in parallel. The output of one stage of processing can become the input of another stage. You can write a pipeline document that defines the inputs and outputs of the processes. Figure 7-1 illustrates a possible pipeline sequence.
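To make the idea of one stage's output feeding the next stage concrete, the following is a minimal sketch that chains three stages by hand with the standard JAXP APIs, not with the XDK Pipeline API: a DOM parse, a stylesheet compilation, and an XSLT transformation of the parsed DOM. The class name MultistageSketch and the inline documents used to exercise it are illustrative assumptions. The Pipeline processor performs this wiring declaratively from the pipeline document instead of in Java code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustrative only: hand-chains stages with JAXP, not oracle.xml.pipeline.
class MultistageSketch {

    // Stage 1: parse the input XML into a DOM tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // XSLT expects a namespace-aware DOM
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Stage 2: compile the stylesheet into a reusable stylesheet object.
    static Templates compile(String xsl) throws Exception {
        return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
    }

    // Stage 3: apply the stylesheet to the DOM from stage 1 and serialize it.
    static String transform(Document dom, Templates stylesheet) throws Exception {
        StringWriter out = new StringWriter();
        stylesheet.newTransformer().transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    // The outputs of stages 1 and 2 become the inputs of stage 3.
    static String run(String xml, String xsl) {
        try {
            return transform(parse(xml), compile(xsl));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```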

Figure 7-1 Pipeline Processing

In addition to the XML Pipeline processor itself, the XDK provides an API for processes that you can pipe together in a pipeline document. Table 7-2 summarizes the classes provided in the oracle.xml.pipeline.processes package.

The typical stages of processing XML in a pipeline are as follows:

Parse the input XML documents. The oracle.xml.pipeline.processes package includes DOMParserProcess for DOM parsing and SAXParserProcess for SAX parsing.

Validate the input XML documents.

Serialize or transform the input documents. Note that the Pipeline processor does not enable you to connect the SAX parser to the XSLT processor, which requires a DOM.
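The validation stage can be pictured with the following sketch, which uses the standard javax.xml.validation API rather than the XDK classes. As with XSDSchemaBuilder followed by XSDValProcess, the schema is compiled once into a reusable object and each document is then checked against it. The class name ValidateStageSketch and the choice to report a boolean instead of throwing are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Illustrative only: JAXP validation standing in for the validate stage.
class ValidateStageSketch {

    // Compile the schema once so that many documents can be validated with it.
    static Schema buildSchema(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    // Returns false when the document does not conform to the schema.
    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the default validator error handling throws on errors
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```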

In multistage processing, SAX is ideal for filtering and searching large XML documents. You should use DOM when you need to change XML content or require efficient dynamic access to the content.
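The following sketch illustrates why SAX suits filtering: it streams the document through a handler and keeps only the text of the elements it is asked for, without ever building a tree. It uses the JDK's SAX parser, not SAXParserProcess, and the class name SaxFilterSketch is hypothetical. For simplicity it assumes that matching elements are not nested inside each other.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: a streaming SAX filter that never builds a DOM.
class SaxFilterSketch {

    // Collects the text content of every element with the given name.
    static List<String> select(String xml, String elementName) {
        List<String> hits = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a match

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals(elementName)) text = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals(elementName) && text != null) {
                    hits.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return hits;
    }
}
```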

Customized Pipeline Processes

The oracle.xml.pipeline.controller.Process class is the base class for all pipeline process definitions. The classes in the oracle.xml.pipeline.processes package extend this base class. To create a customized pipeline process, you need to create a class that extends the Process class.

At a minimum, every custom process should override the do-nothing initialize() and execute() methods of the Process class. If the custom process accepts SAX events as input, then it should override the SAXContentHandler() method to return the ContentHandler that handles incoming SAX events. It should also override the SAXErrorHandler() method to return the appropriate ErrorHandler. Table 7-1 describes these methods in more detail.

Table 7-1 Methods in the oracle.xml.pipeline.controller.Process Class

Method

Description

initialize()

Initializes the process before execution.

Call getInput() to fetch a specific input object associated with the process element and call supportType() to indicate the types of input supported. Analogously, call getOutput() and supportType() for output.

execute()

Executes the process.

Call getInParaValue(), getInput(), or getInputSource() to fetch the inputs to the process. If a custom process outputs SAX events, then it should call the getSAXContentHandler() and getSAXErrorHandler() methods in execute() to get the SAX handlers of the following processes in the pipeline.

Call setOutputResult(), getOutputStream(), getOutputWriter() or setOutParam() to set the outputs or outparams generated by this process.

Call getErrorSource(), getErrorStream(), or getErrorDocument() to access the pipeline error element associated with this process element. If an exception occurs during execute(), call error() or info() to propagate it to the PipelineErrorHandler.

SAXContentHandler()

Returns the SAX ContentHandler.

If dependencies from other processes are not yet available, then return null. When these dependencies become available, the method is executed to completion.

SAXErrorHandler()

Returns the SAX ErrorHandler.

If you do not override this method, then the XML Pipeline processor uses the default error handler implemented by this class to handle SAX errors.
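A custom process that consumes SAX events hands the pipeline an org.xml.sax.ContentHandler from its SAXContentHandler() override. Because the Process base class ships in the XDK library, the following sketch does not extend it. It shows only the kind of handler such an override could return, plus a hypothetical handlerFor(boolean) method that mimics the "return null until dependencies are available" contract described in Table 7-1. The class ElementCountingStage and its methods are assumptions, not XDK API.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a custom process; it does NOT extend
// oracle.xml.pipeline.controller.Process.
class ElementCountingStage {

    // Handler that tallies element names as SAX events stream in.
    static class CountingHandler extends DefaultHandler {
        final Map<String, Integer> counts = new TreeMap<>();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            counts.merge(qName, 1, Integer::sum);
        }
    }

    private final CountingHandler handler = new CountingHandler();

    // Mirrors the SAXContentHandler() contract: null while upstream
    // dependencies are not ready, otherwise the handler for incoming events.
    ContentHandler handlerFor(boolean dependenciesReady) {
        return dependenciesReady ? handler : null;
    }

    // Helper for trying the handler out: drives it with the JDK SAX parser.
    static Map<String, Integer> countAll(String xml) {
        ElementCountingStage stage = new ElementCountingStage();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), stage.handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return stage.handler.counts;
    }
}
```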

Using the XML Pipeline Processor: Basic Process

The XML Pipeline processor is accessible through the following packages:

oracle.xml.pipeline.controller, which provides an XML Pipeline controller that executes XML processes in a pipeline based on dependencies.

oracle.xml.pipeline.processes, which provides wrapper classes for XML processes that can be executed by the XML Pipeline controller. The oracle.xml.pipeline.processes package contains the classes that you can use to design a pipeline application framework. Each class extends the oracle.xml.pipeline.controller.Process class.

Table 7-2 lists the components in the package. You can connect these components and processes through a combination of the XML Pipeline processor and a pipeline document.

Table 7-2 Classes in oracle.xml.pipeline.processes

Class

Description

CompressReaderProcess

Receives compressed XML and outputs parsed XML.

CompressWriterProcess

Receives XML parsed with DOM or SAX and outputs compressed XML.

DOMParserProcess

Parses incoming XML and outputs a DOM tree.

SAXParserProcess

Parses incoming XML and outputs SAX events.

XPathProcess

Accepts a DOM as input, uses an XPath pattern to select one or more nodes from an XML Document or an XML DocumentFragment, and outputs a Document or DocumentFragment.

XSDSchemaBuilder

Parses an XML schema and outputs a schema object for validation. This process is built into the XML Pipeline processor and builds schema objects used for validating XML documents.

XSDValProcess

Validates against a local schema, analyzes the results, and reports errors if necessary.

XSLProcess

Accepts DOM as input, applies an XSL stylesheet, and outputs the result of the transformation.

XSLStylesheetProcess

Receives an XSL stylesheet as a stream or DOM and creates an XSLStylesheet object.
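The selection step that XPathProcess performs, taking a DOM and an XPath expression and returning the matching nodes, can be sketched with the standard javax.xml.xpath API. This is not the XDK implementation; the class name XPathStageSketch is hypothetical and the purchase-order fragment used to try it is invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: an XPath selection stage built on javax.xml.xpath.
class XPathStageSketch {

    // Parses the XML into a DOM, then returns the nodes the expression selects.
    static NodeList select(String xml, String expression) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document dom = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, dom, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```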

Figure 7-2 illustrates how to pass a pipeline document to a Java application that uses the XML Pipeline processor, configure the processor, and execute the pipeline.

Figure 7-2 Using the Pipeline Processor for Java

The basic steps are as follows:

Instantiate a pipeline document, which forms the input to the pipeline execution. Create the object by passing a FileReader to the constructor as follows:

f = new FileReader(args[0]);

pipe = new PipelineDoc((Reader)f, false);

Instantiate a pipeline processor. PipelineProcessor is the top-level class that executes the pipeline. Table 7-3 describes some of the available methods.

Table 7-3 PipelineProcessor Methods

Method

Description

executePipeline()

Executes the pipeline based on the PipelineDoc set by invoking setPipelineDoc().

getExecutionMode()

Gets the type of execution mode: PIPELINE_SEQUENTIAL or PIPELINE_PARALLEL.

setErrorHandler()

Sets the error handler for the pipeline. This call is mandatory to execute the pipeline.

setExecutionMode()

Sets the execution mode. PIPELINE_PARALLEL is the default and specifies that the processes in the pipeline should execute in parallel. PIPELINE_SEQUENTIAL specifies that the processes in the pipeline should execute sequentially.

setForce()

Sets execution behavior. If true, then the pipeline executes regardless of whether the target is up-to-date with respect to the pipeline inputs.

setPipelineDoc()

Sets the PipelineDoc object for the pipeline.

The following statement instantiates the pipeline processor:

proc = new PipelineProcessor();

Set the processor to the pipeline document. For example:

proc.setPipelineDoc(pipe);

Set the execution mode for the processor and perform any other needed configuration. For example, set the mode by passing a constant to PipelineProcessor.setExecutionMode().

The following statement specifies sequential execution:

proc.setExecutionMode(PipelineConstants.PIPELINE_SEQUENTIAL);

Instantiate an error handler. The error handler must implement the PipelineErrorHandler interface. For example:

errHandler = new PipelineSampleErrHdlr(logname);

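Set the error handler for the processor by invoking setErrorHandler(). For example:

proc.setErrorHandler(errHandler);

Execute the pipeline by invoking executePipeline(). For example:

proc.executePipeline();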
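The difference between PIPELINE_PARALLEL and PIPELINE_SEQUENTIAL can be modeled for a pipeline shaped like the demo's p1/p2/p3 (two independent inputs feeding one transformation). The following sketch uses java.util.concurrent.CompletableFuture; it is not how the XDK schedules processes internally, and the stage bodies are placeholder strings. Both modes respect the same dependencies and therefore produce the same result; only the amount of concurrency differs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Illustrative only: models the two execution modes for a p1/p2/p3 pipeline.
class ExecutionModeSketch {

    // Parallel: p1 and p2 do not depend on each other, so they run concurrently;
    // p3 starts only after both of its inputs are available.
    static String runParallel(Supplier<String> p1, Supplier<String> p2) {
        CompletableFuture<String> xmldoc = CompletableFuture.supplyAsync(p1);
        CompletableFuture<String> xslstyle = CompletableFuture.supplyAsync(p2);
        return xmldoc.thenCombine(xslstyle, ExecutionModeSketch::p3).join();
    }

    // Sequential: the same dependencies, one process at a time on this thread.
    static String runSequential(Supplier<String> p1, Supplier<String> p2) {
        String xmldoc = p1.get();
        String xslstyle = p2.get();
        return p3(xmldoc, xslstyle);
    }

    // Placeholder for the transformation process.
    static String p3(String xmldoc, String xslstyle) {
        return xslstyle + "(" + xmldoc + ")";
    }
}
```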

Running the XML Pipeline Processor Demo Programs

Demo programs for the XML Pipeline processor are included in $ORACLE_HOME/xdk/demo/java/pipeline. Table 7-4 describes the XML files and Java source files that you can use to test the utility.

Table 7-4 Pipeline Processor Sample Files

File

Description

README

A text file that describes how to set up the Pipeline processor demos.

PipelineSample.java

A sample Pipeline processor application. The program takes pipedoc.xml as its first argument.

PipelineSampleErrHdlr.java

A sample program to create an error handler used by PipelineSample.

book.xml

A sample XML document that describes a series of books. This document is specified as an input by pipedoc.xml, pipedoc2.xml, and pipedocerr.xml.

book.xsl

An XSLT stylesheet that transforms the list of books in book.xml into an HTML table.

book_err.xsl

An XSLT stylesheet specified as an input by the pipedocerr.xml pipeline document. This stylesheet contains an intentional error.

id.xsl

An XSLT stylesheet specified as an input by the pipedoc3.xml pipeline document.

items.xsd

An XML schema document specified as an input by the pipedoc3.xml pipeline document.

pipedoc.xml

A pipeline document. This document specifies that process p1 should parse book.xml with DOM, process p2 should parse book.xsl and create a stylesheet object, and process p3 should apply the stylesheet to the DOM to generate myresult.html.

pipedoc2.xml

A pipeline document. This document specifies that process p1 should parse book.xml with SAX, process p2 should generate compressed XML compxml from the SAX events, and process p3 should regenerate the XML from the compressed stream as myresult2.html.

pipedoc3.xml

A pipeline document. This document specifies that process p5 should parse po.xml with DOM, process p1 should select a single node from the DOM tree with an XPath expression, process p4 should parse items.xsd and generate a schema object, process p6 should validate the selected node against the schema, process p3 should parse id.xsl and generate a stylesheet object, and a further process should apply the stylesheet to the validated node to produce myresult3.html.

pipedocerr.xml

A pipeline document. This document specifies that process p1 should parse book.xml with DOM, process p2 should parse book_err.xsl and generate a stylesheet object if it encounters no errors and apply an inline stylesheet if it encounters errors, and process p3 should apply the stylesheet to the DOM to generate myresulterr.html. Because book_err.xsl contains an error, the program should write the text contents of the input XML to myresulterr.html.

po.xml

A sample XML document that describes a purchase order. This document is specified as an input by pipedoc3.xml.
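The fallback behavior that pipedocerr.xml demonstrates can be sketched with JAXP: try to compile the requested stylesheet and, if compilation fails, switch to an inline stylesheet. This is an analogue, not the XDK mechanism. The fallback used here is an assumption: an XSLT 1.0 stylesheet with no templates, whose built-in rules output the text content of the input, which matches the outcome the table describes for myresulterr.html. The JDK may also print the compilation error to standard error.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative only: stylesheet compilation with an inline fallback.
class StylesheetFallbackSketch {

    // Assumed fallback: only XSLT's built-in template rules apply,
    // so the output is the text content of the input document.
    static final String FALLBACK_XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/></xsl:stylesheet>";

    static String transform(String xml, String xsl) {
        TransformerFactory tf = TransformerFactory.newInstance();
        try {
            Templates templates;
            try {
                templates = tf.newTemplates(new StreamSource(new StringReader(xsl)));
            } catch (TransformerConfigurationException broken) {
                // The requested stylesheet failed to compile: use the inline one.
                templates = tf.newTemplates(new StreamSource(new StringReader(FALLBACK_XSL)));
            }
            StringWriter out = new StringWriter();
            templates.newTransformer().transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```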

Documentation for how to compile and run the sample programs is located in the README. The basic steps are as follows:

Change into the $ORACLE_HOME/xdk/demo/java/pipeline directory (UNIX) or %ORACLE_HOME%\xdk\demo\java\pipeline directory (Windows).

Run make (UNIX) or Make.bat (Windows) at the system prompt to generate class files for PipelineSample.java and PipelineSampleErrHdlr.java and run the demo programs. The programs write output files to the log subdirectory.

Alternatively, you can run the demo programs manually by using the following syntax:

java PipelineSample pipedoc pipelog [ seq | para ]

The pipedoc option specifies which pipeline document to use. The pipelog option specifies the name of the pipeline log file. It is optional unless you specify seq or para, in which case a filename is required. If you do not specify a log file, then the program generates pipeline.log by default. The seq option executes the pipeline processes sequentially; para executes them in parallel. If you specify neither seq nor para, then the default is parallel processing.

View the files generated from the pipeline, which are all named with the initial string myresult, and the log files.
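The argument rules for PipelineSample (a required pipeline document, an optional log file that defaults to pipeline.log, and an optional seq or para mode that defaults to parallel) can be captured in a small parser. This is a hypothetical helper, not code taken from PipelineSample.java; the class and field names are assumptions, and so is the choice to treat any third argument other than seq as keeping the parallel default.

```java
// Hypothetical helper mirroring the documented PipelineSample argument rules.
class DemoArgs {
    final String pipedoc;
    final String logname;
    final boolean sequential;

    private DemoArgs(String pipedoc, String logname, boolean sequential) {
        this.pipedoc = pipedoc;
        this.logname = logname;
        this.sequential = sequential;
    }

    static DemoArgs parse(String[] args) {
        if (args.length < 1) {
            throw new IllegalArgumentException("usage: PipelineSample pipedoc [pipelog [seq | para]]");
        }
        // The second argument, when present, names the log file.
        String logname = args.length > 1 ? args[1] : "pipeline.log";
        // Parallel is the default; assumption: only "seq" selects sequential mode.
        boolean sequential = args.length > 2 && args[2].equals("seq");
        return new DemoArgs(args[0], logname, sequential);
    }
}
```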

Using the XML Pipeline Processor Command-Line Utility

The command-line interface for the XML Pipeline processor is named orapipe. The Pipeline processor is packaged with Oracle Database. By default, the Oracle Universal Installer installs the utility on disk in $ORACLE_HOME/bin.

Before running the utility for the first time, make sure that your environment variables are set as described in "Setting Up the Java XDK Environment". Run orapipe at the operating system command line with the following syntax:

orapipe options pipedoc

The pipedoc is the pipeline document, which is required. Table 7-5 describes the available options for the orapipe utility.

Table 7-5 orapipe Command-Line Options

Option

Purpose

-help

Prints the help message

-log logfile

Writes errors and messages to the specified log file. The default is pipeline.log.

-noinfo

Does not log informational items. The default is on.

-nowarning

Does not log warnings. The default is on.

-validate

Validates the input pipedoc against the pipeline schema. Validation is turned off by default. If the pipeline document uses the outparam feature, then validation fails against the current pipeline schema, because outparam is an additional feature that the schema does not define.

-version

Prints the release version.

-sequential

Executes the pipeline in sequential mode. The default is parallel.

-force

Executes the pipeline even if the target is up-to-date. By default, -force is not specified.

-attr name value

Sets the value of $name to the specified value. For example, if the attribute name is source and the value is book.xml, then you can pass this value to an element in the pipeline document as follows: <input ... label="$source">.
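To show what the $name substitution amounts to, the following sketch replaces $name references in a label with values supplied as name/value pairs. It is a hypothetical illustration of the idea, not orapipe's implementation; the class name, the identifier syntax accepted after the dollar sign, and the choice to leave unknown names untouched are all assumptions.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: substitutes $name references in a pipeline label.
// Not orapipe's actual implementation.
class AttrSubstitutionSketch {
    // Assumed reference syntax: '$' followed by a simple identifier.
    private static final Pattern REF = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    static String substitute(String label, Map<String, String> attrs) {
        Matcher m = REF.matcher(label);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = attrs.get(m.group(1));
            // Assumption: references with no supplied value are left as written.
            m.appendReplacement(sb, Matcher.quoteReplacement(value != null ? value : m.group()));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

For example, with the pair source and book.xml, the label $source in <input ... label="$source"> resolves to book.xml.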

Processing XML in a Pipeline

Creating a Pipeline Document

To use the Oracle XML Pipeline processor, you must create an XML document according to the rules of the Pipeline Definition Language specified in the W3C Note.

The W3C specification defines the XML processing components and the inputs and outputs for these processes. The XML Pipeline processor includes support for the following XDK components:

XML parser

XML compressor

XML Schema validator

XSLT processor

Example of a Pipeline Document

The XML Pipeline processor executes a sequence of XML processing steps according to the rules in the pipeline document and returns a result. Example 7-1 shows pipedoc.xml, which is a sample pipeline document included in the demo directory.

Processes Specified in the Pipeline Document

In Example 7-1, three processes are called and associated with Java classes in the oracle.xml.pipeline.processes package. The pipeline document uses the <processdef/> element to make the following associations:

domparser.p is associated with the DOMParserProcess class

xslstylesheet.p is associated with the XSLStylesheetProcess class

xslprocess.p is associated with the XSLProcess class

Processing Architecture Specified in the Pipeline Document

The PipelineSample program accepts the pipedoc.xml document shown in Example 7-1 as input along with XML documents book.xml and book.xsl. The basic design of the pipeline is as follows:

Parse the incoming book.xml document and generate a DOM tree. This task is performed by DOMParserProcess.

Parse book.xsl as a stream and generate an XSLStylesheet object. This task is performed by XSLStylesheetProcess.

Receive the DOM of book.xml as input, apply the stylesheet object, and write the result to myresult.html. This task is performed by XSLProcess.

Note the following aspects of the processing architecture used in the pipeline document:

The target information set, http://example.org/myresult.html, is inferred from the default value of the target parameter and the xml:base setting.

The process p2 has an input of book.xsl and an output parameter with the label xslstyle, so it has to run to produce the input for p3.

The p3 process has an output parameter with the label http://example.org/myresult.html, so it has to run to produce the target.

The process p1 depends on input document book.xml and outputs xmldoc, so it has to run to produce the input for p3.

In Example 7-1, more than one order of processing can satisfy all of the dependencies. Given the rules, the XML Pipeline processor must process p3 last but can process p1 and p2 in either order or process them in parallel.
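The dependency reasoning above can be reproduced with a small model. Each process lists the labels it reads and writes; a process depends on whichever process produces one of its input labels, and labels with no producer (such as book.xml) are external inputs. Grouping processes into "waves", where each wave depends only on earlier waves, yields p1 and p2 together followed by p3. This is a sketch of a standard topological ordering under these assumptions, not the XDK's scheduler, and the Proc class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative model of dependency-driven ordering; not the XDK scheduler.
class DependencyOrderSketch {

    static final class Proc {
        final String id;
        final List<String> inputs;
        final List<String> outputs;

        Proc(String id, List<String> inputs, List<String> outputs) {
            this.id = id;
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    // Returns waves of process IDs; members of one wave may run in parallel.
    static List<List<String>> waves(List<Proc> procs) {
        Map<String, String> producer = new HashMap<>(); // label -> producing process
        for (Proc p : procs) for (String out : p.outputs) producer.put(out, p.id);

        Map<String, Set<String>> pending = new LinkedHashMap<>(); // id -> prerequisites
        for (Proc p : procs) {
            Set<String> deps = new TreeSet<>();
            for (String in : p.inputs) {
                String prod = producer.get(in);
                if (prod != null) deps.add(prod); // no producer: external input
            }
            pending.put(p.id, deps);
        }

        List<List<String>> result = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (!pending.isEmpty()) {
            List<String> wave = new ArrayList<>();
            for (Map.Entry<String, Set<String>> e : pending.entrySet()) {
                if (done.containsAll(e.getValue())) wave.add(e.getKey());
            }
            if (wave.isEmpty()) throw new IllegalStateException("cycle among " + pending.keySet());
            Collections.sort(wave);
            for (String id : wave) pending.remove(id);
            done.addAll(wave);
            result.add(wave);
        }
        return result;
    }
}
```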

Writing a Pipeline Processor Application

The PipelineSample.java source file illustrates a basic pipeline application. You can use the application with any of the pipeline documents in Table 7-4 to parse and transform an input XML document.

The basic steps of the program are as follows:

Perform the initial setup. The program declares references of type FileReader (for the input XML file), PipelineDoc (for the input pipeline document), and PipelineProcessor (for the processor). The first argument is the pipeline document, which is required. If a second argument is received, then it is stored in the logname String. The following code fragment illustrates this technique:

Create a FileReader object by passing the first command-line argument to the constructor as the filename. For example:

f = new FileReader(args[0]);

Create a PipelineDoc object by passing the reference to the FileReader object. The following example casts the FileReader to a Reader and specifies no validation:

pipe = new PipelineDoc((Reader)f, false);

Instantiate an XML Pipeline processor. The following statement instantiates the pipeline processor:

proc = new PipelineProcessor();

Set the processor to the pipeline document. For example:

proc.setPipelineDoc(pipe);

Set the execution mode for the processor and perform any other configuration. The following code fragment uses a condition to determine the execution mode. If three or more arguments are passed to the program, then it sets the mode to sequential or parallel depending on which argument is passed. For example:

Instantiate an error handler. The error handler must implement the PipelineErrorHandler interface. The program uses the PipelineSampleErrHdlr class shown in PipelineSampleErrHdlr.java. The following code fragment illustrates this technique:

errHandler = new PipelineSampleErrHdlr(logname);

Set the error handler for the processor by invoking setErrorHandler(). The following statement illustrates this technique:

proc.setErrorHandler(errHandler);

Execute the pipeline. The following statement illustrates this technique:

proc.executePipeline();

Writing a Pipeline Error Handler

An application calling the XML Pipeline processor must implement the PipelineErrorHandler interface to handle errors received from the processor. Set the error handler in the processor by calling setErrorHandler(). When writing the error handler, you can choose to throw an exception for different types of errors.

The oracle.xml.pipeline.controller.PipelineErrorHandler interface declares the methods shown in Table 7-6, all of which return void.

Table 7-6 PipelineErrorHandler Methods

Method

Description

error(java.lang.String msg, PipelineException e)

Handles PipelineException errors.

fatalError(java.lang.String msg, PipelineException e)

Handles fatal PipelineException errors.

warning(java.lang.String msg, PipelineException e)

Handles PipelineException warnings.

info(java.lang.String msg)

Prints optional, additional information about errors.
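The following sketch shows a handler with the same four callbacks as Table 7-6 that logs each message and counts errors and warnings. So that it compiles without the XDK library, it declares its own hypothetical PipelineErrorSink interface and uses java.lang.Exception in place of PipelineException; a real handler implements oracle.xml.pipeline.controller.PipelineErrorHandler instead. Rethrowing from fatalError() is one design choice among those the text allows, since you can choose to throw an exception for different types of errors.

```java
import java.io.PrintWriter;
import java.io.Writer;

// Hypothetical interface shaped like PipelineErrorHandler; Exception stands
// in for oracle.xml.pipeline.controller.PipelineException.
interface PipelineErrorSink {
    void error(String msg, Exception e);
    void fatalError(String msg, Exception e);
    void warning(String msg, Exception e);
    void info(String msg);
}

class LoggingErrorSink implements PipelineErrorSink {
    private final PrintWriter log;
    int errorCount;
    int warningCount;

    LoggingErrorSink(Writer destination) {
        this.log = new PrintWriter(destination, true); // auto-flush on println
    }

    public void error(String msg, Exception e) {
        errorCount++;
        log.println("ERROR: " + msg + ": " + e.getMessage());
    }

    public void fatalError(String msg, Exception e) {
        log.println("FATAL: " + msg + ": " + e.getMessage());
        // Design choice: stop the run on fatal errors by rethrowing.
        throw new IllegalStateException(msg, e);
    }

    public void warning(String msg, Exception e) {
        warningCount++;
        log.println("WARNING: " + msg + ": " + e.getMessage());
    }

    public void info(String msg) {
        log.println("INFO: " + msg);
    }
}
```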

The first three methods in Table 7-6 receive a reference to an oracle.xml.pipeline.controller.PipelineException object. The following methods of the PipelineException class are especially useful:

getExceptionType(), which obtains the type of exception thrown

getProcessId(), which obtains the process ID where the exception occurred

getMessage(), which returns the message string of this Throwable error

The PipelineSampleErrHdlr.java source file implements a basic error handler for use with the PipelineSample program. The basic steps are as follows:

Implement a constructor. The constructor accepts the name of a log file and wraps it in a FileWriter object as follows:

Implement the info() method. Unlike the preceding methods, this method does not receive a PipelineException reference as input. The following implementation prints the String received by the method and increments the value of the warnCount variable: