Abstract

This specification describes the syntax and semantics of
XProc: An XML Pipeline Language, a language
for describing operations to be performed on XML documents.

An XML Pipeline specifies a sequence of operations to be
performed on one or more XML documents, producing one or more XML
documents as output. Steps in the pipeline may read or write
non-XML resources as well.

Status of this Document

This section describes the status of this document at
the time of its publication. Other documents may supersede this
document. A list of current W3C publications and the latest revision
of this technical report can be found in the W3C technical reports index
at http://www.w3.org/TR/.

This document was produced by the
XML Processing Model Working Group
which is part of the
XML Activity.
Publication as a Working Draft does not imply endorsement
by the W3C Membership. This is a draft document and may be updated,
replaced or obsoleted by other documents at any time. It is
inappropriate to cite this document as other than work in progress.

This is a first public Working Draft.Draft, based on the previously
published
Requirements
Document. Although not While some details of the design are
remain incomplete, the Working Group has chosen to publish aearly new draft
in order to show the direction we are heading and to encourage
feedback from potential users.

1 Introduction

An XML Pipeline specifies a sequence of operations to be
performed on a collection of input documents. Pipelines take zero or more
XML documents as their input and produce zero or more XML documents as
their output. Steps in the pipeline may
read or write non-XML resources as well.

A pipeline consists of components. Like
pipelines, components take zero or more XML documents as their input
and produce zero or more XML documents as their output. The inputs to
a component come from the web, from the pipeline document, from the inputs to the
pipeline itself, or from the outputs of other components in the pipeline. The outputs
from a component are consumed by other components, are outputs of the pipeline as
a whole, or are discarded.

There are two kinds of components: steps and (language)
constructs. Steps carry out single operations and have no
substructure as far as the pipeline is concerned, whereas constructs
can include components within themselves.

This is a pipeline that consists of two steps, XInclude and
Validate. The pipeline itself has two inputs, “Document” and “Schema”.
How these inputs are connected to XML documents outside the pipeline
is implementation-defined.
The XInclude
step reads the pipeline input “Document” and produces a result
document. The Validate step reads the pipeline input “Schema” and
the output from the XInclude step and produces a
result document. The result of the validation, “Resultvalidation Document”,
is the result of the
pipeline, “Result pipeline.
How pipeline outputs are connected to XML documents outside the pipeline is
implementation-defined.

The heart of this example is the conditional.conditional construct. The “choose”
construct evaluates an XPath expression over a test document. Based on the
result of that expression, one or another branch is evaluated. In this example,
each branch consists of a single validate component.

2 Pipeline Concepts

[Definition: A pipeline
is an acyclic, directed graph of components connected together by
inputs and outputs. In other words, it's a set of components connected
together, with outputs flowing into inputs, without any loops (no component can
read its own output, directly or indirectly).]
A pipeline is itself a
construct and must satisfy the constraints on
constructs.

The result of evaluating a pipeline is the result of evaluating
the components that it contains, in the order determined by the
flow graph connections
between them. A pipeline must behave as if it
evaluated each component each time it occurs.occurs in the flow graph. Unless otherwise
indicated, implementations must not assume that
components are functional (that is, that their outputs depend only on
their explicit inputs and parameters) or side-effect free.

2.1 Steps, Constructs, and Subpipelines

Steps are the basic computational units of a pipeline.
[Definition: A step is an atomic
component that performs a
unit of XML processing, such as XInclude or transformation.] Steps can
perform arbitrary amounts of computation but they are indivisible from
the point of view of the construct that contains them.
Steps carry out fundamental XML operations. An XSLT step,
for example, performs XSLT processing; a validation step validates
one input with respect to some schema, etc.

Language constructs, on the other hand, control and organize the flow of
documents through a pipeline, reconstructing familiar programming
language functionality such as conditionals, iterators and exception
handling. As such, they typically contain components, whose evaluation
they control.

[Definition: A
construct is a componentunit of XML processing that
contains additional components. That is, a construct differs from a
step in that its semantics are at least partially determined by the
components that it contains.]

Every construct contains zero or more components.
[Definition: The components that occur
directly inside a construct
are called contained components.][Definition: A construct which immediately
contains a component is called its container.]

[Definition: The components (and the connectionscomponents
immediately between them)
within a container form
a subpipeline.]Each construct determines how
and which, if any, of its subpiplines is evaluated.

Steps and constructs have “ports” into which inputs and outputs are
connected. Each component has a number of input ports and a number of
output ports, all with unique names. A component can have zero input
ports and/or zero output ports.
(All components have an implicit standard output port for
reporting errors that must not be declared.)

Components have any number of parameters, all with unique names.
A component can have zero parameters.

2.2 Inputs and Outputs

Although some kinds of components can read and write non-XML resources,
what flows between components as
inputs and outputs are exclusively XML documents or sequences of XML
documents. Each XML document (or document in a sequence) must be a
an [Infoset]formed with aor
Documentdocument.
InformationIs support Itemfor at its root.optional?
The inputs and outputs can be
implemented as sequences of characters, events, or object models, or any other
representation the implementation chooses.

It's inconsistent to point to the XML Recommendation on the one
hand, and on the other hand to specify that the representation doesn't
have to be a sequence of characters. Fix this, probably by referring to
the Infoset and describing XML 1.0/1.1 in some other way.

It is a dynamic error if a non-XML
resource is produced on a component output or arrives on a component
input.

Editorial Note

What about the cases where it's impractical to test for this error?

An implementation may make it possible for a
component to produce non-XML output—for example, writing a PDF
document—but that output cannot flow through the pipeline. Similarly,
one can imagine a component that takes no pipeline inputs, reads a non-XML file
from a URI, and produces an XML output. But the non-XML file cannot be
an input to a component or pipeline.

Each component declares its input and output ports.
[Definition: Thethe input ports declared on a
component are its declared inputs.][Definition: The output ports declared on a
component are its declared outputs.]

All of the declared inputs
of a component must be connected to
inputs. outputs in the pipeline.
It is a static error if a component has
an input port which is not connected. Unconnected output ports are allowed;
any documents produced on those ports are simply discarded.

[Definition: The
signature of a component is the set of inputs,
outputs, and parameters that it is declared to accept.] Each
type of step (e.g. XSLT orXSLT, XInclude) has a fixed signature, declared
globally or built-in, which all its instances share, whereas each
instance of a construct has its own signature declared
locally.

[Definition: A component
matches its signature if and only if it specifies
an input for each declared input and it specifies no inputs that are not
declared; it specifies no outputs that are not declared; it specifies
a parameter for each parameter that is declared to be required; and it
specifies no parameters that are not declared.]
In other words, every input and required parameter must be specified
and only inputs, outputs, and parameters that are declared may be
specified. Outputs and optional parameters do not have to be
specified.

Each input and output is declared to accept or produce either a single
document or a sequence of documents. It is not a static error to connect a port
thatof a component which is declared to produce a sequence of documents to a port that is declared to
accept only a single
document. It is, however, a dynamic error if the former component actually
produces more than a single document at run time.

Steps may also produce error, warning, and informative
messages. These messages appear on a special “error output” that is available
in the catch clause of a try/catch.

2.3 Parameters

[Definition: A parameter is
a QName/value pair.] The value of a parameter must be a string.
If a document, node, or other value is given, its (XPath 1.0) string value
is computed and that string is used.

2.4 ConnectionsFlow Graph

ComponentsThe components contained in a
pipeline or other container are
the nodes of a flow graph. The input and
output ports of the components are connected togetherby arcs in that
graph.
Consider two components in such a by their inputs and
“component outputs.
B”.
[Definition: Components A and B are
connected if they are either
directly or indirectly connected. Component A is directly
connected to B if an output of A is associated with an input
port of B. Component A is indirectly connected to B if there
is a chain of directly connected components that allows
traversal from A to B.]

With respect to connected components, we
can speak of one component being either before or after another.
[Definition: Component A is before
component B if component B is a contained component of component A, either
directly or indirectly, or if any output from component A is connected
to any input of component B, either directly or indirectly.][Definition: Component A is after
component B if component B is the container
for component A (or an ancestor of such a container) or if any output from component B
is connected to any input of component A, either directly or indirectly.]

It is a static error if a component is either before
or after itself. In other words, every flow graph must
be acyclic.

3 Language Constructs

This section describes the core language constructs of XProc.

Every component in a pipeline has five parts: a set of inputs, a
set of outputs, a set of parameters, a set of contained
components, and a context inherited from its container.
[Definition: The contextis the
the set of input ports, output ports, and parameter
names that are visible.]

3.1 Pipeline

The context ofhas a pipeline is its inherited context modified
as follows:

Allnumber of the declared inputsof the pipeline are added to theinput
outputsand in the context.

The union of all the declared outputs of the
contained componentsare added to the
outputsin the context.

All of the declared parameters of the pipeline are added to the
parametersin the context.

This is the context used by the pipelineports and inherted by its
containedparameters. components.

Viewed from the outside, a pipelineit is a
black box which performs some calculation on its inputs and produces
its outputs. From the pipeline author's perspective, the computation
performed by the pipeline is described in terms of contained
components which read the pipeline's inputs and produce
the pipeline's outputs.

For example, a pipeline might accept a document and a stylesheet
as input; perform XInclude, validation, and transformation;transformation over its
inputs; and produce a sequence of formatted documents as its
output.

There is one additional constraint imposed on pipelines:
a pipeline must not itself be
a contained component.

3.2 For-Each

A for-each construct processes a sequence of documents,
applying its subpipeline to each document in
turn.

Inputs

Exactly one, as declared.

Outputs

As declared.

Parameters

As declared.

Contained components

As declared.

The context of a for-each is its inherited context modified
as follows:

All of the declared inputsof the for-each are added to the
outputsin the context.

The union of all the declared outputs of the
contained componentsare added to the
outputsin the context.

All of the declared parameters of the for-each are added to the
parametersin the context.

The for-each construct can be used in cases where a component
requires a single document input but a pipeline needs to process a
sequence of documents with that component.

The result of the for-each is a sequence of documents
produced by processing each individual document in the input sequence.
If the subpipeline is connected to one or more output ports on
the for-each, what appears on each of those ports is the sequence
of documents produced by each iteration of the loop.

For example, a for-each might accept a sequence of DocBook
chapters as its input, process each chapter in turn with XSLT, and
produce a sequence of formatted chapters as its output.

3.3 Viewport

A viewport construct processes a single document, applying
its subpipeline to one or more subsections of the document.

Inputs

Exactly one, as declared.

Outputs

Exactly one, as declared.

Parameters

As declared.

Contained components

As declared.

The context of a viewport is its inherited context modified
as follows:

All of the declared inputsof the viewport are added to the
outputsin the context.

The union of all the declared outputs of the
contained componentsare added to the
outputsin the context.

All of the declared parameters of the viewport are added to the
parametersin the context.

The
result of the viewport is a copy of the original document with the
selected subsections replaced by the results of applying the
subpipeline
to them.

For example, a viewport might accept an XHTML document as its
input, apply encryption to selected div elements within
that document, and return an XHTML document that is the same as the
original except that each selected div has been replaced by
its encrypted result.

3.4 Choose

A choose construct selects exactly one of a list of
alternative subpipelines
based on the evaluation of XPath expressions.

Inputs

As declared.

Outputs

As declared.

Parameters

As declared.

Contained components

As declared.A choose
construct contains several alternate subpipelines, exactly one
of which will be evaluated.

The context of a choose is its inherited context modified as
follows:

All of the declared inputsof the choose are added to the
outputsin the context.

The declared outputs of (any one of) the subpipelines
are added to the
outputsin the context.

All of the declared parameters of the choose are added to the
parametersin the context.

This is the context used by the choose and inherted by its
subpipelines.

The list of alternative subpipelines consists of zero or more
subpipelines, each guarded by an XPath expression (with an associated context
document), followed optionally by a
single default subpipeline.

The choose considers each subpipeline in turn and selects
the first (and only the first) subpipeline for which the guard
expression evaluates to true in the context of its context document.
If there are no subpipelines for which the expression evaluates to
true, the default subpipeline, if it was specified, is
selected.

After a subpipeline is selected, it is
evaluated as if only it had been present.

The context of the contained componentsin
the selected subpipeline is the context of the choose with the union of
all the declared outputs of the contained componentsof the selected subpipeline added to the outputsin the context.

For example, a choose might test a schema and apply XML
Schema validation to an input document if the schema is an XML Schema
document, apply RELAX NG validation if the schema is a RELAX NG
grammar, or perform no validation otherwise.

In order to ensure that the result of the choose is
consistent irrespective of the subpipeline chosen, each subpipeline must
declare the same number of outputs with the same names. It is a
static error if two subpipeline in a choose
declare different outputs.

A groupis a convenience wrapper for a collection of components
and can be used to perform parameter renaming to aid in reuse of sets
of components.

3.6 Try/Catch

A try construct isolates a
subpipeline, preventing any errors that arise
within it from being exposed to the rest of the pipeline.

Inputs

As declared.

Outputs

As declared.

Parameters

As declared.

Contained components

As declared. A trycontains
two subpipelines,construct one or both of which will be evaulated.

A try construct containswith two subpipelines: an initial
subpipeline and a recovery (or “catch”) subpipeline.

The initial context of a try is its inherited context modified
as follows:

All of the declared inputsof the try are added to the
outputsin the context.

The union of all the declared outputs of the
contained componentsin the initial pipeline are added to the
outputsin the context.

All of the declared parameters of the try are added to the
parametersin the context.

This is the context used by the try and inherted by the
contained componentsin its initial pipeline.

Editorial Note

In the context of try/catch, “errors” refers to component failure which
is not the same as a static or dynamic error in the pipeline itself. (Though perhaps
it will be possible to recover from some dynamic errors.) The notion of component
failure as a distinct class of error needs to be described.

The try constructIt
evaluates the initial subpipeline and, if no errors occur, the results of that
pipeline are the results of the construct. However, if any errors
occur, it abandons the first subpipeline, discarding any output that
it might have generated, and evaluates the recovery subpipeline.

In this case, it must recompute the context inherited by the
contained componentsin the recovery pipeline.
The context of the recovery subpipeline is the inherited context of the try
modified as follows:

All of the declared inputsof the try are added to the
outputsin the context.

The union of all the declared outputs of the
contained componentsin the recovery pipeline are added to the
outputsin the context.

All of the declared parameters of the try are added to the
parametersin the context.

The results of the recovery subpipeline are the results of the
try construct. If the recovery subpipeline is evaluated and a component within
that subpipeline fails, the try fails.

For example, a pipeline might attempt to process a document by
dispatching it to some web service. If the web service succeeds, then
those results are passed to the rest of the pipeline. However, if the
web service cannot be contacted or reports an error, the catch
construct can provide some sort of default for the rest of the
pipeline.

In order to ensure that the result of the try is consistent
irrespective of whether the initial subpipeline provides its output or
the recovery subpipeline does, both subpipelines must declare the same
number of outputs with the same names. It is a static
error if the two subpipelines declare different
outputs.

In order to support corrective action in the recovery
subpipeline, components inside it have access to all the
error output of the components that were in the initial
subpipeline on a special port named “#error”.

Note

In evaluating the initial subpipeline, failure of one component
can cause other components to fail. In addition, some components that
fail might not produce output on their error ports and some components
that succeeded might produce such output. This pipeline language places
no constraints on the order of error messages provided to the
recovery subpipeline, nor does it attempt to guarantee that such
output will be available in all cases.

3.7 Other Steps

A pipeline document may declare additional steps. These can
be implementation-defined steps or can be defined through some
implementation-dependent extension mechanism. Each declared step
must have a name and a signature. It is a static
error if a pipeline contains an step that is not
recognized by the processor.

4 Syntax

This section describes a set of XML syntactic elements sufficient to
represent all the aspects of a pipeline, as set out in the preceding sections.

4.1 Overview

ElementsA pipeline in a pipelineflow documentgraph represent the pipeline,connected
the components it contains, the connections betweencontain
flow thosegraphs. components, the components
and connections contained withinthe
flow them,graphs,
using and so on. Each component is represented
by an element;and a combination of elements and attributes to specify
how the inputs and outputs of each component are connected and how parametersto
are passed.

Conceptually, we can speak of components as objects that have
inputs and outputs that are connected together and which may have
contain additional components. Syntactically, we needparameter a mechanism
for specifying these relationships.bindings.

Containment is
represented naturally using nesting of XML elements. The connections
between components are expressed using names and references to those
names.

Five kinds of things are named in XProc:

The types of components,

Therepresent components themselves,

Inputall ports,

Outputhave ports, and

Parameters

4.1.1 Scoping of Names

The scope of thenames, names of component types is the pipeline. Each pipeline
processor has some number of built in component type and each pipelines may declare
(directly, or by reference to an external library) additional component types.

Thevalue scope of the names of the components themselves is determined
by the contextof each component. In general,
the name of a component, the names of its sibling components, theThese
names of any components that itestablish contains directly, the names of its
ancestors;inputs and the names of its ancestor's siblings are all in the same scope.outputs
It is a static errorif two components with the same
name appear in the same scope. All in-scope components musthave unique names.

The scopespeak of an input“the or output port name is the component on
which it is defined. The names of all the ports on any component
mustbe unique.

Takennamed together, these uniqueness constraints guaranteeto that the
combination of ainput component name and a port name uniquely identifies
exactly one in-scope port.

The scope of parameter names is essentially the same as the
scope of component names, with the following caveat. Whereas component
names must be unique, parameter names may be repeated. The declaration
of a parameter on a component shadows any declarationnamed that may already
be in-scope.‘bar’.”).

4.1.2 Associating Documents with Ports

A document or a sequence of documents can be bound to a port
in three ways: by source, by URI, or by providing it “here”, as the content of
the element establishing the binding. A document must be specified in
exactly one of these ways, otherwise a static error is
raised.

Specified by URI

[Definition: A document is specified
by URI
if it refers to it with a URI.] The
href attribute is used for this purpose.

In this example, the input to the Identity step named
“otherstep” comes from “http://example.com/input.xml”.

It is a dynamic error if the processor
attempts to retrieve the specified URI and fails. (For example, if the
resource does not exist or is not accessible with the user's
authentication credentials.)

Specified by source

[Definition: A document is specified
by source
if it refers to a specific port on another component.] The
step and
source attributes are used for this purpose.
(The step attribute may refer
to any kind of component, either step or
construct, its name notwithstanding.)

There are constraints on what ports can be specified:specified, dependent
on context.
When the specification is for an input to a component, the
specified port must be
a declared ininput on the component's container
(e.g., a choose or for-each) or
an output ports of some other component within the component's
container or
an allowed input, per these three alternatives, for the
component's container.
If the specification is for the output of a construct, the
specified port must be an output port of some component within that
construct.
Alternative 3 above
corresponds context.to a connection between components in two separate
subpipelines, for example from a component in the
subpipeline of a pipeline itself to a component
in the subpipeline of a for-each. This
contradicts the definition of subpipeline as a
self-contained flow graph in
.
It is the WG's intention to provide for the functionality
delivered by such connections, but how this will be accomplished has
not yet been decided.

In this example, the “document” input to the XInclude step named
“expand” comes from the “result”
port of the step named “otherstep”.

Here documents are considered “quoted”, they are not
interpolated or available to the pipeline processor in any way except
as documents flowing through the pipeline.

4.1.3 Extension attributes

[Definition: An element from the XProc
XProc namespace may have any attribute not from the
XProc namespace, provided that the expanded-QName of the attribute has
a non-null namespace URI. These attributes are
called called
extension attributes.] The presence of
an extension attribute must not cause the connectionsflow betweengraph components
to differ from the connectionsflow graph that any other
conformant XProc processor would produce. They must not cause the
processor to fail to signal an error that a conformant processor is
required to signal. This means that an extension attribute must not
change the effect of any XProc element except to the extent that the
effect is implementation-defined or implementation-dependent.

A processor which encounters an extension attribute that it does not
recognize must behave as if the attribute was not present.

4.1.4 Extension elements

[Definition: Outside the context of a
“here document”, any element not in the XProc namespace
is an extension element.]
The presence of an extension element must not cause the connections
betweenflow componentsgraph to differ from the connectionsflow graph that any other
conformant XProc processor
might would produce.
They must not cause the processor to fail to signal an error that a
conformant processor is required to signal. This means that an
extension element must not change the effect of any XProc element
except to the extent that the effect is implementation-defined or
implementation-dependent.

Inside the context of a
here document, all content is
considered quoted so neither extension elements nor XProc elements are said to
occur.

A processor which encounters an extension element that it does not
recognize must behave as if neither the element nor its
attributes, nor any of its content was present.

4.2 Detailed Vocabulary

This section describes in detail the XML vocabulary that represents a pipeline.

4.2.1 p:pipeline Element

A p:pipeline represents a pipeline. Its children declare
the inputs, outputs, and parameters that the pipeline exposes and
represent its subpipeline.

4.2.2 p:input Element

A p:inputidentifiesdeclares an input for a component, optionally
declaring it, if necessary.port.

<p:inputport = QName sequence? = yes|no />

The port attribute defines the name
of the port. It is a static error to identify
two ports with the same name on the same component.
On atomic components, it is a static errorif the portgiven does not match the name
of an input port specified in the component's declaration.name.

On language constructs and declare-step-type, aninput input
declaration can indicate if a sequence of documents is allowed to
appear on the declared port. If sequence
is specified with the value “yes”, then a sequence is allowed. If the
sequence is not specified, or has the value
value “no”, then it is a dynamic error for a sequence
sequence of more than one document to appear on the declared
port.

The declaration may be accompanied by a
binding (or default binding) for the input. This binding can be
accomplished
by source:

If a binding is provided, a select expression
may also be provided.
If provided, the
specified XPath select expression is used to filter the document(s) that are read.
Each matching node or set of nodes is wrapped in a document and
provided to the input port.

It is a static errorif more than one of
source,
urior here document is
specified.

The selectexpression, if specified,
applies the specified XPath select expression to the document(s) that are
read. Each matching node or set of nodes is wrapped in a document and provided
to the input port. In other words,

4.2.3 p:output Element

The port attribute defines the name
of the port. It is a static error to identifydeclare two
two ports with the same name on the same component.name.

An output declaration can indicate if a sequence of documents is
allowed to appear on the declared port. If
sequence is specified with the value “yes”,
then a sequence is allowed. If the
sequence is not specified, or has the value
“no”, then it is a dynamic error if the component
produces a sequence of more than one document on the declared port.

On a construct, the declaration must be accompanied by a
binding for the output. This binding can be accomplished
by source:

4.2.4 p:parameter Element

The p:parameterelement is used both to declare
parameters and to establish values for them. WhenA used on a
p:declare-step-typeor language construct,
p:parameter declares the parameter and may associate a
default value with it. Used elsewhere, p:parameterassociates a value with the parameter.

4.2.4.1 Declaring Parameters

Parameters are declared on p:declare-step-typeand
language constructs with p:parameter:

<p:parametername = token required? = yes | no />

Components must declare the parameters that they accept.

The name attribute must be a
QName, a single asterisk (*), a QName, or a string of the form
*:NCName or
NCName:*.

If the“*” nameisin a QName, the parameter
mayname be declaredfunctions as requiredora it may
be
givenallowing a default value.
It iswith a staticmatching errorname to specify
thatbe the parameter is requiredor that it has
a default value if thespecified.
If namegiven is not
a QName. It is also a staticthen errorto specify
that the parameter is both requireddeclaration
andmay has a default value.

Ifvalue a parameter is required, it is a static
errorselect to invoke the component without specifyingas a value forper
that parameter.p:param.

4.2.4.2 Using Parameters

ParametersA p:step are used on p:stepin elements with
p:parameter:pipeline.

<p:parametername = QName />

The parametertype attribute identifies the
component mustto be
given instantiated. It is a valuestatic error whenif
the itname is used.not unique in the current scope, if the

specified type

4.2.4.3 Assigningis not Values to Parameters

Whenthe a parameter is declared, itspecified
inputs, can be given a defaultnot
match value.
Whensignature for steps of it istype.
p.input used,Element
A itp:input mustidentifies be given a value.component.
Inputs identify their That
valuesource can be specified in several ways,
by source:

Is there really anystatic valueerror in allowing here documents on parameters?of
source,
Doinguri so means that andocument empty parameter is
specified.
The alwaysport associated with an emptythe
here document, which in turn makes it impossible to detect some kinds ofsource.
errors statically. Mighta
static iterror
if not be better to disallow here documents and add
aof rule that any reference tofor the
component context isif more an error if a documentspecified
for is not
explicitlygiven provided?

If aThe selectexpressionexpression, if specified,
applies the is given, it is evaluated
againstto the documentdocument(s) that specifiedare
read. and the (XPath 1.0) stringset value
of thenodes is expression becomes the default value ofprovided
to the parameter. If no
selectIn expression iswords,
<p:input given, the
(XPathhref="http://example.org/input.html"/>
provides 1.0) stringsingle valuedocument, ofbut
<p:input the document becomesselect="//html:div"/>
provides the
defaulta value of the parameter. It is a dynamicfor erroreach if
ahtml:div in http://example.org/input.html.
<p:input document sequence is specified.

select="//html:div"/>

The select expression may refer to the values of other in-scope parameters
byhtml:div variable reference. It is a static errorifof the variable
reference uses a QName that is notread from the nameportname
port of anthe component in-scope parameterorigin.
p:param orElement
A ifp:param the
referenceassociates is circular, either directly or indirectly.

The defaultvalue of value may also be specified within several ways; as the
valuecontent of
the p:param attribute:

<p:parametername = QNamevalue = string />

In this case, the default(XPath 1.0) string value of the
content parameter isbecomes the string
specifiedvalue in the valueattribute.parameter.

4.2.5 p:step Element

The typeas attribute identifies the
componentexpression. to be instantiated. It is a static errorifwhich
the nameexpression is notevaluated must be unique in theexactly one current scope,three
ways, ifby thesource:
specified
by typeURI:
is
or notby
here knowndocument:
The to the processor,string orvalue if the specified
inputs,result outputs, andevaluating
the parameters do not
matchbecomes the
signaturefor steps of that type.

4.2.6 p:import-parameter Element

An p:import-parameter provides a set of in-scope parameters
to a component.

<p:import-parametername = token />

All in-scope parameters which match the name
are made available to the component as if they had been specified with
individual p:parameter elements.

The name attribute must be a single
asterisk (*), a QName, or a string of the form
*:NCName or
NCName:*.

4.2.7 p:for-each Element

Exactly one input must be declared and it must include a binding
for the port it declares. If outputs are declared, they must also
include a binding.

The processor will provide each document read
through the input binding to the subpipeline
represented by the children of the p:for-each, one at a
time. For each declared output, the processor will collect all the
documents that are produced for that output from all the iterations,
in order, into a sequence. The result of the p:for-each is
that set of document sequences.

The //chapters of the DocBook document are
selected. Each chapter is transformed into HTML and XSL FO using an
XSLT step. The resulting HTML and FO documents are aggregated together
and appear on the html and fo
ports, respectively, of the chapters construct itself.

It is a static error if there is not exactly one
p:input child of p:for-each or if the declared
input does not specify a binding.

It is a static error if any declared output
does not specify a binding.

4.2.8 p:viewport Element

Exactly one input must be declared and it must include both a
binding and a select expression. Exactly
one output must be declared and it must include a binding. The
processor will provide a document that contains each set of nodes that
matches the specified select expression through the input binding to
the subpipeline represented by the children of the p:viewport, one at a
time. What appears on the output from the p:viewport will
be a copy of the input document except that where each matching node
or set of nodes appears, the result of applying the subpipeline to
those nodes will be output.

It is a dynamic error if the input source is
a sequence of more than one document or if the output from any iteration
is a sequence of more than one document.

The //h:div[@class='enc']s of the document are
selected. Each selected div is encrypted and the resulting
encrypted version replaces the original div. The result of the whole construct is
a copy of the input document with each selected div encrypted.

Editorial Note

The WG has decided that the semantics of the XPath expression on viewport
should be “match” semantics (a la XSLT), not select semantics. That decision
is not been reflected in this draft.

It is a static error if there is not exactly one
p:input child and exactly one
p:output child of p:viewport or if the declared
ports do not specify a binding.

If no context is specified on the p:when, the context specified
on the p:choose is used.
It is a static error if no context is specified in either
place.

Editorial Note

The p:choose/p:when elements are
inconsistent with p:for-each and p:viewport. It
would be consistent to allow exactly one p:input
to specify the context, instead of specifying the
context directly on the p:choose and p:when
elements. (Or conversely, to remove the p:input
elements from p:for-each and p:viewport.)

Note, in particular, that without the p:input,
there's no direct way to specify a here document for the context.
The WG has not yet decided how to resolve this issue.

Within the p:catch block, the special input port
#error is defined. The document(s) on that port constitute
the error messages received from the component which failed. Note that the
order of the messages on that port is undefined. Note also that the failure
of one component can cause others to fail and the component which signaled
the error might not be the only or even the first component that failed.

Both the p:group and the p:catch
must declare the same number of output ports with the same names. It is a
static error if they do not.

4.2.12 p:declare-step-type Element

A p:declare-step-type provides the type and signature
of an implementation-dependent type of step. It declares the inputs, outputs,
and parameters for all steps of that type.

We need to make some provision for identifying the
implementation of a declared step, even if it's no more than
implementation-defined extension attributes. We'll need some sort of
mechanism for declaring multiple implementations too.

It is a static error if the type attribute of a p:step
is not recognized by the processor. It is not
an error to declare such an step, only to use it.

Exactly one input declaration of a
p:declare-step-typemay use the
name “*” to indicate
that the step accepts an arbitrary number of inputs.

4.2.13 p:pipeline-library Element

A p:pipeline-library contains one or more step
declarations and/or pipelines. It declares steps that pipelines
can import.

4.2.14 p:import Element

An p:import loads a pipeline or pipeline library, making
it available in the current context.

<p:importhref = URI />

An import statement loads the specified URI and makes any pipelines
declared within it available to the current pipeline.

It is a dynamic error if the URI cannot
be retrieved or if, once retrieved, it does not point to a
p:pipeline-library or p:pipeline. If it points to
a p:pipeline, it is a dynamic error if the
pipeline does not have a name.

5 Errors

5.1 Static Errors

[Definition: A static
error is one which can be detected before pipeline evaluation
is even attempted.] Examples of static errors include cycles,
incorrect specification of inputs and outputs, and reference to unknown
components.

Static errors are fatal and must be detected before any components
are evaluated.

5.2 Dynamic Errors

A [Definition: A dynamic
error is one which occurs while a pipeline is being
evaluated.] Examples of dynamic errors include references
to URIs that cannot be resolved, components which fail, and pipelines
that exhaust the capacity of an implementation (such as memory or
disk space).

If a component fails due to a dynamic error, failure propagates
upwards until either a try is encountered or the entire pipeline
fails. In other words, outside of a try, component failure causes
the entire pipeline to fail.

B Glossary

Component A is after
component B if component B is the container
for component A (or an ancestor of such a container) or if any output from component B
is connected to any input of component A, either directly or indirectly.

Component A is before
component B if component B is a contained component of component A, either
directly or indirectly, or if any output from component A is connected
to any input of component B, either directly or indirectly.

Components A and B are
connected if they are either
directly or indirectly connected. Component A is directly
connected to B if an output of A is associated with an input
port of B. Component A is indirectly connected to B if there
is a chain of directly connected components that allows
traversal from A to B.

A
construct is a componentunit of XML processing that
contains additional components. That is, a construct differs from a
step in that its semantics are at least partially determined by the
components that it contains.

An element from the XProc
XProc namespace may have any attribute not from the
XProc namespace, provided that the expanded-QName of the attribute has
a non-null namespace URI. These attributes are
called called
extension attributes.

A component
matches its signature if and only if it specifies
an input for each declared input and it specifies no inputs that are not
declared; it specifies no outputs that are not declared; it specifies
a parameter for each parameter that is declared to be required; and it
specifies no parameters that are not declared.

The components (and the connectionscomponents
immediately between them)
within a container form
a subpipeline.

C The Error Vocabulary

This appendix describes the XML vocabulary that components are
expected to use to identify messages on their error ports.

To be described…

D Standard Component Library

This appendix describes the standard XProc components.

Note

The components described in this draft are intended mainly as a
starting point for discussion and to present a flavor for the sorts of
components envisioned. The WG has not yet discussed them in detail.

1 Required Components

This section describes standard components that must be supported
by any conforming processor.

To be described…

1.1 Identity

The identity construct makes a verbatim copy of its input
available on its output.

Inputs

input, a sequence of documents.

Outputs

result, a sequence of documents.

Parameters

None.

Note

The use of 'input' as port name here aren't probably the best
choice. Since the identity component must be able to take a sequence
of documents, the port name 'document' isn't appropriate either.

1.2 XSLT

The xslt component applies an XSLT 1.0 transformation supplied by the
input to the 'transform' port to the document provided on the 'document'
port. It produces a sequence of documents on its 'result'
port.

Inputs

document, a single document.

transform, a single document.

Outputs

result, a sequence of documents.

Parameters

Any

All of the specified parameters are made available to the
XSLT processor. If the XSLT processor signals a fatal error, the component fails,
otherwise the result of the transformation is produced on the
resultport.

Note, an XSLT 1.0 processor without any extensions can only produce a single
XML document as its result. However, many XSLT 1.0 processors provide extensions
which allow the processor to produce more than one result. In such cases, more than
one document may appear in the resultport. The principle result
document will always appear last.

1.3 XInclude

The XIinclude component applies xinclude processing semantics
to the document. The referenced documents are calculated against the
base URI and are not provided as input to the component.

Inputs

document, a single document.

Outputs

result, a single document.

Parameters

None.

1.4 Serialize

The serialize component applies XML serialization to the
children of the document element and replaces those children with their
serialization. The outcome is a single element with text content that
represents the "escaped" syntax of the children if they were
serialized.

Inputs

document, a single document.

Outputs

result, a single document.

Parameters

None.

1.5 Parse

The parse component takes the text value of the document
element and parses the content as if it was and unicode character stream
containing XML. The outcome is a single element with children from the
parsing of the XML content. This is the reverse of the serialize
component.

When the text value is parsed, a document element wrapper should
be assumed so that element siblings can be parsed back into XML.
Further, if the 'namespace' parameter is specified, the default
namespace is declared on that wrapper element. If a wrapper element name
is specified, it is not returned in the result.

If the 'content-type' parameter is specified, an implementation
can use a different parser to produce XML content. Such a behavior is
implementation defined. For example, for the mime type 'text/html', an
implementation might provide an HTML to XHTML parser (e.g. Tidy).

Inputs

document, a single document.

Outputs

result, a single document.

Parameters

namespace

content-type

1.6 Load

The load component has no inputs but takes a parameter that
specifies a URI of an XML resource that should be loaded and provided as
the result.

Inputs

None.

Outputs

result, a single document.

Parameters

href, required.

Load attempts to read an XML document from the specified URI. If the
document does not exist, or is not well-formed, the component fails. Otherwise,
the document read is produced on the resultport.

Note

Should this component allow href to be a list of URIs and return a
sequence of documents?

1.7 Store

The store component stores a serialized version of its input
to a URI. The URI is either specified explicitly by the 'href' parameter
or implicitly by the base URI of the document. This component has no
output.

Note

Should this component allow sequences on its input?

Inputs

document, a single document.

Outputs

None.

Parameters

href

Note

A more direct “serialize-to-octet-stream” component may also be required.
One, for example, that supports the XSLT 2.0/XQuery 1.0 Serialization
specification.

2 Optional Components

T.B.D.

3 Micro-Operations Components

Note

No decisions have been made about whether these components will be
optional or required.

3.1 Rename

The rename component renames elements or attributes in a
document based on parameter values.

Inputs

document, a single document.

Outputs

result, a single document.

Parameters

select

name, required.

Each element, attribute, or processing-instruction identified by the
XPath 1.0 expression specified in the 'select' parameter is renamed. The name
of elements and attributes, and the target of processing-instructions, are
changed to the value of the 'name' parameter.

The component fails if the specified name is not a valid name or if the
renaming would introduce a syntactic error into the document (i.e., if it would
create two attributes with the same name on the same element).

3.2 Wrap

The wrap component wraps the document element with a new
document element.

Inputs

document, a single document.

Outputs

result, a single document.

Parameters

name, required.

3.3 Insert

The insert component inserts a document specified on the
'insertion' port as a child of the document element provided on the
'document' port. The position of this insert is governed by the
parameters.

Inputs

document, a single document.

insertion, a single document.

Outputs

result, a single document.

Parameters

at-start, required.

If the at-startparameter is true, the
insertion document will be inserted as the first child(ren) of the document,
otherwise it will be inserted as the last child(ren). If the parameter
is not specified, a value of true is assumed.

3.4 Set-attributes

The set-attributes component sets attribute values on the
document element using the attribute values provided on the document
element of the 'attribute' port's document.