Abstract

This specification describes the syntax and semantics of
XProc: An XML Pipeline Language, a
language for describing operations to be performed on XML
documents.

An XML Pipeline specifies a sequence of operations to be
performed on zero or more XML documents. Pipelines generally accept
zero or more XML documents as input and produce zero or more XML
documents as output. Pipelines are made up of simple steps which
perform atomic operations on XML documents and constructs similar
to conditionals, iteration, and exception handlers which control
which steps are executed.

Status of this Document

This section describes the status of this document at the
time of its publication. Other documents may supersede this
document. A list of current W3C publications and the latest
revision of this technical report can be found in the W3C technical reports index at
http://www.w3.org/TR/.

This document was produced by the XML Processing Model Working
Group which is part of the XML Activity. Publication as a
Working Draft does not imply endorsement by the W3C Membership.
This is a draft document and may be updated, replaced or obsoleted
by other documents at any time. It is inappropriate to cite this
document as other than work in progress.

Since the last public working draft, the Working Group has
considered several hundred
comments in nearly 150 threads. We've responded to many of
these by changing the specification. Some of the significant
changes in this draft are:

Significantly reworked the syntax and semantics of variables,
options, and parameters. Added p:variable. Imposed a syntactic distinction
between declaration (p:option) and use (p:with-option/p:with-param) of
options and parameters.

1 Introduction

An XML Pipeline specifies a sequence of operations to be
performed on a collection of XML input documents. Pipelines take
zero or more XML documents as their input and produce zero or more
XML documents as their output.

A pipeline
consists of steps. Like pipelines, steps take zero or more XML
documents as their inputs and produce zero or more XML documents as
their outputs. The inputs of a step come from the web, from the
pipeline document, from the inputs to the pipeline itself, or from
the outputs of other steps in the pipeline. The outputs from a step
are consumed by other steps, are outputs of the pipeline as a
whole, or are discarded.

There are two kinds of steps: atomic steps and compound steps.
Atomic steps carry out single operations and have no substructure
as far as the pipeline is concerned, whereas compound steps control
the execution of other steps, which they include in the form of one
or more subpipelines.

This is a pipeline that consists of two atomic steps, XInclude
and Validate with XML Schema. The pipeline itself has two inputs,
“source” (a source document) and “schemas” (a sequence of W3C XML
Schemas). The XInclude step reads the pipeline input “source” and
produces a result document. The Validate with XML Schema step reads
the pipeline input “schemas” and the result of the XInclude step
and produces its own result document. The result of the validation,
“result”, is the result of the pipeline. (For consistency across
the step vocabulary, the standard input is usually named “source”
and the standard output is usually named “result”.)

The pipeline document determines how the steps are connected
together inside the pipeline. How inputs are connected to XML
documents outside the pipeline is implementation-defined. How
pipeline outputs are connected to XML documents outside the
pipeline is implementation-defined.

The heart of this example is the conditional. The “choose” step
evaluates an XPath expression over a test document. Based on the
result of that expression, one or another branch is run. In this
example, each branch consists of a single validate step.

This example, like the preceding, relies on XProc defaults for
simplicity. It is always valid to write the fully explicit form if
you prefer.

The media type for pipeline documents is application/xml. Often, pipeline documents are
identified by the extension .xpl.

2 Pipeline Concepts

[Definition: A pipeline is a set of connected steps, with outputs
of one step flowing into inputs of another.] A pipeline is
itself a step and
must satisfy the constraints on steps. Connections between steps
occur where the input of one step is bound to the output of
another.

The result of evaluating a pipeline (or subpipeline) is the
result of evaluating the steps that it contains, in an order
consistent with the connections between them. A pipeline must
behave as if it evaluated each step each time it occurs. Unless
otherwise indicated, implementations must
not assume that steps are functional (that is, that their
outputs depend only on their inputs,
options, and
parameters)
or side-effect free.

The pattern of connections between steps will not always
completely determine their order of evaluation. The evaluation order of steps not connected to one
another is implementation-dependent.
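
As a rough illustration of "an order consistent with the connections", the sketch below derives one valid evaluation order with Python's standard graphlib module. The mapping from each step to the steps it reads from, and the step names themselves, are assumptions made for this example and are not part of XProc. Steps that are not connected to one another may appear in any relative order, which mirrors the implementation-dependent ordering described above.

```python
from graphlib import TopologicalSorter


def evaluation_order(reads_from):
    """Return one step order consistent with the connections.

    reads_from maps a step name to the set of step names whose
    outputs it consumes (a hypothetical representation).
    """
    return list(TopologicalSorter(reads_from).static_order())


# Hypothetical pipeline: "validate" reads from "xinclude";
# "log" is not connected to either of them.
connections = {
    "xinclude": set(),
    "validate": {"xinclude"},
    "log": set(),
}
order = evaluation_order(connections)
```

Only the relative position of "xinclude" and "validate" is fixed here; where "log" lands is up to the sorter.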

2.1 Steps

[Definition: A step is the basic computational unit of a
pipeline.] A typical step has zero or more inputs, from
which it receives XML documents to process, zero or more outputs,
to which it sends XML document results, and can have options and/or
parameters.

[Definition: An atomic step is a step that performs a
unit of XML processing, such as XInclude or transformation, and has
no internal subpipeline.] Atomic steps carry
out fundamental XML operations and can perform arbitrary amounts of
computation, but they are indivisible. An XSLT step, for example,
performs XSLT processing; a Validate with XML Schema step validates
one input with respect to some set of XML Schemas, etc.

There are many types of atomic steps. The standard
library of atomic steps is described in Section 7, “Standard Step
Library”, but implementations may
provide others as well. What
additional step types, if any, are provided is implementation-defined. Each
use, or instance, of an atomic step invokes the processing defined
by that type of step. A pipeline may contain instances of many
types of steps and many instances of the same type of step.

Compound steps, on the other hand, control and organize the flow
of documents through a pipeline, reconstructing familiar
programming language functionality such as conditionals, iterators
and exception handling. They contain other steps, whose evaluation
they control.

[Definition: A
compound step is a step that contains a
subpipeline.] That is, a compound
step differs from an atomic step in that its semantics are at least
partially determined by the steps that it contains.

The runtime semantics of a multi-container step are that it
behaves as if it evaluated exactly one of its subpipelines. In this
sense, multi-container steps function like compound steps.

[Definition: A compound step or multi-container step is a
container for the steps directly within it or within non-step
wrappers directly within it.] [Definition: The steps that occur
directly within, or within non-step wrappers directly within, a
step are called that step's contained steps. In other words,
“container” and “contained steps” are inverse relationships.]
[Definition: The ancestors of a step are its container, the
container of its container, and all other containers above
it.]

[Definition: Sibling steps (and the connections between them) form
a subpipeline.] [Definition: The last step in a subpipeline is its
last step in document order.]

Note that user-defined pipelines, pfx:user-pipeline, are atomic;
although a pipeline declaration contains a subpipeline, a
step which invokes a user-defined pipeline does not.

Steps have “ports” into which inputs and outputs are connected
or “bound”.
Each step has a number of input ports and a number of output ports;
a step can have zero input ports and/or zero output ports. (All
steps have an implicit output port for reporting errors that
must not be declared.) The names of
all ports on each step must be unique on that step (you can't have
two input ports named “source”, nor can you have an input port
named “schema” and an output port named “schema”).

Steps may have any number of options, all
with unique names. A step can have zero options.

Steps may have parameter input ports, on which parameters can be passed. The parameters passed
on a particular parameter input port must be
uniquely named. If multiple parameters with the same name are used,
only one of the values will actually be available to the step. A
step can have zero, one, or many parameter input ports, and each
parameter port can have zero or more parameters passed on it.

All of the different instances of steps (atomic or compound) in
a pipeline can be distinguished from one another by name. If the
pipeline author does not provide a name for a step, a default name
is manufactured automatically.

2.1.1 Step names

If the pipeline author does not provide an explicit name, the
processor manufactures a default name. All default names are of the
form “!1.m.n…” where “m” is the position
of the step's highest ancestor within the pipeline document or
library which contains it, “n” is the position of the next-highest
ancestor, and so on, including both steps and non-step wrappers.
For example, consider the pipeline in Example 3, “A validate and
transform pipeline”. The p:pipeline step has no name, so it gets the
default name “!1”; the p:choose gets the name
“!1.2”; the first p:when gets the name
“!1.2.1”, etc. If the p:choose had had a
name, it would not have received a default name, but it would still
have been counted and its first p:when would still have been “!1.2.1”.
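
The naming rule can be sketched as a walk over the pipeline document. This is a minimal sketch, assuming a hypothetical Node model in which every child (step or non-step wrapper) counts toward the position; the tree built at the end is illustrative only and is not the pipeline from Example 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A step or non-step wrapper in a pipeline document (hypothetical model)."""
    kind: str
    name: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def assign_default_names(root: Node) -> Dict[int, str]:
    """Map id(node) to its explicit name or its manufactured default name."""
    names: Dict[int, str] = {}

    def visit(node: Node, path: str) -> None:
        # An explicit name wins, but the positional path is still
        # passed down, so named steps are counted all the same.
        names[id(node)] = node.name if node.name is not None else path
        for position, child in enumerate(node.children, start=1):
            visit(child, f"{path}.{position}")

    visit(root, "!1")
    return names


# Illustrative tree: the p:choose is the second child and has a name.
first_when = Node("p:when")
choose = Node("p:choose", name="pick", children=[first_when, Node("p:otherwise")])
pipeline = Node("p:pipeline", children=[Node("p:xinclude"), choose])
names = assign_default_names(pipeline)
```

In this sketch the named p:choose keeps its explicit name, yet its first p:when is still named “!1.2.1”, matching the counting behaviour described above.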

Providing every step in the pipeline with an interoperable name
has several benefits:

It allows implementors to refer to all steps in an interoperable
fashion, for example, in error messages.

Pragmatically, we say that readable ports are identified by a
step name/port name pair. By manufacturing names for otherwise
anonymous steps, we include implicit bindings without changing our
model.

In a valid pipeline that runs successfully to completion, the
manufactured names aren't visible (except perhaps in debugging or
logging output).

Note

The format for defaulted names does not conform to the
requirements of an NCName. This is an
explicit design decision; it prevents pipelines from using the
defaulted names on p:pipe elements. If an explicit connection
is required, the pipeline author must provide an explicit name for
the step.

2.2 Inputs and Outputs

Although some steps can read and write non-XML resources, what
flows between steps through input ports and output ports
are exclusively XML documents or sequences of XML documents.

For the purposes of this specification, an XML document is an
[Infoset]. Implementations are free to transmit
infosets as sequences of characters, sequences of events, object
models, or any other representation that preserves the necessary
infoset properties (see Section A.3, “Infoset
Conformance”).

Most steps in this specification manipulate XML documents, or
portions of XML documents. In these cases, we speak of changing
elements, attributes, or nodes without prejudice to the actual
representation used by an implementation.

An implementation may make it
possible for a step to produce non-XML output (through channels
other than a named output port)—for example, writing a PDF document
to a URI—but that output cannot flow through the pipeline.
Similarly, one can imagine a step that takes no pipeline inputs,
reads a non-XML file from a URI, and produces an XML output. But
the non-XML data cannot arrive on an input port to a step.

All atomic steps are defined by a p:declare-step.
The declaration of an atomic step type defines the input ports,
output ports, and options of all steps of that type. For example,
every p:validate-with-xml-schema step has two
inputs, named “source” and “schema”, one output named “result”, and the same set of options.

Like atomic steps, top level, user-defined pipelines also have
declarations. The situation is slightly more complicated for the
other compound steps because they don't have separate declarations;
each instance of the compound step serves as its own declaration.
On these compound steps, the number and names of the outputs can be
different on each instance of the step.

Figure 4, “A compound step” illustrates
symbolically a compound step with one subpipeline and one output.
As you can see from the diagram, the output from the compound step
comes from one of the outputs of the subpipeline within the
step.

Figure 4. A compound step

[Definition: The input ports declared on a step are its declared
inputs.] [Definition: The output ports declared on a step are its
declared outputs.] When a step is used in a pipeline, it is
connected to other steps through its inputs and outputs.

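When a step is used, all of the declared inputs of the step
must be connected. As described in the Introduction, an input can
come from the web, from the pipeline document, from the inputs to
the pipeline itself, or from the outputs of other steps in the
pipeline.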

The primary output port of a step
must be connected, but other outputs
can remain unconnected. Any documents produced on an unconnected
output port are discarded.

Output ports on compound steps have a dual nature: from the
perspective of the compound step's siblings, its outputs are just
ordinary outputs and must be connected as described above. From the
perspective of the subpipeline inside the compound step, they are
inputs into which something must be connected.

Each input and output is declared to accept or produce either a
single document or a sequence of documents. It is not an
error to connect a port that is declared to produce a sequence of
documents to a port that is declared to accept only a single
document. It is, however, an error if the former step actually
produces more than one document at run time.

It is also not an error to connect a port that is declared to
produce a single document to a port that is declared to accept a
sequence. A single document is the same as a sequence of one
document.
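
The distinction between a connection that is statically allowed and a failure that only shows up at run time can be sketched as follows. This is a minimal sketch with a hypothetical deliver function; it checks only the "more than one document" case named above and does not take a position on how an empty sequence is treated.

```python
def deliver(documents, accepts_sequence):
    """Hand documents to an input port, checking cardinality at run time.

    accepts_sequence reflects the port's declaration (hypothetical flag).
    """
    docs = list(documents)
    if not accepts_sequence and len(docs) > 1:
        # The connection itself was legal; the failure is dynamic.
        raise ValueError(
            f"port accepts a single document but received {len(docs)}"
        )
    return docs


# A single document is the same as a sequence of one document.
single = deliver(["doc-1"], accepts_sequence=True)
```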

An output port may be connected to more than one input port. At
runtime this will result in distinct copies of the output.

[Definition: The
signature of a step is the set of
inputs, outputs, and options that it is declared to accept.]
The declaration for a step provides a fixed signature which all its
instances share.

[Definition: A step
matches its signature if and only if it
specifies an input for each declared input, it specifies no inputs
that are not declared, it specifies an option for each option that
is declared to be required, and it specifies no options that are
not declared.] In other words, every input and required
option must be specified and only
inputs and options that are declared may be specified. Options that aren't required do
not have to be specified.
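
The matching rule amounts to three set comparisons. The sketch below is one way to express it; the Signature class and the option names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Signature:
    """Declared inputs and options of a step type (hypothetical model)."""
    inputs: FrozenSet[str]
    required_options: FrozenSet[str]
    optional_options: FrozenSet[str] = frozenset()


def matches_signature(sig: Signature,
                      given_inputs: Iterable[str],
                      given_options: Iterable[str]) -> bool:
    inputs = set(given_inputs)
    options = set(given_options)
    declared_options = sig.required_options | sig.optional_options
    return (
        inputs == sig.inputs                  # every declared input, no others
        and sig.required_options <= options   # every required option given
        and options <= declared_options       # no undeclared options
    )


# Hypothetical step type with two inputs and illustrative option names.
sig = Signature(
    inputs=frozenset({"source", "schema"}),
    required_options=frozenset({"required-opt"}),
    optional_options=frozenset({"optional-opt"}),
)
ok = matches_signature(sig, ["source", "schema"], ["required-opt"])
```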

Steps may also produce error,
warning, and informative messages. These messages are captured and
provided on the error port inside of a
p:catch.
Outside of a try/catch, the disposition of error messages is
implementation-dependent.

2.2.1 External Documents

It's common for some of the documents used in processing a
pipeline to be read from URIs. Sometimes this occurs directly, for
example with a p:document element. Sometimes it occurs
indirectly, for example if an implementation allows the URI of a
pipeline input to be specified on the command line or if a
p:xslt step
encounters an xsl:import in the
stylesheet that it is processing. It's also common for some of the
documents produced in processing a pipeline to be written to
locations which have, or at least could have, a URI.

The process of dereferencing a URI to retrieve a document is
often more interesting than it seems at first. On the web, it may
involve caches, proxies, and various forms of indirection. Resolving a URI locally may involve
resolvers of various sorts and possibly appeal to implementation-dependent
mechanisms such as catalog files.

In XProc, the situation is made even more interesting by the
fact that many intermediate results produced by steps in the
pipeline have base URIs. Whether
or not (and when and how) the intermediate results that pass
between steps are ever written to a filesystem is implementation-dependent.

In Version 1.0 of XProc, how
(or if) implementers provide local resolution mechanisms and how
(or if) they provide access to intermediate results by URI is
implementation-defined.

Version 1.0 of XProc does not require implementations to
guarantee that multiple attempts to dereference the same URI always
produce consistent results.

Note

On the one hand, this is a somewhat unsatisfying state of
affairs because it leaves room for interoperability problems. On
the other, it is not expected to cause such problems very often in
practice.

If these problems arise in practice, implementers are encouraged
to use the existing extension mechanisms to give users the control
needed to circumvent them. Should such mechanisms become
widespread, a standard mechanism could be added in some future
version of the language.

2.3 Primary Inputs and Outputs

As a convenience for pipeline authors, each step may have one
input port designated as the primary input port and one output port
designated as the primary output port.

[Definition: If
a step has a document input port which is explicitly marked
“primary='true'”, or if it has exactly one
document input port and that port is not explicitly marked
“primary='false'”, then that input port is
the primary input port of the
step.] If a step has a single input port and that port is
explicitly marked “primary='false'”, or if a
step has more than one input port and none is explicitly marked as
the primary, then the primary input port of that step is undefined.
A step can have at most one primary input port.

[Definition:
If a step has a document output port which is explicitly marked
“primary='true'”, or if it has exactly one
document output port and that port is not explicitly
marked “primary='false'”, then that output
port is the primary output port of the
step.] If a step has a single output port and that port is
explicitly marked “primary='false'”, or if a
step has more than one output port and none is explicitly marked as
the primary, then the primary output port of that step is
undefined. A step can have at most one primary output port.
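
The two definitions above follow the same rule, applied once to the document input ports and once to the document output ports. A minimal sketch, assuming a hypothetical representation in which each port name maps to True, False, or None (no primary attribute given):

```python
def primary_port(ports):
    """Return the name of the primary port, or None if it is undefined.

    ports maps port name -> True, False, or None (attribute absent),
    for the document ports of one direction (inputs or outputs).
    """
    marked = [name for name, flag in ports.items() if flag is True]
    if len(marked) > 1:
        raise ValueError("a step can have at most one primary port")
    if marked:
        return marked[0]
    if len(ports) == 1:
        (name, flag), = ports.items()
        if flag is not False:
            # Exactly one port, not marked primary='false'.
            return name
    return None


only_port = primary_port({"source": None})
```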

The special significance of primary input and output ports is
that they are connected automatically by the processor if no
explicit binding is given. Generally speaking, if two steps appear
sequentially in a subpipeline, then the primary output of the first
step will automatically be connected to the primary input of the
second.

Additionally, if a compound step has no declared outputs and the
last step in
its subpipeline has an unbound primary output, then an implicit
primary output port will be added to the compound step (and
consequently the last step's primary output will be bound to it).
This implicit output port has no name. It inherits the sequence property of the port bound to it.

2.4 Connections

Steps are connected together by their input ports and output
ports. It is a static error (err:XS0001) if there are any loops in
the connections between steps: no step can be connected to itself
nor can there be any sequence of connections through other steps
that leads back to itself.
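
A processor could detect such loops with a depth-first search over the connections. The sketch below is one possible approach, not a prescribed algorithm; the reads_from mapping and the step names are hypothetical.

```python
def has_loop(reads_from):
    """Return True if the connections contain a loop.

    reads_from maps a step name to the step names whose outputs it
    reads (a hypothetical representation of the connections).
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, in progress, finished
    color = {}

    def visit(step):
        color[step] = GREY
        for upstream in reads_from.get(step, ()):
            state = color.get(upstream, WHITE)
            if state == GREY:
                # Reached a step that is still being explored: a loop.
                return True
            if state == WHITE and visit(upstream):
                return True
        color[step] = BLACK
        return False

    return any(color.get(s, WHITE) == WHITE and visit(s) for s in reads_from)


self_connected = has_loop({"a": ["a"]})
```

A processor following this sketch would report err:XS0001 whenever has_loop returns True.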

2.4.1 Namespace Fixup on Outputs

XProc processors are expected, and sometimes required, to
perform namespace fixup. Unless the
semantics of a step explicitly says otherwise:

The in-scope namespaces associated with a node (even those that
are inherited from namespace bindings that appear among its
ancestors in the document in which it appears initially) are
assumed to travel with it.

Changes to one part of a tree (wrapping or unwrapping a node or
renaming an element, for example) do not change the in-scope
namespaces associated with the descendants of the node so
changed.

As a result, some steps can produce XML documents which have no
direct serialization (because they include nodes with conflicting
or missing namespace declarations, for example). [Definition: To produce a
serializable XML
document, the XProc processor must sometimes add additional
namespace nodes, perhaps even renaming prefixes, to satisfy the
constraints of Namespaces in XML. This process is
referred to as namespace
fixup.]

Implementors are encouraged to perform namespace fixup
before passing documents between steps, but they are not required
to do so. Conversely, an implementation which does
serialize between steps and therefore must perform such fixups, or
reject documents that cannot be serialized, is also conformant.

Except where the semantics of a step explicitly require changes,
processors are required to preserve the information in the
documents and fragments they manipulate. In particular, the
information corresponding to the [Infoset]
properties [attributes], [base URI], [children],
[local name], [namespace name], [normalized
value], [owner], and [parent] must be
preserved.

The information corresponding to [prefix], [in-scope
namespaces], [namespace attributes],
and [attribute type] should be preserved, with changes to the first
three only as required for namespace fixup. In particular,
processors are encouraged to take account of prefix information in
creating new namespace bindings, to minimize negative impact on
prefixed names in content.

Whenever an implementation serializes pipeline contents, for
example for pipeline outputs, logging, or as part of steps such as
p:store or
p:http-request, it is a dynamic error if that serialization could not be
done so as to produce a document which is both well-formed and
namespace-well-formed, as specified in XML and Namespaces in
XML, regardless of what serialization method, if any, is
called for.

2.5 Environment

[Definition: The environment is a context-dependent
collection of information available within subpipelines.]
Most of the information in the environment is static and can be
computed for each subpipeline before evaluation of the pipeline as
a whole begins. The in-scope bindings have to be calculated as the
pipeline is being evaluated.

The environment consists of:

A set of readable ports. [Definition: The readable
ports are a set of step name/port name pairs.] Inputs
and outputs can only be connected to readable ports.

A default readable port. [Definition: The default readable port, which may be undefined, is
a specific step name/port name pair from the set of readable
ports.]

A set of in-scope bindings. [Definition: The in-scope bindings are a set of name-value pairs,
based on option
and variable
bindings.]

The names and values from each p:variable present at the beginning of the
container are added, in document order, to the in-scope
bindings. A new binding replaces an old binding with the
same name. See Section 5.7.1, “p:variable” for the
specification of variable evaluation.

2.6 XPaths in XProc

XProc uses XPath as an expression language. XPath expressions
are evaluated by the XProc processor in several places: on compound
steps, to compute the default values of options and the values of
variables; on atomic steps, to compute the actual values of options
and the values of parameters.

XPath expressions are also passed to some steps. These
expressions are evaluated by the implementations of the individual
steps.

The href option of the p:load step is evaluated
by the XProc processor. The actual href
option received by the step is simply the string literal
“http://example.com/docs/document.xml”. (The
selection on the source input of the
select-chapters step is also evaluated by
the XProc processor.)

The XPath expression “@role='chapter'”
is passed literally to the test option on
the p:split-sequence step. That's because the
nature of the p:split-sequence is that it
evaluates the expression. Only some options on some steps
expect XPath expressions.

The XProc processor evaluates all of the XPath expressions in
select attributes on variables,
options, parameters, and inputs, in match attributes on p:viewport, and in
test attributes on p:when steps.

An XProc implementation can use either [XPath 1.0] or [XPath 2.0] to evaluate
these expressions. This is a compromise driven entirely by the
timing of XProc development. During the development of this
specification, the community indicated that it was too early to
mandate that all implementations use XPath 2.0 and too late to
mandate that all implementations use XPath 1.0.

Many, many expressions that are likely to be used in XProc
pipelines are the same in both versions (simple element tests,
ancestor and descendant tests, string-based attribute tests,
etc.).

As an aid to interoperability, pipeline authors may indicate the
version of XPath that they are using. The attribute xpath-version
may be used on p:pipeline, p:declare-step, or p:library to identify
the XPath version that should be used to evaluate XPath
expressions on the pipeline(s). The attribute is lexically scoped,
but see below.

If an xpath-version is specified
on a p:pipeline or p:declare-step,
then that is the version of XPath that the step uses. If it does
not specify a version, but a version is specified on one of its
ancestors, the nearest ancestor version specified is the version
that it uses. If no version is
specified on the step or among its ancestors, then its XPath
version is implementation-defined.
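
The lookup described above is a nearest-ancestor search. A minimal sketch, assuming the xpath-version values are supplied as a hypothetical list ordered from the step itself outward to its outermost ancestor, with None where the attribute is absent:

```python
def effective_xpath_version(declared_chain, implementation_default):
    """Return the XPath version a step uses.

    declared_chain lists the xpath-version values starting with the
    step itself and moving outward through its ancestors.
    """
    for version in declared_chain:
        if version is not None:
            return version
    # Nothing specified anywhere: the choice is implementation-defined.
    return implementation_default


inherited = effective_xpath_version([None, "1.0", "2.0"], "2.0")
```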

Note

The decision about which XPath version applies can be made
dynamically. For example, if a pipeline explicitly labeled with
xpath-version “1.0” imports a
library that does not specify a version, the implementation may
elect to make the implementation-defined XPath version of the steps
in the library also “1.0”. If the same implementation imports that
library into a pipeline explicitly labeled with xpath-version “2.0”, it can make the
implementation-defined version of those steps “2.0”.

The following rules determine how the indicated version and the
implementation's actual version interact:

If the indicated version and the implementation version are the
same, then that version is used.

If the indicated version is 1.0 and the implementation uses
XPath 2.0 (or later), the expression must be evaluated in XPath 1.0
compatibility mode. It is a static error (err:XS0046) if the
processor does not support XPath 1.0 compatibility mode.

If the indicated version is 2.0 (or later) and the
implementation uses XPath 1.0, the implementation must not evaluate any expression that it cannot
determine will give the same result in XPath 1.0 that it would have
given if XPath 2.0 had been used. It is a static error (err:XS0047) if the
processor cannot determine that the expression would yield the same
result.
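
The three rules can be read as a small decision procedure. The sketch below assumes only the versions "1.0" and "2.0"; the StaticError class, the two boolean flags, and the returned mode labels are hypothetical names used for illustration.

```python
class StaticError(Exception):
    """Carries an XProc static error code such as err:XS0046."""


def evaluation_mode(indicated, implementation,
                    supports_compat_mode=True,
                    same_result_provable=False):
    """Decide how an expression is evaluated, or raise a static error."""
    if indicated == implementation:
        return f"xpath-{implementation}"
    if indicated == "1.0":
        # Implementation uses XPath 2.0 (or later).
        if not supports_compat_mode:
            raise StaticError("err:XS0046")
        return "xpath-1.0-compatibility-mode"
    if implementation == "1.0":
        # Indicated version is 2.0 (or later).
        if not same_result_provable:
            raise StaticError("err:XS0047")
        return "xpath-1.0"
    raise NotImplementedError("combination not covered by the rules above")


mode = evaluation_mode("1.0", "2.0")
```

How an implementation decides same_result_provable is left open here; the sketch treats it as an input.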

2.6.1 Processor XPath Context

When the XProc processor evaluates an XPath expression using
XPath 1.0, unless otherwise indicated by a particular step, it does
so with the following context:

Implementation defined but must
include the Unicode codepoint collation. The version of Unicode supported is implementation-defined, but
it is recommended that the most recent version of Unicode be
used.

The set of namespace bindings provided by the XProc processor.
The processor computes this set of bindings by taking a union of
the bindings on the step element itself as well as the bindings on
any of the options and parameters used in computing values for the
step (see Section 5.7.5,
“Namespaces on variables, options, and parameters”).

The results of computing the
union of namespaces in the presence of conflicting declarations for
a particular prefix are implementation-dependent.

When a step evaluates an XPath expression using XPath 2.0,
unless otherwise indicated by a particular step, it does so with
the following static context:

XPath 1.0 compatibility mode

Is true if the indicated XPath version is 1.0, false
otherwise.

Statically known namespaces

The namespace declarations in-scope for the containing element
or made available through p:namespaces.

2.6.3 XPath Extension Functions

The XProc processor must support a
few additional functions in XPath expressions evaluated by the
processor.

In the following descriptions, the names of types (string, boolean, etc.) should be
taken to mean the corresponding [W3C XML Schema: Part
2] data types for an implementation that uses XPath 2.0
and as the most appropriate XPath 1.0 types for an XPath 1.0
implementation.

2.6.3.1 System Properties

XPath expressions within a pipeline document can interrogate the
processor for information about the current state of the pipeline.
Various aspects of the processor are exposed through the p:system-property function in the pipeline
namespace:

The property string must have the form
of a QName; the QName is expanded into a name using the namespace
declarations in scope for the expression. The p:system-property function returns the string
representing the value of the system property identified by the
QName. If there is no such property, the empty string must be returned.

Implementations must provide the
following system properties, which are all in the XProc
namespace:

p:episode

Returns a string which should be
unique for each invocation of the pipeline processor. In other
words, if a processor is run several times in succession, or if
several processors are running simultaneously, each invocation of
each processor should get a distinct value from p:episode.

The unique identifier must consist of alphanumeric characters
and must start with an alphabetic character. Thus, the string is
syntactically an XML name.

p:language

Returns a string which identifies the current language, for
example, for message localization purposes. The exact format of the
language string is implementation-defined but should be the
same as that of the xml:lang attribute.

p:product-name

Returns a string containing the name of the implementation, as
defined by the implementer. This should normally remain constant
from one release of the product to the next. It should also be
constant across platforms in cases where the same source code is
used to produce compatible products for multiple execution
platforms.

p:product-version

Returns a string identifying the version of the implementation,
as defined by the implementer. This should normally vary from one
release of the product to the next, and at the discretion of the
implementer it may also vary across different execution
platforms.

p:vendor

Returns a string which identifies the vendor of the
processor.

p:vendor-uri

Returns a URI which identifies the vendor of the processor.
Often, this is the URI of the vendor's web site.

p:version

Returns the version of XProc implemented by the processor; for
processors implementing the version of XProc specified by this
document, the value is “1.0”. The value of the version attribute is
a token (i.e., an xs:token per [W3C XML Schema: Part
2]).

p:xpath-version

Returns the version of XPath implemented by the processor for
evaluating XPath expressions on XProc elements.
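
The lookup behaviour of p:system-property and the shape of a p:episode value can be sketched together. This is a minimal sketch under several assumptions: the namespace URI constant is assumed rather than taken from this document, the product name is invented, an unbound prefix is simply treated as an unknown property, and the episode value is built from a random UUID, which is only one of many ways to obtain a distinct alphanumeric string.

```python
import uuid

# Assumed namespace URI for the XProc namespace; check the draft in use.
XPROC_NS = "http://www.w3.org/ns/xproc"


def new_episode():
    """Return an alphanumeric string that starts with a letter."""
    return "e" + uuid.uuid4().hex


def system_property(qname, in_scope_namespaces, properties):
    """Expand the QName and return the property value, or "" if unknown."""
    prefix, _, local = qname.rpartition(":")
    # An unprefixed name is in no namespace in this sketch.
    namespace = in_scope_namespaces.get(prefix, "") if prefix else ""
    return properties.get((namespace, local), "")


properties = {
    (XPROC_NS, "episode"): new_episode(),
    (XPROC_NS, "version"): "1.0",
    (XPROC_NS, "product-name"): "ExampleProc",  # hypothetical product
}
namespaces = {"p": XPROC_NS}
episode = system_property("p:episode", namespaces, properties)
```

Because the episode value is generated once when the property table is built, repeated calls within one invocation return the same string.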

2.6.3.2 Step Available

The p:step-available function reports whether or
not a particular type of step is understood by the processor.

The step-type string must have the form
of a QName; the QName is expanded into a name using the namespace
declarations in scope for the expression. The p:step-available
function returns true if and only if the processor knows how to
evaluate steps of the specified type.

2.6.3.3 Iteration Position

In the context of a p:for-each or a p:viewport, the
p:iteration-position function reports the
position of the document being processed in the sequence of
documents that will be processed. In the context of other standard
XProc compound steps, it returns 1.

2.6.3.4 Iteration Size

In the context of a p:for-each or a p:viewport, the
p:iteration-size function reports the number of
documents in the sequence of documents that will be processed. In
the context of other standard XProc compound steps, it returns
1.
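
Taken together, the iteration position and iteration size behave like a counter and a length over the sequence being processed. A minimal sketch, assuming positions are counted from 1 as in XPath; the iterate function is hypothetical:

```python
def iterate(documents):
    """Yield (position, size, document) for each document in order."""
    docs = list(documents)   # the size must be known before iterating
    size = len(docs)
    for position, doc in enumerate(docs, start=1):
        yield position, size, doc


steps = list(iterate(["a.xml", "b.xml", "c.xml"]))
```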

2.6.3.7 Other XPath Extension Functions

2.7 Variables

[Definition: A variable is a name/value pair where the name is an
expanded name and the value must be a string.] If a document, node,
or other value is given, its XPath string value is computed and that
string is used.

Variables and options share the same scope and may shadow each
other.

2.8 Options

Some steps accept options. Options are name/value pairs, like
variables. Unlike variables, the value of an option can be changed
by the caller.

[Definition: An option is a name/value pair where the name is an
expanded name and the value must be a string.] If a document, node,
or other value is given, its XPath string value is computed and that
string is used.

[Definition: The
options on a step which have specified values, either because a
p:with-option element specifies a value or
because the declaration included a default value, are its
specified options.]

2.9 Parameters

Some steps accept parameters. Parameters are name/value pairs,
like variables and options. Unlike variables and options, which
have names known in advance to the pipeline, parameters are not
declared and their names may be unknown to the pipeline author.
Pipelines can dynamically construct sets of parameters. Steps can
read dynamically constructed sets on parameter input ports.

[Definition: A parameter is a name/value pair where the name is an
expanded name and the value must be a string.] If a document, node,
or other value is given, its XPath string value is computed and that
string is used.

Analogous to primary input ports, steps that
have parameter inputs may designate at most one parameter input
port as a primary parameter input port.

[Definition: If a step has a
parameter input port which is explicitly marked “primary='true'”, or if it has exactly one parameter
input port and that port is not explicitly marked
“primary='false'”, then that parameter input
port is the primary parameter input port
of the step.] If a step has a single parameter input port
and that port is explicitly marked “primary='false'”, or if a step has more than one
parameter input port and none is explicitly marked as the primary,
then the primary parameter input port of that step is
undefined.
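
The following declaration is a sketch illustrating the definition above. The step type ex:merge-settings, its namespace, and its port names are hypothetical. The use of p:declare-step with a type attribute and a p:output child is an assumption about the declaration syntax described elsewhere in this specification.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.org/ns/steps"
                type="ex:merge-settings">
  <p:input port="source"/>
  <!-- Two parameter input ports: the one explicitly marked
       primary='true' is the primary parameter input port -->
  <p:input port="settings" kind="parameter" primary="true"/>
  <p:input port="overrides" kind="parameter" primary="false"/>
  <p:output port="result"/>
</p:declare-step>
```

Had neither port been marked primary='true', the primary parameter input port of this step would be undefined, because the step has more than one parameter input port.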

How an implementation maps
parameters specified to the application, or through some API, to
parameters accepted by the top level pipeline is implementation-defined.

2.10 Security Considerations

An XProc pipeline may attempt to access arbitrary network
resources: steps such as p:load and p:http-request
can attempt to read from an arbitrary URI; steps such as p:store can attempt to
write to an arbitrary location; p:exec can attempt to execute an arbitrary
program. Note, also, that some steps, such as p:xslt and p:xquery, include
extension mechanisms which may attempt to execute arbitrary
code.

In some environments, it may be inappropriate to provide the
XProc pipeline with access to these resources. In a server
environment, for example, it may be impractical to allow pipelines
to store data. In environments where the pipeline cannot be
trusted, allowing the pipeline to access arbitrary resources or
execute arbitrary code may be a security risk.

It is a dynamic error (err:XD0021) for a pipeline to attempt to
access a resource for which it has insufficient privileges or to
perform a step which is forbidden.

A conformant XProc processor may limit the resources available
to any or all steps in a pipeline. A conformant implementation may
raise dynamic errors, or take any other corrective action, for any
security problems that it detects.

2.11 Versioning Considerations

A pipeline author may identify the version of XProc against
which a particular pipeline was authored by explicitly importing
the library that identifies the steps defined by that version of
XProc. For the version defined by this specification, the library
is “http://www.w3.org/2008/xproc-1.0.xpl”.
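
A pipeline that identifies its version explicitly might look like the following sketch. It assumes that p:import names the library with an href attribute and that p:identity is one of the steps the library declares.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc">
  <!-- Identify the version of XProc this pipeline was authored against -->
  <p:import href="http://www.w3.org/2008/xproc-1.0.xpl"/>
  <p:identity/>
</p:pipeline>
```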

If the version is not explicitly identified, the implicit
version should be the most recent
version known to the processor.

When a processor encounters a version it does not recognize, it
proceeds in forwards-compatible mode. In forwards-compatible
mode:

The library that identifies the version of XProc is imported,
see p:import.
This provides the processor with declarations for any new step
types.

It is a dynamic error to attempt to evaluate a step type for
which no implementation is known, but conditional processing and
the p:step-available function can be used
to write backwards-compatible pipelines.

It is a static error if the signature of a known step in the
version library has changed, except for new options.

New options on known steps are ignored in the pipeline.

As a consequence, future specifications must not change the semantics of existing step
types without changing their names.

3 Syntax Overview

This section describes the normative XML syntax of XProc. This
syntax is sufficient to represent all the aspects of a pipeline, as
set out in the preceding sections. [Definition: XProc is intended to
work equally well with [XML 1.0] and [XML 1.1]. Unless otherwise
noted, the term “XML” refers equally to both versions.] [Definition:
Unless otherwise noted, the term Namespaces in XML refers equally to
[Namespaces 1.0] and [Namespaces 1.1].] Support for pipeline
documents written in XML 1.1 and pipeline inputs and outputs that use
XML 1.1 is implementation-defined.

Elements in a pipeline document represent the pipeline, the
steps it contains, the connections between those steps, the steps
and connections contained within them, and so on. Each step is
represented by an element; a combination of elements and attributes
specify how the inputs and outputs of each step are connected and
how options and parameters are passed.

Conceptually, we can speak of steps as objects that have inputs
and outputs, that are connected together and which may contain
additional steps. Syntactically, we need a mechanism for specifying
these relationships.

Containment is represented naturally using
nesting of XML elements. If a particular element identifies a
compound
step then the step elements that are its immediate
children form its subpipeline.

The connections between steps are expressed using names and
references to those names.

Six kinds of things are named in XProc:

Step types,

Steps,

Input ports (both parameter and document),

Output ports,

Options and variables, and

Parameters

3.1 XProc Namespaces

There are three namespaces associated with XProc:

http://www.w3.org/ns/xproc

The namespace of the XProc XML vocabulary described by this
specification; by convention, the namespace prefix “p:” is used for this namespace.

http://www.w3.org/ns/xproc-step

The namespace used for documents that are inputs to and outputs
from several standard and optional steps described in this
specification. Some steps, such as p:http-request
and p:store,
have defined input or output vocabularies. We use this namespace
for all of those documents. The conventional prefix “c:” is used for this namespace.

http://www.w3.org/ns/xproc-error

The namespace used for errors. The conventional prefix
“err:” is used for this namespace.

3.2 Scoping of Names

Names are used to identify step types, steps, ports, options and
variables, and parameters. Step types, options, variables, and
parameters are named with QNames. Steps and ports are named with
NCNames. The scope of a name is a measure of where it is available
in a pipeline. [Definition: If two names are in the same scope, we
say that they are visible to each other.]

The scope of the names of the step types is the union of all the
pipelines and pipeline libraries available directly or via p:import. In other
words, the step types visible in a pipeline or library are:

For a pipeline in a library, the types visible in the containing
library.

All the step types in a pipeline must have unique names: it is a
static error (err:XS0036) if any step type name is built-in and/or
declared or defined more than once in the same scope.

The scope of the names of the steps themselves is determined by
the environment of each step. In general,
the name of a step, the names of its sibling steps, the names of
any steps that it contains directly, the names of its ancestors,
and the names of the siblings of its ancestors are all in a common
scope. All
steps in the same scope must have
unique names: it is a static error (err:XS0002) if two
steps with the same name appear in the same scope.

The scope of an input or output port name is the step on which
it is defined. The names of all the ports on any step must be unique.

Taken together, these uniqueness constraints guarantee that the
combination of a step name and a port name uniquely identifies
exactly one port on exactly one in-scope step.

The scope of option and variable names is determined by where
they are declared. When an option is declared with p:option (or a
variable with p:variable), unless otherwise specified, its
scope consists of the sibling elements that follow its declaration
and the descendants of those siblings.

Parameter names are not scoped; they are distinct on each
step.

3.3 Base URIs and xml:base

When a relative URI appears in an option value, the base URI
against which it must be made absolute
is the base URI of the p:option element. If an option value is
specified using a syntactic
shortcut, the base URI of the step on which the shortcut
attribute appears must be used. In
general, whenever a relative URI appears, its base URI is the base
URI of the nearest ancestor element.

The pipeline author can control the base URIs of elements within
the pipeline document with the xml:base attribute. The xml:base attribute may appear on any element in a pipeline and has
the semantics outlined in [XML Base].
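
As a sketch of how these rules combine, consider a step that carries its own xml:base and gives a relative URI through the option shortcut. The example assumes that p:load has an option named href. The URIs are illustrative only.

```xml
<!-- The href shortcut attribute appears on the step, so the base URI
     of the step (set here with xml:base) is the one that applies -->
<p:load xmlns:p="http://www.w3.org/ns/xproc"
        name="read-chapter"
        xml:base="http://example.org/docs/"
        href="chapters/ch1.xml"/>
```

Under these assumptions the relative value chapters/ch1.xml would be made absolute against http://example.org/docs/, giving http://example.org/docs/chapters/ch1.xml.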

3.4 Unique identifiers

A pipeline author can provide a globally unique identifier for
any element in a pipeline with the xml:id attribute.

The xml:id attribute may appear on any element in a pipeline and has
the semantics outlined in [xml:id].

It is a dynamic error (err:XD0002) if the processor attempts to
retrieve the URI specified on a p:document and fails (for example, if
the resource does not exist or is not accessible with the user's
authentication credentials).

Specified by source

[Definition: A document
is specified by source if it references
a specific port on another step.] The step and port
attributes on the p:pipe element are used for this
purpose.

In this example, the “source” input to
the p:xinclude step named “expand” comes from the “result” port of the step named “otherstep”.
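
A binding of the kind just described might be written as follows. This is a sketch reconstructed from the prose, and the step named otherstep is assumed to be declared elsewhere in the same scope.

```xml
<p:xinclude xmlns:p="http://www.w3.org/ns/xproc" name="expand">
  <p:input port="source">
    <!-- Read from the "result" port of the step named "otherstep" -->
    <p:pipe step="otherstep" port="result"/>
  </p:input>
</p:xinclude>
```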

Inline documents are considered “quoted”. The pipeline processor
passes them literally to the port, even if they contain elements
from the XProc namespace or other namespaces that would have other
semantics outside of the p:inline.

Specified explicitly empty

[Definition: An
empty sequence of documents is specified
with the p:empty element.]

In this example, the “source” input to
the XSLT 2.0 step named “generate” is
explicitly empty:
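
An explicitly empty binding of this kind might be sketched as follows. The example assumes an XSLT step named p:xslt with a version option and a stylesheet input port; the step library may name these differently. The stylesheet URI is illustrative only.

```xml
<p:xslt xmlns:p="http://www.w3.org/ns/xproc"
        name="generate" version="2.0">
  <p:input port="source">
    <!-- No documents at all are provided on this port -->
    <p:empty/>
  </p:input>
  <p:input port="stylesheet">
    <p:document href="http://example.org/xsl/generate.xsl"/>
  </p:input>
</p:xslt>
```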

If you omit the binding on a primary input port, a binding to
the default readable port will be
assumed. Making the binding explicitly empty guarantees that the
binding will be to an empty sequence of documents.

It is inconsistent with the [XPath 1.0] specification
to specify an empty binding as the context for evaluating an XPath
expression. When an empty binding is specified for an XPath 1.0
expression, an empty document node must be used instead as the context node.

Note that a p:input or p:output element may contain more than one
p:pipe, p:document, or
p:inline
element. If more than one binding is provided, then the specified
sequence of documents is made available on that port in the same
order as the bindings.
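
For instance, the following sketch combines three bindings on one input. The port name, the URI, the step named otherstep, and the inline content are all illustrative, and the port is assumed to accept a sequence.

```xml
<p:input xmlns:p="http://www.w3.org/ns/xproc" port="source">
  <!-- Documents are made available in binding order: first the
       retrieved document, then whatever otherstep produced on its
       result port, then the inline document -->
  <p:document href="http://example.org/part1.xml"/>
  <p:pipe step="otherstep" port="result"/>
  <p:inline>
    <appendix>Closing notes</appendix>
  </p:inline>
</p:input>
```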

3.6 Documentation

Pipeline authors may add documentation to their pipeline
documents with the p:documentation element. Except when it
appears as a descendant of p:inline, the p:documentation
element is completely ignored by pipeline processors; it exists
simply for documentation purposes. (If a p:documentation
is provided as a descendant of p:inline, it has no special semantics; it is
treated literally as part of the document to be provided on that
port.)

Pipeline processors that inspect the contents of p:documentation
elements and behave differently on the basis of what they find are
not conformant. Processor extensions must be specified with p:pipeinfo.

Where p:documentation is intended for human
consumption, p:pipeinfo elements are intended for
processor consumption. A processor might, for example, use
annotations to identify some particular aspect of an
implementation, to request additional, perhaps non-standard
features, to describe parallelism constraints, etc.

When a p:pipeinfo appears as a descendant of
p:inline, it
has no special semantics; in that context it must be treated literally as part of the document
to be provided on that port.

3.8 Extension attributes

[Definition:
An element from the XProc namespace may have any attribute not from the XProc
namespace, provided that the expanded-QName of the attribute has a
non-null namespace URI. Such an attribute is called an extension attribute.]

The presence of an extension attribute must not cause the
connections between steps to differ from the connections that would
arise in the absence of the attribute. It must not cause the
processor to fail to signal an error that would be signalled in the
absence of the attribute.

A processor which encounters an extension attribute that it does
not recognize must behave as if the
attribute was not present.
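
A minimal sketch of an extension attribute on a step follows. The ex: namespace and the ex:trace attribute are hypothetical, and p:identity is assumed to be a standard step.

```xml
<!-- ex:trace has a non-null namespace URI, so it is an extension
     attribute; it does not change how the step is connected -->
<p:identity xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:ex="http://example.org/ns/extensions"
            name="copy"
            ex:trace="true"/>
```

A processor that does not recognize ex:trace evaluates this step exactly as it would evaluate the same step without the attribute.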

3.9 Syntax Summaries

The description of each element in the pipeline namespace is
accompanied by a syntactic summary that provides a quick overview
of the element's syntax:

QName: With whitespace normalization as
per [W3C XML Schema:
Part 2] and according to the following definition:
[Definition: In the context of
XProc, a QName is almost always a QName
in the Namespaces in XML sense. Note,
however, that p:option and p:with-param
values can get their namespace declarations in a non-standard way
(with p:namespaces) and QNames that have no prefix
are always in no-namespace, irrespective of the default
namespace.]

PrefixList: As a list with [item type] NMTOKEN, per
[W3C XML Schema: Part
2], including whitespace normalization.

XPathExpression, XSLTMatchPattern: As a string per [W3C XML Schema: Part
2], including whitespace normalization, and the further
requirement to be a conformant Expression per [XPath 1.0] or [XPath 2.0], as
appropriate, or Match pattern per [XSLT 1.0] or [XSLT 2.0], as appropriate.

If an XProc processor can determine statically that a dynamic
error will always occur, it may report that error statically provided that the
error does not occur among the descendants of a p:try. Errors inside a
p:try must always
be raised dynamically so that p:catch processing may be performed on
them.

4 Steps

This section describes the core steps of XProc.

4.1 p:pipeline

A p:pipeline declares a pipeline
that can be evaluated by an XProc processor. It encapsulates the
behavior of a subpipeline. Its children declare
inputs, outputs, and options that the pipeline exposes and identify
the steps in its subpipeline. (A p:pipeline is a simplified form of step declaration.)

All p:pipeline pipelines have an
implicit primary input port named
“source” and an implicit primary output
port named “result”. Any input or
output ports that the p:pipeline
declares explicitly are in addition to those ports and may
not be declared primary.
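
A minimal sketch of a pipeline that relies only on these implicit ports is shown below. It assumes that an unbound primary input of the contained step is connected to the pipeline's “source” port and that the step's primary output supplies the pipeline's “result” port, in line with the default binding rules described elsewhere in this specification.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" name="main">
  <!-- No p:input or p:output is declared: the primary ports
       "source" and "result" are implicit on p:pipeline -->
  <p:xinclude/>
</p:pipeline>
```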

Viewed from the outside, a p:pipeline is a black box which performs some
calculation on its inputs and produces its outputs. From the
pipeline author's perspective, the computation performed by the
pipeline is described in terms of contained steps which read the
pipeline's inputs and produce the pipeline's outputs.

If a pipeline does not have a type then that pipeline cannot be invoked as a
step.

The p:pipeline element is just a
simplified form of step declaration. A document that reads:

When a pipeline needs to process a sequence of documents using a
subpipeline that only processes a single document, the p:for-each construct can be used as a wrapper
around that subpipeline. The p:for-each will apply that subpipeline to each
document in the sequence in turn.

The result of the p:for-each is a
sequence of documents produced by processing each individual
document in the input sequence. If the p:for-each has one or more output ports, what
appears on each of those ports is the sequence of documents that is
the concatenation of the sequence produced by each iteration of the
loop on the port to which it is connected. If the iteration source
for a p:for-each is an empty sequence,
then the subpipeline is never run and an empty sequence is produced
on all of the outputs.

The processor provides each document, one at a time, to the
subpipeline represented by the children
of the p:for-each on a port named
current.

For each declared output, the processor collects all the
documents that are produced for that output from all the
iterations, in order, into a sequence. The result of the p:for-each on that output is that sequence of
documents.

In the case where no XPath expression that must be evaluated by
the processor makes any reference to p:iteration-size,
its value does not actually have to be calculated (and the entire
input sequence does not, therefore, need to be buffered so that its
size can be calculated before processing begins).

4.2.2 Example

A p:for-each might accept a sequence of
chapters as its input, process each chapter in turn with XSLT, a
step that accepts only a single input document, and produce a
sequence of formatted chapters as its output.

The //chapter elements of the document are
selected. Each chapter is transformed into HTML and XSL Formatting
Objects using an XSLT step. The resulting HTML and FO documents are
aggregated together and appear on the html-results and fo-results
ports, respectively, of the chapters step
itself.
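
The pipeline described above might be sketched as follows. The p:iteration-source element, the use of p:output with a nested p:pipe, the stylesheet port on p:xslt, the step names make-html and make-fo, and the stylesheet URIs are assumptions made for illustration.

```xml
<p:for-each xmlns:p="http://www.w3.org/ns/xproc" name="chapters">
  <!-- Select the chapters to iterate over -->
  <p:iteration-source select="//chapter"/>
  <p:output port="html-results">
    <p:pipe step="make-html" port="result"/>
  </p:output>
  <p:output port="fo-results">
    <p:pipe step="make-fo" port="result"/>
  </p:output>

  <!-- Each iteration reads the current chapter from the "current" port -->
  <p:xslt name="make-html">
    <p:input port="source">
      <p:pipe step="chapters" port="current"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="http://example.com/xsl/html.xsl"/>
    </p:input>
  </p:xslt>

  <p:xslt name="make-fo">
    <p:input port="source">
      <p:pipe step="chapters" port="current"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="http://example.com/xsl/fo.xsl"/>
    </p:input>
  </p:xslt>
</p:for-each>
```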

4.3 p:viewport

A viewport is specified by the p:viewport element. It is a compound step that
processes a single document, applying its subpipeline to one
or more subtrees of the document.

The match attribute specifies an
XSLT match pattern. Each matching node in the source document is
wrapped in a document node, as necessary, and provided, one at a
time, to the viewport's subpipeline on a port named current. The
base URI of the resulting document that is passed to the subpipeline
is the base URI of the matched element or document. It is a dynamic
error (err:XD0010) if the match expression on p:viewport does not
match an element or document.

After a match is found, the entire subtree rooted at that match
is processed as a unit. No further attempts are made to match nodes
among the descendants of any matched node.

What appears on the output from the p:viewport will be a copy of the input document
where each matching node is replaced by the result of applying the
subpipeline to the subtree rooted at that node. In other words, if
the match pattern matches a particular element then that element is
wrapped in a document node and provided on the current port, the subpipeline in the p:viewport is evaluated, and the result that
appears on the output port replaces the
matched element.

If no documents appear on the output port,
the matched element will effectively be deleted. If exactly one
document appears, the contents of that document will replace the
matched element. If a sequence of documents appears, then the
contents of each document in that sequence (in the order it appears
in the sequence) will replace the matched element.

The output of the p:viewport itself
is a single document that appears on a port named “result”. Note that the semantics of p:viewport are special. The output port in the p:viewport is used only to access the results of
the subpipeline. The output of the step itself appears on a port
with the fixed name “result” that is never
explicitly declared.

In the case where no XPath expression that must be evaluated by
the processor makes any reference to p:iteration-size,
its value does not actually have to be calculated (and the entire
input sequence does not, therefore, need to be buffered so that its
size can be calculated before processing begins).

4.3.2 Example

A p:viewport might accept an XHTML document as
its input, add an hr element at the
beginning of all div elements that
have the class value “chapter”, and return an XHTML document that
is the same as the original except for that change.

The nodes which match h:div[@class='chapter'] in the input document are
selected. An hr is inserted as the first
child of each h:div and the resulting version
replaces the original h:div. The result of
the whole step is a copy of the input document with a horizontal
rule as the first child of each selected h:div.
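
A viewport along these lines might be sketched as follows. It assumes a step named p:insert with a position option and an insertion input port, and it assumes that the step's primary input is connected to the viewport's current port by default.

```xml
<p:viewport xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:h="http://www.w3.org/1999/xhtml"
            match="h:div[@class='chapter']">
  <!-- Runs once for each matched h:div; the result replaces that h:div -->
  <p:insert position="first-child">
    <p:input port="insertion">
      <p:inline>
        <hr xmlns="http://www.w3.org/1999/xhtml"/>
      </p:inline>
    </p:input>
  </p:insert>
</p:viewport>
```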

4.4 p:choose

A choose is specified by the p:choose element. It is a multi-container
step that selects exactly one of a list of alternative
subpipelines based on the evaluation of
XPath expressions.

A p:choose has no inputs. It
contains an arbitrary number of alternative subpipelines,
exactly one of which will be evaluated.

The list of alternative subpipelines consists of zero or more
subpipelines guarded by an XPath expression, followed optionally by
a single default subpipeline.

The p:choose considers each
subpipeline in turn and selects the first (and only the first)
subpipeline for which the guard expression evaluates to true in its
context. If there are no subpipelines for which the expression
evaluates to true, the default subpipeline, if it was specified, is
selected.

After a subpipeline is selected, it is evaluated
as if only it had been present.

The outputs of the p:choose are
taken from the outputs of the selected subpipeline. The
p:choose has the same number of
outputs as the selected subpipeline with the same names. If the
selected subpipeline has a primary output port, the port
with the same name on the p:choose is
also a primary output port.

In order to ensure that the output of the p:choose is consistent irrespective of the
subpipeline chosen, each subpipeline must
declare the same number of outputs with the same names and the same
settings with respect to sequences. If any of the subpipelines
specifies a primary output port, each
subpipeline must specify exactly the same output as primary.
It is a static error (err:XS0007) if two subpipelines in a
p:choose declare different outputs.

4.4.1 p:xpath-context

A p:xpath-context element specifies
the context against which an XPath expression will be evaluated.
When it appears in a p:when, it specifies the context for that
p:when’s
test attribute. When it appears in
p:choose, it
specifies the default context for all of the p:when elements in that
p:choose.

Only one binding is allowed and it works the same way
that bindings work on a p:input. No select expression is allowed. It is a dynamic
error (err:XD0005) if the xpath-context is bound to a sequence of
documents.

In an XPath 1.0 implementation, if the context node is bound to
p:empty, or is
unbound and the default readable port is
undefined, an empty document
node is used instead as the context. In an XPath 2.0
implementation, the context item is undefined.

4.4.2 p:when

Each p:when branch of the p:choose has a
test attribute which must contain an XPath expression. That XPath
expression's effective boolean value is the guard for the
subpipeline contained within that
p:when.

The p:when can specify a context
node against which its test
expression is to be evaluated. That context node is specified as a
binding for
the p:xpath-context. If no context is specified
on the p:when, the context of the
p:choose is
used.

4.4.3 p:otherwise

An otherwise specifies the default branch: the subpipeline
selected if no test expression on any preceding p:when evaluates to
true.
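
Taken together, p:choose, p:xpath-context, p:when, and p:otherwise might be combined as in the following sketch. The step named otherstep, the version attribute being tested, and the choice of p:xinclude and p:identity as branch contents are illustrative assumptions.

```xml
<p:choose xmlns:p="http://www.w3.org/ns/xproc" name="by-version">
  <!-- Default context for the test on every p:when in this p:choose -->
  <p:xpath-context>
    <p:pipe step="otherstep" port="result"/>
  </p:xpath-context>
  <p:when test="/*[@version = '2']">
    <!-- Selected if the guard evaluates to true -->
    <p:xinclude/>
  </p:when>
  <p:otherwise>
    <!-- Default branch: selected if no guard evaluates to true -->
    <p:identity/>
  </p:otherwise>
</p:choose>
```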

4.6 p:try

A try/catch is specified by the p:try element. It is a multi-container
step that isolates a subpipeline, preventing any dynamic
errors that arise within it from being exposed to the rest of the
pipeline.

The p:group
represents the initial subpipeline and the recovery (or “catch”)
pipeline is identified with a p:catch element.

The p:try step evaluates the
initial subpipeline and, if no errors occur, the outputs of that
pipeline are the outputs of the p:try
step. However, if any errors occur, the p:try abandons the first subpipeline, discarding
any output that it might have generated, and evaluates the recovery
subpipeline.

If the recovery subpipeline is evaluated, the outputs of the
recovery subpipeline are the outputs of the p:try step. If the recovery subpipeline is
evaluated and a step within that subpipeline fails, the p:try fails.

The outputs of the p:try are taken
from the outputs of the initial subpipeline or the recovery
subpipeline if an error occurred in the initial subpipeline. The
p:try has the same number of outputs
as the selected subpipeline with the same names. If the selected
subpipeline has a primary output port, the port
with the same name on the p:try is
also a primary output port.

In order to ensure that the output of the p:try is consistent irrespective of whether the
initial subpipeline provides its output or the recovery subpipeline
does, both subpipelines must declare the same number of outputs
with the same names and the same settings with respect to
sequences. If either of the subpipelines specifies a primary output
port, both subpipelines must specify exactly the same
output as primary. It is a static error (err:XS0009) if the
p:group and
p:catch
subpipelines declare different outputs.

What appears on the error output port is
an error document. The error document may
contain messages generated by steps that were part of the initial
subpipeline. Not all messages that appear are indicative of errors;
for example, it is common for all xsl:message output from the XSLT component to
appear on the error output port. It is
possible that the component which fails may not produce any
messages at all. It is also possible that the failure of one
component may cause others to fail so that there may be multiple
failure messages in the document.

4.6.1 The Error Vocabulary

In general, it is very difficult to predict error behavior. Step
failure may be catastrophic (programmer error), or it may be the
result of user error, resource failures, etc. Steps may detect more
than one error, and the failure of one step may cause other steps
to fail as well.

The p:try/p:catch mechanism gives pipeline authors the
opportunity to process the errors that caused the p:try to fail. In order
to facilitate some modicum of interoperability among processors,
errors that are reported on the error
output port of a p:catch should
conform to the format described here.

4.6.1.1 c:errors

The error vocabulary consists of a root element, c:errors which contains zero or more c:error elements.

The name and type attributes identify the name and type,
respectively, of the step which failed.

The code is a QName which
identifies the error. For steps which have defined error codes,
this is an opportunity for the step to identify the error in a
machine-processable fashion. Many steps omit this because they do
not include the concept of errors identified by QNames.

If the error was caused by a specific document, or by the
location of some erroneous construction in a specific document, the
href, line, column,
and offset attributes identify this
location. Generally, the error location is identified either with
line and column numbers or with an offset from the beginning of the
document, but not usually both.

The content of the c:error element
is any well-formed XML. Specific steps, or specific
implementations, may provide more detail about the format of the
content of an error message.

It is not an error for steps to generate non-standard error
output as long as it is well-formed.
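
An error document using this vocabulary might look like the following sketch. The ex: namespace, the step type ex:call-service, the error code ex:timeout, the location values, and the message element are all hypothetical.

```xml
<c:errors xmlns:c="http://www.w3.org/ns/xproc-step"
          xmlns:ex="http://example.org/ns/steps">
  <!-- name and type identify the failed step; code identifies the
       error; href, line, and column locate it -->
  <c:error name="call" type="ex:call-service" code="ex:timeout"
           href="http://example.org/pipelines/main.xpl"
           line="42" column="7">
    <message>The service did not respond.</message>
  </c:error>
</c:errors>
```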

4.6.2 Example

A pipeline might attempt to process a document by dispatching it
to some web service. If the web service succeeds, then those
results are passed to the rest of the pipeline. However, if the web
service cannot be contacted or reports an error, the p:catch step can
provide some sort of default for the rest of the pipeline.
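
Such a pipeline might be sketched as follows. The step ex:call-service stands in for whatever step contacts the web service and is hypothetical, as are its namespace and the inline default document. The p:identity step is assumed to be a standard step.

```xml
<p:try xmlns:p="http://www.w3.org/ns/xproc"
       xmlns:ex="http://example.org/ns/steps">
  <p:group>
    <!-- Initial subpipeline: hypothetical step that calls the service -->
    <ex:call-service/>
  </p:group>
  <p:catch>
    <!-- Recovery subpipeline: supply a default document instead -->
    <p:identity>
      <p:input port="source">
        <p:inline>
          <service-unavailable/>
        </p:inline>
      </p:input>
    </p:identity>
  </p:catch>
</p:try>
```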

4.7 Atomic Steps

In addition to the six step types described in the preceding
sections, XProc provides a standard library of atomic step types.
The full vocabulary of standard steps is described in Section 7,
“Standard Step Library”.

Where “pfx:atomic” must be in the XProc namespace and must be declared in either the standard library
for the XProc version supported by the processor or explicitly
imported by the surrounding pipeline (see Section 2.11, “Versioning
Considerations”).

4.8 Extension Steps

Pipeline authors may also have access to additional steps not
defined or described by this specification. Atomic extension steps
are invoked just like standard steps:

If the relevant step declaration has no subpipeline, then
that step invokes the declared atomic step, which the processor
must know how to perform. These steps are implementation-defined
extensions.

If the relevant step declaration has a subpipeline, then
that step runs the declared subpipeline. These steps are user- or
implementation-defined extensions. Pipelines can refer to
themselves (recursion is allowed), to pipelines defined in imported
libraries, and to other pipelines in the same library if they are
in a library.

4.8.1 Syntactic Shortcut for Option Values

Namespace qualified attributes on a step are extension
attributes. Attributes, other than name, that are not namespace qualified are
treated as a syntactic shortcut for specifying the value of an
option. In other words, the following two steps are equivalent:
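
A sketch of two such equivalent steps is given below. The step type ex:stepType, its namespace, and the option name are hypothetical. The example assumes that p:with-option takes a name attribute and a select attribute holding an XPath expression, so the literal value is quoted inside select. The enclosing p:group is there only to make the example a single well-formed document.

```xml
<p:group xmlns:p="http://www.w3.org/ns/xproc"
         xmlns:ex="http://example.org/ns/steps">
  <!-- Long form: the option value is given with p:with-option -->
  <ex:stepType>
    <p:with-option name="option-name" select="'some value'"/>
  </ex:stepType>

  <!-- Shortcut: the same option given as an unqualified attribute -->
  <ex:stepType option-name="some value"/>
</p:group>
```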

5 Other pipeline elements

5.1 p:input

A p:input identifies an input port
for a step. In some contexts, p:input
declares that a port with the specified name exists and identifies
the properties of that port. In other contexts, it provides a
binding for a port declared elsewhere. And in some contexts, it
does both. The semantics of p:input
are complicated further by the fact that there are two kinds of
inputs, ordinary “document” inputs and “parameter” inputs.

5.1.1 Document Inputs

The declaration of a document input identifies the name of the
port, whether or not the port accepts a sequence, whether or not
the port is a primary input port, and may
provide a default binding for the port. An input declaration has
the following form:

The port attribute defines the
name of the port. It is a static error (err:XS0011) to
identify two ports with the same name on the same step.

The sequence attribute determines
whether or not a sequence of documents is allowed on the port.
If sequence is not specified, or has the value
“false”, then it is a dynamic error (err:XD0006) unless
exactly one document appears on the declared port.

The primary attribute is used to identify the primary input port.
An input port is a primary input port if primary is specified with
the value “true” or if the step has only a single input port and
primary is not specified. It is a static error (err:XS0030) to
specify that more than one input port is the primary.

The kind attribute distinguishes
between the two kinds of inputs: document inputs and parameter
inputs. An input port is a document input port if kind is specified with the value “document” or if kind
is not specified.
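
A declaration that sets each of the attributes described so far might be sketched as follows. The port name is illustrative.

```xml
<!-- A primary document input port that accepts a sequence of documents -->
<p:input xmlns:p="http://www.w3.org/ns/xproc"
         port="chapters"
         sequence="true"
         primary="true"
         kind="document"/>
```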

If a default binding is provided, then select may be used to select a portion of the
input identified by the p:empty, p:document, or p:inline elements in
the p:input.

A select expression may also be provided with a binding. The
select expression, if specified,
applies the specified XPath select expression to the document(s)
that are read. Each selected node is wrapped in a document (unless
it is a document) and provided to the input port. In other
words, a p:input whose select expression selects every html:div and
whose binding is a p:document for http://example.org/input.html
provides a sequence of zero or more documents, one for each
html:div in http://example.org/input.html. (Note that in the case of
nested html:div elements, this may result in
the same content being returned in several documents.)
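
An input of the kind just described might be written as in the following sketch. The port name source, the html prefix binding, and the exact select expression //html:div are assumptions chosen for illustration; only the document URI comes from the text above.

```xml
<p:input port="source" select="//html:div"
         xmlns:p="http://www.w3.org/ns/xproc"
         xmlns:html="http://www.w3.org/1999/xhtml">
  <!-- The document is read first; the select expression is then applied to it. -->
  <p:document href="http://example.org/input.html"/>
</p:input>
```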

A select expression can equally be applied to input read from
another step. An input with the same select expression whose binding
is a p:pipe provides a sequence of zero or more documents, one for each
html:div in the document (or each of the
documents) that is read from the result
port of the step named origin.
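
The same idea applied to another step's output might look like this sketch. The step name origin and its result port come from the paragraph above; the port name source and the select expression are assumptions.

```xml
<p:input port="source" select="//html:div"
         xmlns:p="http://www.w3.org/ns/xproc"
         xmlns:html="http://www.w3.org/1999/xhtml">
  <!-- Every document read from origin/result is filtered by the select expression. -->
  <p:pipe step="origin" port="result"/>
</p:input>
```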

It is a
dynamic
error (err:XD0016) if the select expression on a p:input returns
anything other than a possibly empty set of element or document
nodes.

When a p:input is used in any context where it
provides only a binding (e.g., on an atomic step), it is a static
error (err:XS0012) if the port given does not match the name of an input
port specified in the step's declaration.

An input declaration may include a default binding. If no
binding is provided for an input port which has a default binding,
then the input is treated as if the default binding appeared.

A default binding does not satisfy the requirement that a
primary input port is automatically connected by the processor, nor
is it used when no default readable port is defined. In other
words, a p:declare-step or a p:pipeline can
define defaults for all of its inputs, whether they are primary or
not, but defining a default for a primary input usually has no
effect. It's never used by an atomic step since the step, when
it's called, will always bind the primary input port to the default
readable port (or cause a static error). The only case where it has
value is on a p:pipeline when that pipeline is invoked
directly by the processor. In that case, the processor must use the default binding if no external
binding is provided for the port.

5.1.2 Parameter Inputs

The declaration of a parameter input identifies the name of the
port and that the port is a parameter input.

The port attribute defines the
name of the port. It is a static error (err:XS0011) to
identify two ports with the same name on the same step.

The sequence attribute determines
whether or not a sequence of documents is allowed on the port. A
sequence of documents is always allowed on a parameter input port.
It is a
static
error (err:XS0040) to specify any value other than
“true”.

The kind attribute distinguishes
between the two kinds of inputs: document inputs and parameter
inputs. An input port is a parameter input port only if the
kind attribute is specified with the
value “parameter”. It is a static error (err:XS0033) to
specify any kind of input other than “document” or “parameter”.

A parameter input port is a distinguished kind of input port. It
exists only to receive computed parameters; if a step does not have
a parameter input port then it cannot receive parameters. A
parameter input port must satisfy all the constraints of a normal,
document input port.

It is a
static
error (err:XS0035) if the declaration of a parameter
input port contains a binding; parameter input port declarations
must be empty.

When used on a step, parameter input ports always accept a
sequence of documents. If no binding is provided for a primary
parameter input port, then the port will be bound to the
primary parameter input port of the pipeline which contains the
step. If no binding is provided for a parameter input port other
than the primary parameter input port, then the port will be bound
to an empty
sequence of documents. It is a static error (err:XS0055) if a
primary parameter input port has no binding and the pipeline that
contains the step has no primary parameter input port.

If a parameter input port on a p:pipeline is not bound, it is treated as if
it was bound to an automatically created p:sink step. In other
words, if a p:pipeline does not contain any steps that
have parameter input ports, or if those ports are all explicitly
bound elsewhere, the parameter input port is ignored. In this one
case, it is not an error for an input port to be unbound.

If a binding is manufactured for a primary parameter input port,
that binding occurs logically last among the other parameters,
options, and bindings passed to the step. In other words, the
parameter values that appear on that port will be used even if
other values were specified with p:with-param elements. Users can change this
priority by making the binding explicit and placing any p:with-param
elements that they wish to function as overrides after the
binding.

All of the documents that appear on a parameter input must
either be c:param documents or c:param-set
documents.

A step which accepts a parameter input reads all of the
documents presented on that port, using each c:param (either at the
root or inside the c:param-set) to establish the value of the
named parameter. If the same name appears more than once, the last
value specified is used. If the step also has literal p:with-param
elements, they are also considered in document order. In other
words, p:with-param elements that appear before the
parameter input may be overridden by the computed parameters;
p:with-param elements that appear after may
override the computed values.
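
The ordering rule can be illustrated with a step like the following sketch, which the next paragraphs walk through. It is a fragment, not a complete pipeline: it assumes an enclosing p:pipeline named main that has a parameter input port named parameters, an XSLT step named p:xslt with stylesheet and parameters ports, and a hypothetical stylesheet URI. The literal value 'html' matches the value discussed below.

```xml
<p:xslt xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="stylesheet">
    <!-- Hypothetical stylesheet location. -->
    <p:document href="http://example.com/stylesheets/doc.xsl"/>
  </p:input>

  <!-- Literal parameter: appears BEFORE the parameter input binding,
       so a computed output-type value may override it. -->
  <p:with-param name="output-type" select="'html'"/>

  <!-- Explicit binding to the parameter input port of the pipeline "main". -->
  <p:input port="parameters">
    <p:pipe step="main" port="parameters"/>
  </p:input>
</p:xslt>
```

Moving the p:with-param element after the p:input for the parameters port would reverse the priority, so the literal value would win.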

This p:pipeline declares that it accepts
parameters. Suppose that (through some implementation-defined
mechanism) I have passed the parameters “output-type=fo” and
“profile=unclassified” to the pipeline. These parameters are
available on the parameters input
port.

When the XSLT step runs, it will read those parameters and
combine them with any parameters specified literally on the step.
Because the parameter input comes after the literal
declaration for output-type on the step,
the XSLT stylesheet will see both values that I passed in
(“output-type=fo”
and “profile=unclassified”).

If the parameter input came before the literal
declaration, then the XSLT stylesheet would see “output-type=html” and
“profile=unclassified”.

Most steps don't bother to declare parameter inputs, or provide
explicit bindings for them, and “the right thing” usually
happens.

5.1.2.1 The c:param
element

A c:param represents a parameter on
a parameter input.

<c:param
  name = QName
  namespace? = anyURI
  value = string />

The name attribute of the
c:param must have the lexical form of
a QName.

If the namespace attribute is
specified, then the expanded name of the parameter is constructed
from the specified namespace and the local-name part of the
name value (in other words, the
prefix, if any, is ignored).

If the namespace attribute is not
specified, and the name contains a
colon, then the expanded name of the parameter is constructed using
the name value and the namespace
declarations in-scope on the c:param
element.

If the namespace attribute is not
specified, and the name does not
contain a colon, then the expanded name of the parameter is in no
namespace.

Any namespace-qualified attribute names that appear on the
c:param element are ignored. It is a dynamic
error (err:XD0014) for any unqualified attribute
names other than “name”, “namespace”, or “value” to
appear on a c:param element.
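
The three naming rules can be seen side by side in a sketch such as the following. The parameter names, values, and the example.com namespace URIs are invented for illustration; the c: namespace URI is the one used by the standard step vocabulary.

```xml
<c:param-set xmlns:c="http://www.w3.org/ns/xproc-step"
             xmlns:ex="http://example.com/ns/a">
  <!-- No colon and no namespace attribute: the expanded name is in no namespace. -->
  <c:param name="output-type" value="fo"/>

  <!-- Colon but no namespace attribute: the in-scope binding for "ex" is used,
       giving local name "profile" in http://example.com/ns/a. -->
  <c:param name="ex:profile" value="unclassified"/>

  <!-- namespace attribute present: the prefix is ignored, giving
       local name "depth" in http://example.com/ns/b. -->
  <c:param name="ex:depth" namespace="http://example.com/ns/b" value="3"/>
</c:param-set>
```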

5.3 p:viewport-source

Only one binding is allowed and it works the same way
that bindings work on a p:input. It is a dynamic error (err:XD0006) unless
exactly one document appears on the p:viewport-source. No select expression is allowed.

5.4 p:output

A p:output identifies an output
port, optionally binding an input for it, if necessary.

<p:output
  port = NCName
  sequence? = boolean
  primary? = boolean />

The port attribute defines the
name of the port. It is a static error (err:XS0011) to
identify two ports with the same name on the same step.

An output declaration can indicate if a sequence of documents is
allowed to appear on the declared port. If sequence is specified with the value “true”,
then a sequence is allowed. If sequence
is not specified on p:output, or has
the value “false”, then it is a dynamic error (err:XD0007) if the
step does not produce exactly one document on the declared
port.

The primary attribute is used to
identify the primary output port. An output port is a primary
output port if primary is specified
with the value “true” or if the step has
only a single output port and primary is not specified. It is a static
error (err:XS0014) to identify more than one output
port as primary.

If a binding is provided for a p:output, documents are read from that
binding and those documents form the output that is
written to the output port. In other words, placing a p:document inside a
p:output causes the processor to
read that document and provide it on the output port. It
does not cause the processor to write the output
to that document.
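
A minimal sketch of this reading behaviour, assuming a compound step with an output port named result and a hypothetical document URI:

```xml
<p:output port="result" xmlns:p="http://www.w3.org/ns/xproc">
  <!-- The processor READS this document and makes it available on "result".
       Nothing is written to the URI. -->
  <p:document href="http://example.org/boilerplate.xml"/>
</p:output>
```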

5.5 p:log

A p:log element is a debugging aid.
It associates a URI with a specific output port on a step:

<p:log
  port = NCName
  href? = anyURI />

The semantics of p:log are that it
writes to the specified URI whatever document or documents appear
on the specified port. If the
href attribute is not specified, the
location of the log file or files is implementation-defined.
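
As an illustrative sketch, a p:log placed on a step might look like the following. The p:xslt step, its port names, and both URIs are assumptions, not part of this section.

```xml
<p:xslt xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="stylesheet">
    <p:document href="http://example.com/stylesheets/doc.xsl"/>
  </p:input>
  <!-- Whatever appears on the "result" port is also written to this URI. -->
  <p:log port="result" href="file:///tmp/xslt-result.xml"/>
</p:xslt>
```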

If the pipeline processor serializes the output on the specified
port, it must use the serialization
options specified. If the processor is not serializing (if, for
example, the pipeline has been called from another pipeline), then
the p:serialization must be ignored. The processor may reject statically a pipeline that requests
serialization options that it cannot provide.

It is a
static
error (err:XS0039) if the port specified on the
p:serialization is not the name of an
output port on the pipeline in which it appears or if more than one
p:serialization element is applied to
the same port.

5.7 Variables, Options, and
Parameters

Variables, options, and parameters provide a mechanism for
pipeline authors to construct temporary results and hold onto them
for reuse.

Variables are created in compound steps and, like XSLT
variables, are single assignment, though they may be shadowed by
subsequent declarations of other variables with the same name.

Options can be declared on atomic or compound steps. The value
of an option can be specified by the caller invoking the step. Any
value specified by the caller takes precedence over any default
value specified in the declaration.

Parameters, unlike options and variables, have names that can be
computed at runtime. The most common use of parameters is to pass
parameter values to XSLT stylesheets.

5.7.1 p:variable

A p:variable declares a variable
and associates a value with it.

The name of the variable must be a
QName. If it does not contain a prefix then it is in no namespace.
It is a
static
error (err:XS0028) to declare an option or variable
in the XProc namespace.

The variable's value is specified with a select attribute. It is a static error (err:XS0016) if the
select attribute is not specified.
The content of the select attribute
is an XPath expression which will be evaluated to provide the value
of the variable.

If a select expression is given,
it is evaluated as an XPath expression using the context defined in
Section 2.6.1, “Processor XPath
Context”, for the enclosing container, with the addition of bindings
for all preceding-sibling p:variable
and p:option
elements. Regardless of the implicit type of the expression, when
XPath 1.0 is being used, the string value of the expression becomes
the value of the variable; when XPath 2.0 is being used, the type
is treated as an untypedAtomic.

Since all in-scope bindings are present in
the Processor XPath Context as variable bindings, select expressions may refer to the value of
in-scope
bindings by variable reference. If a variable reference
uses a QName that is not the name of an in-scope
binding, an XPath evaluation error will occur.

If a select expression is given
but no document binding is provided, the implicit binding is to the
default
readable port in the environment inherited by the first
step in the surrounding container's contained steps. It is a static
error (err:XS0032) if no document binding is
provided and the default readable port is undefined. It is a dynamic
error (err:XD0008) if a document sequence is
specified in the document binding for a p:variable. In an XPath 1.0 implementation, if
p:empty is
given as the document binding, an empty document node is used as the
context node. In an XPath 2.0 implementation, the context item is
undefined.
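
The following sketch shows two variables in a compound step. The variable names, the html prefix, and the step named origin are assumptions. The first variable has an explicit document binding; the second needs no document, so it is bound to p:empty and refers to its preceding sibling by variable reference.

```xml
<p:group xmlns:p="http://www.w3.org/ns/xproc"
         xmlns:html="http://www.w3.org/1999/xhtml">
  <!-- Evaluated against the single document read from origin/result. -->
  <p:variable name="div-count" select="count(//html:div)">
    <p:pipe step="origin" port="result"/>
  </p:variable>

  <!-- Preceding-sibling variables are in scope as XPath variable bindings. -->
  <p:variable name="label" select="concat('divs: ', $div-count)">
    <p:empty/>
  </p:variable>

  <p:identity>
    <p:input port="source">
      <p:pipe step="origin" port="result"/>
    </p:input>
  </p:identity>
</p:group>
```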

5.7.2 p:option

A p:option declares an option and
may associate a default value with it. The p:option tag can only be used in a p:declare-step
or a p:pipeline (which is a syntactic
abbreviation for a step declaration).

The name of the option must be a
QName. If it does not contain a prefix then it is in no namespace.
It is a
static
error (err:XS0028) to declare an option or variable
in the XProc namespace.

If a select attribute is
specified, its content is an XPath expression which will be
evaluated to provide the value of the option, which may differ
from one instance of the step type to another.

<p:option
  name = QName
  select = XPathExpression />

The select expression is only
evaluated when its actual value is needed by an instance of the
step type being declared. In this case, it is evaluated as
described in Section 5.7.3, “p:with-option” except
that

the variable bindings consist only of bindings for options whose
declaration precedes the p:option
itself in the surrounding step signature;

the in-scope namespaces are the in-scope namespaces of the
p:option itself.

It follows that if the select
expression contains a variable reference that uses a QName that is
not the name of a preceding sibling p:option declaration, an XPath evaluation error
will occur.

Regardless of the implicit type of the expression, when XPath
1.0 is being used, the string value of the expression becomes the
value of the option; when XPath 2.0 is being used, the value is an
untypedAtomic.
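
A sketch of option defaults that depend on one another, assuming a hypothetical atomic step type ex:format and invented option names:

```xml
<p:declare-step type="ex:format"
                xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/ns/steps">
  <p:input port="source"/>
  <p:output port="result"/>

  <p:option name="base-size" select="'10'"/>

  <!-- Allowed: base-size is declared before heading-size. -->
  <p:option name="heading-size" select="$base-size * 2"/>

  <!-- A default for base-size that referred to $heading-size would raise
       an XPath evaluation error, because heading-size is declared later. -->
</p:declare-step>
```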

5.7.3 p:with-option

A p:with-option provides an actual
value for an option when a step is invoked.

The name of the option must be a
QName. If it does not contain a prefix then it is in no namespace.
It is a
static
error (err:XS0031) to use an option name in
p:with-option if the step type being
invoked has not declared an option with that name.

It is a
static
error (err:XS0004) to include more than one
p:with-option with the same option
name as part of the same step invocation.

The actual value is specified with a select attribute. It is a static error (err:XS0016) if the
select attribute is not specified.
The value of the select attribute is
an XPath expression which will be evaluated to provide the value of
the option.

Regardless of the implicit type of the expression, when XPath
1.0 is being used, the string value of the expression becomes the
value of the option; when XPath 2.0 is being used, the value is an
untypedAtomic.

All in-scope bindings for the step
instance itself are present in the Processor XPath Context as
variable bindings, so select
expressions may refer to any option or variable bound in those
in-scope
bindings by variable reference. If a variable reference
uses a QName that is not the name of an in-scope
binding or preceding sibling option, an XPath evaluation
error will occur.

If a select expression is used
but no document binding is provided, the implicit binding is to the
default
readable port. It is a static error (err:XS0032) if no
document binding is provided and the default readable port is
undefined. It is a dynamic error (err:XD0008) if a
document sequence is specified in the binding for a p:with-option. In an XPath 1.0 implementation,
if p:empty is
given as the document binding, an empty document node is used as the
context node. In an XPath 2.0 implementation, the context item is
undefined.
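
The following sketch supplies option values on an invocation of p:add-attribute. The option names are those of that step; the configuration document URI and its structure are hypothetical, and the first two options assume that a default readable port is available for their implicit binding.

```xml
<p:add-attribute xmlns:p="http://www.w3.org/ns/xproc">
  <p:with-option name="match" select="'para'"/>
  <p:with-option name="attribute-name" select="'class'"/>

  <!-- Explicit document binding: the select expression is evaluated
       against exactly one document read from this URI. -->
  <p:with-option name="attribute-value" select="/config/@default-class">
    <p:document href="http://example.org/config.xml"/>
  </p:with-option>
</p:add-attribute>
```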

5.7.4 p:with-param

The p:with-param element is used to
establish the value of a parameter. The parameter must be given a value when it is used. (Parameter
names aren't known in advance; there's no provision for declaring
them.)

The name of the parameter must be a
QName. If it does not contain a prefix then it is in no namespace.
It is a
static
error (err:XS0028) to use the XProc namespace in the
name of a parameter.

The value is specified with a select attribute. It is a static error (err:XS0016) if the
select attribute is not specified.
The content of the select attribute
is an XPath expression which will be evaluated to provide the value
of the parameter.

The values of parameters for a step must be computed after all the options in the
step's signature have had their values computed.
If a select expression is given on a
p:with-param, it is evaluated as an
XPath expression using the context defined in Section 2.6.1, “Processor XPath
Context”, for the surrounding step, with the addition of
variable bindings for all options declared in the surrounding
step's signature.

Regardless of the implicit type of the expression, when XPath
1.0 is being used, the string value of the expression becomes the
value of the parameter; when XPath 2.0 is being used, the value is
an untypedAtomic.

All in-scope bindings for the step
instance itself are present in the Processor XPath Context as
variable bindings, so select
expressions may refer to any option or variable bound in those
in-scope
bindings, as well as to any option declared in the step
signature, by variable reference. If a variable reference uses a
QName that is not the name of an in-scope binding or declared
option, an XPath evaluation error will occur.

If the optional port attribute is
specified, then the parameter appears on the named port, otherwise
the parameter appears on the step's primary parameter input
port. It is a static error (err:XS0034) if the
specified port is not a parameter input port or if no port is
specified and the step does not have a primary parameter input
port.
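
As a sketch, the two ways of targeting a parameter input port might look like this. The p:xslt step, its port names, the stylesheet URI, and the parameter names are assumptions for illustration.

```xml
<p:xslt xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="stylesheet">
    <p:document href="http://example.com/stylesheets/doc.xsl"/>
  </p:input>

  <!-- No port attribute: the parameter goes to the step's
       primary parameter input port. -->
  <p:with-param name="output-type" select="'html'"/>

  <!-- Explicit port attribute: the named port must be a parameter input port. -->
  <p:with-param name="profile" select="'unclassified'" port="parameters"/>
</p:xslt>
```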

5.7.5 Namespaces on variables,
options, and parameters

Variable, option and parameter values carry with them not only
their literal or computed string value but also a set of
namespaces. To see why this is necessary, consider the following
step:

The p:delete step will delete elements that
match the expression “html:div”, but that
expression can only be correctly interpreted if there's a namespace
binding for the prefix “html”, so that
binding has to travel with the option.
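
A step of the kind discussed here might be written as in the following sketch. Where the html prefix is declared and the XHTML namespace URI are assumptions.

```xml
<p:delete xmlns:p="http://www.w3.org/ns/xproc"
          xmlns:html="http://www.w3.org/1999/xhtml">
  <!-- The option value is the STRING "html:div"; it is only meaningful
       together with the binding for the "html" prefix. -->
  <p:with-option name="match" select="'html:div'"/>
</p:delete>
```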

The default namespace bindings associated with a variable,
option or parameter value are computed as follows:

If the select attribute was used
to specify the value and it consisted of a single VariableReference (per [XPath 1.0] or [XPath 2.0], as
appropriate), then the namespace bindings from the referenced
option or variable are used.

If the select attribute was used
to specify the value and it evaluated to a node-set, then the
in-scope namespaces from the first node in the selected node-set
(or, if it's not an element, its parent) are used.

In this case, the match option passed
to the p:delete step needs both the
namespace binding of “h” specified in the
ex:delete-in-div pipeline definition
and the namespace binding of “html” specified in the divchild option on the call of that pipeline. It's
not sufficient to provide just one of the sets of bindings.

If the element attribute is
specified, it must contain an XPath
expression which identifies a single element node (the input
binding for this expression is the same as the binding for the
p:option or
p:with-param which contains it). The
in-scope namespaces of that node are used.

If neither binding nor element is specified, the in-scope namespaces
on the p:namespaces element itself are used.

Irrespective of how the set of namespaces are determined, the
except-prefixes attribute can be
used to exclude one or more namespaces. The value of the
except-prefixes attribute must be a
sequence of tokens, each of which must
be a prefix bound to a namespace in the in-scope namespaces of the
p:namespaces element. All bindings of
prefixes to each of the namespaces thus identified are excluded.
It is a
static
error (err:XS0051) if the except-prefixes attribute on p:namespaces does
not contain a list of tokens or if any of those tokens is not a
prefix bound to a namespace in the in-scope namespaces of the
p:namespaces element.

The p:namespaces element provides namespace
bindings for both of the prefixes necessary to correctly interpret
the expression ultimately passed to the p:delete step.

This solution has the weakness that it depends on knowing the
bindings that will be used by the caller. A more flexible solution
would use the binding attribute to
copy the bindings from the caller's option value.
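
The more flexible approach might be sketched as follows. The option names match and divchild and the prefix “h” come from the discussion above. The select expression, the XHTML namespace URI, and the use of two p:namespaces children on one p:with-option are assumptions.

```xml
<p:delete xmlns:p="http://www.w3.org/ns/xproc">
  <p:with-option name="match" select="concat('h:div/', $divchild)">
    <!-- Copy whatever bindings travelled with the caller's divchild value. -->
    <p:namespaces binding="divchild"/>
    <!-- Add the binding for the "h" prefix used by this pipeline itself:
         with neither binding nor element specified, the in-scope
         namespaces of the p:namespaces element are used. -->
    <p:namespaces xmlns:h="http://www.w3.org/1999/xhtml"/>
  </p:with-option>
</p:delete>
```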

The value of the type can be from
any namespace provided that the expanded-QName of the value has a
non-null namespace URI. It is a static error (err:XS0025) if the
expanded-QName value of the type
attribute is in no namespace. Except as described in Section 2.11, “Versioning
Considerations”, the XProc namespace must
not be used in the type of steps. Neither users nor
implementers may define additional steps in the XProc
namespace.

Irrespective of the context in which the p:declare-step occurs, there are initially no
option or variable names in-scope inside a p:declare-step. That is, p:option and p:variable elements
can refer to values declared by their preceding siblings, but not
by any of their ancestors.

When a declared step is
evaluated directly by the XProc processor (as opposed to occurring
as an atomic step in some container), how the input and output ports
are bound to documents is implementation-defined.

A step declaration is not a step in
its own right. Sibling steps cannot refer to the inputs or outputs
of a p:declare-step using p:pipe; only instances
of the type can be referenced.

5.8.1 Declaring atomic steps

When declaring an atomic step, the subpipeline in the
declaration must be empty. And,
conversely, if the subpipeline in a declaration is empty, the
declaration must be for an atomic
step.

It is not an error for a pipeline to include declarations for
steps that a particular processor does not know how to implement.
It is, of course, an error to attempt to evaluate such steps.

If p:log or
p:serialization elements appear in the
declaration of an atomic step, they will only be used if the atomic
step is directly evaluated by the processor. They have no effect if
the step appears in a subpipeline; only the serialization
options of the “top level” step or pipeline are used because that
is the only step which the processor is required to serialize.

5.8.2 Declaring pipelines

When a p:declare-step declares a pipeline, that
pipeline encapsulates the behavior of the specified subpipeline. Its
children declare inputs, outputs, and options that the pipeline
exposes and identify the steps in its subpipeline.

The subpipeline may include
declarations of additional steps (e.g., other pipelines or other
step types that are provided by a particular implementation or in
some implementation-defined way)
and import other pipelines. If a pipeline has been imported, it may
be invoked as a step within the subpipeline that imported it.
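
A small pipeline declaration might look like the following sketch. The step type ex:strip-divs, its namespace URI, and the option name pattern are hypothetical.

```xml
<p:declare-step type="ex:strip-divs"
                xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.com/ns/steps"
                xmlns:html="http://www.w3.org/1999/xhtml">
  <!-- Inputs, outputs, and options that the pipeline exposes. -->
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="pattern" select="'html:div'"/>

  <!-- The subpipeline: a single step whose match option is taken
       from the declared option by variable reference. -->
  <p:delete>
    <p:with-option name="match" select="$pattern"/>
  </p:delete>
</p:declare-step>
```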

The psvi-required attribute
allows the author to declare that a step relies on the processor's
ability to pass PSVI annotations between steps. It is a static
error (err:XS0049) if the psvi-required attribute is “true” and the processor does not support passing
PSVI annotations between steps. If the attribute is not specified,
the value “false” is assumed.

5.9 p:library

A p:library is a collection of step
declarations and/or pipeline definitions.

Attempts to retrieve the library identified by the URI value may
be redirected at the parser level (for example, in an entity
resolver) or below (at the protocol level, for example, via an HTTP
Location: header). In the absence of additional information outside
the scope of this specification within the resource, the base URI
of the library is always the URI of the actual resource returned.
In other words, it is the URI of the resource retrieved after all
redirection has occurred.

As imports are processed, a processor may encounter new
p:import elements whose library URI is
the same as one it has already processed in some other context.
This may happen as a consequence of resolving the URI. If the
actual base URI is the same as one that has already been processed,
the implementation must recognize it as the same library and should
not need to process the resource. Also, a duplicate, circular chain
of imports, or a re-entrant import is not an error and
implementations must take the necessary steps to avoid infinite
loops and/or incorrect notification of duplicate step definitions.
An example of such steps is given in
Appendix F, Handling Circular and Re-entrant Library
Imports (Non-Normative).

A library is considered the same library if the URI of the
resource retrieved is the same. If a pipeline or library author
uses two different URI values that resolve to the same resource,
they must not be considered the same imported library.

If the href URI, after being made
absolute, begins “http://www.w3.org/YYYY/xproc-”
(where YYYY is a four digit year), then it
identifies the steps defined by a particular version of XProc. See
Section 2.11, “Versioning
Considerations”.

If the value is recognized by the processor, that is, if the
processor understands the specified version, then the processor
should not retrieve the library. It
simply serves to identify the set of XProc steps that can be used
without declaration.

If the value is not recognized by the processor, then it
must be retrieved and must point to a p:library. In such a library, and only in
such a library, additional steps in the XProc namespace may be
declared.

It is a
static
error (err:XS0050) if a pipeline attempts to import
two (or more) libraries with URIs that identify steps associated
with a particular version of XProc.

5.11 p:pipe

A p:pipe connects an input to a
port on another step.

<p:pipe
  step = NCName
  port = NCName />

The p:pipe element connects to a
readable port of another step. It identifies the readable port to
which it connects with the name of the step in the step attribute and the name of the port on
that step in the port attribute.

A p:pipe that is a binding for a p:output of a
compound
step may connect to one of the readable ports of the
compound step or to an output port on one of the compound step's
contained
steps. In other words, the output of a compound step can
simply be a copy of one of the available inputs or it can be the
output of one of its children.
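
As a sketch, a compound step whose output is the output of one of its children might look like this. The step names wrapper and cleanup and the step named origin that supplies the input are assumptions.

```xml
<p:group name="wrapper"
         xmlns:p="http://www.w3.org/ns/xproc"
         xmlns:html="http://www.w3.org/1999/xhtml">
  <!-- The group's output is whatever the contained step "cleanup" produces. -->
  <p:output port="result">
    <p:pipe step="cleanup" port="result"/>
  </p:output>

  <p:delete name="cleanup">
    <p:input port="source">
      <p:pipe step="origin" port="result"/>
    </p:input>
    <p:with-option name="match" select="'html:div'"/>
  </p:delete>
</p:group>
```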

5.12 p:inline

The content of the p:inline element
is wrapped in a document node and passed as input. The base URI of
the document is the base URI of the p:inline element. It is a static error (err:XS0024) if the
content of the p:inline element does
not consist of exactly one element, optionally preceded and/or
followed by any number of processing instructions, comments or
whitespace characters.

The in-scope namespaces of the inline document differ from the
in-scope namespaces of the content of the p:inline element in that bindings for all its
excluded namespaces, as defined below, are removed:

The XProc namespace itself (http://www.w3.org/ns/xproc) is excluded.

A namespace URI designated by using an exclude-inline-prefixes attribute on the
enclosing p:inline is excluded.

The value of the attribute is either #all, or a whitespace-separated list of tokens, each
of which is either a namespace prefix or #default. The namespace bound to each of the
prefixes is designated as an excluded namespace.

It is a
static
error (err:XS0057) if a namespace prefix is used
within the exclude-inline-prefixes
attribute and there is no namespace binding in scope for that
prefix.

The default namespace of the p:inline may be designated as an excluded
namespace by including #default in the
list of namespace prefixes.

It is a
static
error (err:XS0058) if the value #default is used within the exclude-inline-prefixes attribute and the
p:inline has no default namespace.

The value #all indicates that all
namespaces that are in scope for the p:inline are designated as excluded
namespaces.
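
The following sketch shows an inline document with one excluded prefix. The port name, the ex prefix, and its namespace URI are assumptions.

```xml
<p:input port="source"
         xmlns:p="http://www.w3.org/ns/xproc"
         xmlns:ex="http://example.com/ns/a">
  <!-- The binding for "p" is always excluded; the binding for "ex" is
       excluded because it is listed in exclude-inline-prefixes. -->
  <p:inline exclude-inline-prefixes="ex">
    <doc>Fallback content</doc>
  </p:inline>
</p:input>
```

Without the attribute, the doc element in the inline document would still carry the in-scope binding for the ex prefix.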

5.13 p:document

A p:document reads an XML document
from a URI.

<p:document
  href = anyURI />

The document identified by the URI in the href attribute is loaded and returned.

It is a
dynamic
error (err:XD0011) if the document referenced by a
p:document element does not exist,
cannot be accessed, or is not a well-formed XML document.

The parser which the p:document
element employs must be conformant to
Namespaces in XML. It must not fail on well-formed, standalone XML. The
external subset should not be
processed, but some parsers do not provide that option. It is implementation-defined
whether or not the external subset is processed. Loading the
document must not fail due to
validation errors. It must not perform
any other processing, such as expanding XIncludes.

Use the p:load step if you need to perform DTD-based
validation or wish to perform other processing on the document
before it is used by a step.

6 Errors

Errors in a pipeline can be divided into two classes: static
errors and dynamic errors.

6.1 Static
Errors

[Definition: A
static error is one which can be
detected before pipeline evaluation is even attempted.]
Examples of static errors include cycles, incorrect specification
of inputs and outputs, and reference to unknown steps.

Static errors are fatal and must be detected before any steps
are evaluated.

Static Errors

err:XS0001

It is a static error if there are any loops in the connections
between steps: no step can be connected to itself nor can there be
any sequence of connections through other steps that leads back to
itself.

It is a static error if the content of the p:inline element does
not consist of exactly one element, optionally preceded and/or
followed by any number of processing instructions, comments or
whitespace characters.

It is a static error if the port specified on the
p:serialization is not the name of an output port on the pipeline
in which it appears or if more than one p:serialization element is
applied to the same port.

It is a static error if the except-prefixes attribute on
p:namespaces does not contain a list of tokens or if any of those
tokens is not a prefix bound to a namespace in the in-scope
namespaces of the p:namespaces element.

6.2 Dynamic Errors

[Definition: A
dynamic error is one which occurs while
a pipeline is being evaluated.] Examples of dynamic errors
include references to URIs that cannot be resolved, steps which
fail, and pipelines that exhaust the capacity of an implementation
(such as memory or disk space).

If a step fails due to a dynamic error, failure propagates
upwards until either a p:try is encountered or the entire pipeline
fails. In other words, outside of a p:try, step failure causes the entire
pipeline to fail.
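
A sketch of catching a step failure, assuming the p:try structure of a p:group followed by a p:catch, steps named p:xslt and p:identity, and a hypothetical stylesheet URI. It also assumes that the unbound primary outputs of the two contained steps serve as the outputs of the p:group and the p:catch.

```xml
<p:try xmlns:p="http://www.w3.org/ns/xproc">
  <p:group>
    <!-- If this step fails with a dynamic error, the failure
         propagates up to the enclosing p:try. -->
    <p:xslt>
      <p:input port="stylesheet">
        <p:document href="http://example.com/stylesheets/doc.xsl"/>
      </p:input>
    </p:xslt>
  </p:group>
  <p:catch>
    <!-- Recovery: emit a placeholder document instead of failing the pipeline. -->
    <p:identity>
      <p:input port="source">
        <p:inline><failed/></p:inline>
      </p:input>
    </p:identity>
  </p:catch>
</p:try>
```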

Dynamic Errors

err:XD0001

It is a dynamic error if a non-XML resource is produced on a
step output or arrives on a step input.

It is a dynamic error if the value supplied for any option
specified for any step in this section is not of the type mandated
in the step description, with phrases such as "The value of the
xxx-name option must be a QName" or "the value of the yyy-flag
option must be a boolean".

It is a dynamic error if the content of the c:body element does
not consist of exactly one element, optionally preceded and/or
followed by any number of processing instructions, comments or
whitespace characters.

7 Standard Step Library

This appendix describes the standard XProc steps. A
machine-readable description of these steps may be found in
pipeline-library.xml.

Some steps in this appendix consume or produce an XML vocabulary
defined in this section. In all cases, the namespace for that
vocabulary is http://www.w3.org/ns/xproc-step
and is represented by the prefix 'c:' in this appendix.

When a step in this library produces an output document, the
base URI of the output is the base URI of the step's primary input
document unless the step's process explicitly sets an xml:base attribute or the step's description
explicitly states how the base URI is constructed.

Also, in this section, several steps use this
element for result information:

It is a dynamic error (err:XC0016) if the value supplied for any
option specified for any step in this section is not of the type
mandated in the step description, with phrases such as "The value
of the xxx-name option must be a QName" or "the value of the
yyy-flag option must be a boolean".

7.1 Required Steps

This section describes standard steps that must be supported by
any conforming processor.

7.1.1 p:add-attribute

The p:add-attribute step adds a single
attribute to a set of matching elements. The input document
specified on the source port is processed for
matches specified by the match pattern in the match option. For each of these matches, the
attribute whose name is specified by the attribute-name option is set to the attribute value
specified by the attribute-value
option.

The resulting document is produced on the result output port and consists of an exact copy of the
input with the exception of the matched elements. Each of the matched
elements is copied to the output with the addition or change of the
specified attribute.

The value of the match option
must be an XSLTMatchPattern. It is a dynamic
error (err:XC0013) if the match pattern does not
match an element.

The value of the attribute-name option
must be a QName. The corresponding
expanded name is used to construct the added attribute.

The value of the attribute-value option
must be a legal attribute value
according to XML.

If an attribute with the same name as the expanded name from the
attribute-name option exists on the matched
element, the value specified in the attribute-value option is used to set the value of
that existing attribute. That is, the value of the existing
attribute is changed to the attribute-value
value.

Note

If multiple attributes need to be set, the p:set-attributes step should be used.

This step cannot be used to add namespace declarations. It is a dynamic
error (err:XC0059) if the QName value in the
attribute-name option uses the prefix
'xmlns' or any other prefix that resolves to the same namespace
name as 'xmlns'.
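
The following non-normative Python sketch illustrates the behaviour described above. It assumes the standard library's xml.etree.ElementTree, uses that library's limited path syntax in place of a full XSLTMatchPattern, and modifies the tree in place rather than producing a copy. The function name add_attribute and the sample document are illustrative only, and the xmlns check is a simplification of err:XC0059 (it does not resolve other prefixes).

```python
import xml.etree.ElementTree as ET

def add_attribute(root, match, attribute_name, attribute_value):
    """Set attribute_name to attribute_value on every element selected by match.

    `match` is an ElementTree path (an assumption; XProc uses an XSLTMatchPattern).
    """
    if attribute_name == "xmlns" or attribute_name.startswith("xmlns:"):
        # Simplified stand-in for err:XC0059: no namespace declarations.
        raise ValueError("err:XC0059: attribute-name must not use the xmlns prefix")
    for element in root.iterfind(match):
        # set() adds the attribute, or overwrites an existing one of the same name.
        element.set(attribute_name, attribute_value)
    return root

doc = ET.fromstring('<doc><p/><p class="old"/><q/></doc>')
add_attribute(doc, ".//p", "class", "note")
```

Both p elements end up with class="note": the attribute is added to the first and changed on the second, while the unmatched q element is left alone.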

7.1.2 p:add-xml-base

The p:add-xml-base step exposes the base
URI via explicit xml:base attributes. The
input document from the source port is
replicated to the result port with xml:base attributes added to or corrected on each
element as specified by the options on this step.

For the document element: force the element to have an
xml:base attribute with the
document's [base URI] property's value as its value.

For other elements:

If the all option has the value
“true”, force the element to have an
xml:base attribute with the
element's [base URI] value as its value.

If the element's [base URI] is different from its parent's
[base URI], force the element to have an xml:base attribute with the following
value: if the value of the relative option
is “true”, a string which, when resolved
against the parent's [base URI], will give the element's [base
URI], otherwise the element's [base URI].

This step takes single documents on each of two ports and
compares them using the fn:deep-equal (as
defined in [XPath
2.0 Functions and Operators]). It is a dynamic
error (err:XC0019) if the documents are not equal,
and the value of the fail-if-not-equal
option is “true”. If the documents are
equal, or if the value of the fail-if-not-equal option is “false”, a c:result document is produced with contents
“true” if the documents are equal,
otherwise “false”.

7.1.4 p:count

The p:count step counts the number of
documents in the source input sequence and
returns a single document on result
containing that number. The generated document contains a single
c:result
element whose content is the string representation of the number
of documents in the sequence.
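
As a non-normative illustration, the result document can be sketched in Python with xml.etree.ElementTree. The function name count is an assumption made for this example; the namespace is the one given for the 'c:' prefix at the start of this section.

```python
import xml.etree.ElementTree as ET

C_NS = "http://www.w3.org/ns/xproc-step"

def count(documents):
    """Return a c:result element holding the number of documents in the sequence."""
    result = ET.Element(f"{{{C_NS}}}result")
    result.text = str(len(list(documents)))
    return result

result = count([ET.fromstring("<a/>"), ET.fromstring("<b/>"), ET.fromstring("<c/>")])
```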

The value of the path option
must be an anyURI. It is interpreted
as an IRI reference. If relative, it is resolved to absolute form.
The base URI used for resolution is the base URI of the p:option
element, if present; otherwise (that is, when the default of '.' is
used), it is the base URI of the p:directory-list element. It is
a dynamic error (err:XC0017) if the absolute path does not
identify a directory. It is a dynamic error (err:XC0012) if the
contents of the directory path are not available to the step due to
access restrictions in the environment in which the pipeline is
run.

Conformant processors
must support directory paths whose
scheme is file. It is implementation-defined what
other schemes are supported by p:directory-list, and what the interpretation of
'directory', 'file' and 'contents' is for those schemes. It is a dynamic
error (err:XC0018) if the directory path's scheme is
not supported.

If present, the value of the include-filter or exclude-filter option must be a regular expression as specified in
[XPath 2.0
Functions and Operators], section 7.6.1 “Regular Expression Syntax”.

If the include-filter pattern matches a
directory entry's name, the entry is included in the output. If the
exclude-filter pattern matches a directory
entry's name, the entry is excluded from the output. If both options
are provided, the include filter is processed first, then the
exclude filter.
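
The order of filtering can be sketched as follows (non-normative). This sketch makes two assumptions that the text above does not settle: it uses Python's re syntax rather than the XPath 2.0 regular expression syntax, and it requires a pattern to match the entire entry name. The function name filter_entries is illustrative.

```python
import re

def filter_entries(names, include_filter=None, exclude_filter=None):
    """Apply the include filter first, then the exclude filter."""
    kept = list(names)
    if include_filter is not None:
        kept = [name for name in kept if re.fullmatch(include_filter, name)]
    if exclude_filter is not None:
        kept = [name for name in kept if not re.fullmatch(exclude_filter, name)]
    return kept

entries = ["a.xml", "b.xml", "notes.txt", "b.bak.xml"]
selected = filter_entries(entries,
                          include_filter=r".*\.xml",
                          exclude_filter=r".*\.bak\.xml")
```

Here notes.txt is dropped by the include filter, and b.bak.xml passes the include filter but is then removed by the exclude filter.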

The result document produced for the specified
directory path has a c:directory document element whose base URI
is the directory path and whose name
attribute is the last segment of the directory path (that is, the
directory's (local) name).

Its contents are determined as follows, based on the entries in
the directory identified by the directory path. For each entry in
the directory, if either no filter was
specified, or the (local) name of the entry matches the filter
pattern, a c:file, a c:directory, or a c:other element is
produced, as follows:

A c:directory is produced for each
subdirectory not determined to be special.

Any file or
directory determined to be special by the p:directory-list step may be output using a
c:other
element but the criteria for marking a file as special are
implementation-defined.

<c:other name = string />

When a directory entry is a subdirectory, that directory's
entries are not output as part of that entry's c:directory. A
user must apply this step again to the subdirectory to list
subdirectory contents.

Each of the elements c:file, c:directory, and c:other has a
name attribute when it appears within the
top-level c:directory element, whose value is a
relative IRI reference, giving the (local) file or directory
name.

This step uses the document provided on its input as the content
of the error raised. Since it generates an error upon invocation,
there can be no normal output; instead, an instance of the c:errors element will
be produced on the error output port, as is always the case for
dynamic
errors. The error generated can be caught by a p:try just like any other
dynamic error.

The href, line and column, or offset, attributes might also be present on the c:error to identify
the location of the p:error element in
the pipeline.

7.1.8 p:escape-markup

The p:escape-markup step applies XML
serialization to the children of the document element and replaces
those children with their serialization. The outcome is a single
element with text content that represents the "escaped" syntax of
the children as they were serialized.

Since the result of this step is represented by Unicode
characters in the document produced by this step and not as encoded
characters serialized into a byte stream, the option normalization-form does not apply to this step and
has been omitted from the standard serialization options.

7.1.9 p:http-request

The p:http-request step provides for
interaction with resources identified by IRIs over HTTP or closely
related protocols. The input document provided on the source port specifies a request by a single c:request element.
This element specifies the method, resource, and other request
properties as well as possibly including an entity body (content)
for the request.

It is a
dynamic
error (err:XC0004) if the status-only attribute has the value “true” and the detailed
attribute does not have the value “true”.

The method attribute specifies the method
to be used against the IRI specified by the href attribute, e.g. GET or
POST. If the href
attribute is not absolute, it will be resolved against the base URI
of its owner element.

If the username attribute is specified,
the username, password,
auth-method, and send-authorization attributes are used to handle
authentication as per [RFC
2617]. If the initial response to the request is an
authentication challenge, the username and
password attribute values are used to
generate an Authorization header and the
request is sent again. If that authorization fails, the request is
not retried.

For the purposes of avoiding an authentication challenge, if the
send-authorization attribute has a value of
“true” and the authentication method
specified by the auth-method supports
generation of an Authorization header without
a challenge, then an Authorization header is
generated and sent on the first request. If the response contains
an authentication challenge, the request is retried with an
appropriate Authorization header.
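
For the "Basic" method, an Authorization header can be generated without a challenge because it depends only on the username and password. The non-normative sketch below shows one way to build it. The function name basic_authorization is illustrative, and encoding the credentials as UTF-8 is an assumption of this sketch. "Digest" cannot be handled this way, since it needs values supplied by the server's challenge.

```python
import base64

def basic_authorization(username, password):
    """Build the value of an Authorization header for the Basic scheme."""
    # Assumption: credentials are encoded as UTF-8 before base64 encoding.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")

header = basic_authorization("Aladdin", "open sesame")
```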

Appropriate values for the auth-method
attribute are "Basic" or "Digest" but other values are allowed.
The interpretation of auth-method values on c:request other than “Basic” or “Digest” is
implementation-defined.
It is a
dynamic
error (err:XC0003) if the requested auth-method isn't supported or the authentication
challenge contains an authentication method that isn't supported.
All implementations are required to support "Basic" and "Digest"
authentication per [RFC
2617].

The c:header element specifies a header
name and value, either for inclusion in a request, or as received
in a response.

<c:header name = string value = string />

The request is formulated from the attribute values on the
c:request element and its c:header and c:multipart or
c:body
children, if present, and transmitted to the host (and port, if
present) specified by the href attribute. The
details of how the request entity body, if any, is constructed are
given in the next section.

When the request is formulated, the step and/or protocol
implementation may add headers as necessary to either complete the
request or as appropriate for the content specified (e.g. transfer
encodings). A user of this step is guaranteed that their requested
headers and content will be sent with the exception of any
conflicts with protocol-related headers. It is a dynamic error (err:XC0020) if the
value of a header specified via c:header (e.g. Content-Type) conflicts with the value for that
header that the step and/or protocol implementation must set.

7.1.9.2 Request Entity body
conversion

The c:multipart element specifies a
multi-part body, per [RFC
1521], either for inclusion in a request or as received
in a response.

In the context of a request, the media type of the c:multipart must be a multipart media type (i.e.
have a main type of 'multipart'). If the content-type attribute is not specified, a value of
"multipart/mixed" will be assumed.

The boundary attribute is required and is
used to provide a multipart boundary marker. The implementation
must use this boundary marker and must prefix the value with the
string -- when formulating the multipart
message. It is
a dynamic
error (err:XC0002) if the value starts with the
string --.
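
The use of the boundary marker can be sketched as follows (non-normative). The function name format_multipart and the representation of parts as (headers, body) pairs are assumptions of this example; the closing delimiter, with "--" appended as well as prefixed, follows the MIME multipart syntax.

```python
def format_multipart(boundary, parts):
    """Join (headers, body) parts into a multipart entity body.

    headers is a list of (name, value) pairs; body is a str.
    """
    if boundary.startswith("--"):
        raise ValueError("err:XC0002: boundary must not start with '--'")
    lines = []
    for headers, body in parts:
        lines.append("--" + boundary)      # delimiter: "--" prefix plus the marker
        lines.extend(f"{name}: {value}" for name, value in headers)
        lines.append("")                   # blank line separates headers from body
        lines.append(body)
    lines.append("--" + boundary + "--")   # closing delimiter
    return "\r\n".join(lines) + "\r\n"

message = format_multipart("frontier", [
    ([("Content-Type", "text/plain")], "first part"),
    ([("Content-Type", "application/xml")], "<doc/>"),
])
```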

In a multipart message, the first set of c:header elements
that are the children of c:request are the headers to the multipart
message. The headers inside the c:multipart element are associated with a
particular message part. Each multipart body is represented by a
c:body preceded
by some number of c:header elements. These preceding headers are
associated with the body part in the multipart message.

The c:body
element holds the body or body part of the message. Each of the
attributes controls some aspect of encoding the request
body or decoding the body element's content when the request is
formulated. These are specified as follows:

The content-type attribute specifies the
media type of the body or body part, that is, the value of its
Content-Type header. If the media type is not
an XML type nor is it text, the content must already be
base64-encoded.

The encoding attribute controls the
decoding of the element content for formulating the body. A value
of base64 indicates the element's content
is a base64 encoded string whose byte stream should be sent as the
message body. An implementation may support encodings other
than base64 but these encodings and their
names are implementation-defined. It is a dynamic error (err:XC0052) if the
encoding specified is not supported by the implementation.

The id attribute specifies the value of
the Content-ID header for the body or body
part.

The description attribute specifies the
value of the Content-Description header for
the body or body part.

If an entity body is to be sent as part of a request (e.g. a
POST), either a c:body element, specifying the request
entity body, or a c:multipart element, specifying multiple
entity body parts, may be used. When c:multipart is
used it may contain multiple c:body children. A c:body specifies the
construction of a body or body part as follows:

If the content-type attribute does not
specify an XML media type, or the encoding
attribute is 'base64', then it is a dynamic error (err:XC0028) if the
content of the c:body element does not consist entirely of
characters, and the entity body or body part will consist of
exactly those characters.

Otherwise (the content-type attribute
does specify an XML media type and the encoding attribute is not 'base64'), it is a dynamic
error (err:XC0022) if the content of the c:body element does not
consist of exactly one element, optionally preceded and/or followed
by any number of processing instructions, comments or whitespace
characters, and the entity body or body part will consist of the
serialization of a document node containing that content. The
serialization of that document is controlled by the serialization
options on the p:http-request step
itself.
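
The two cases above can be sketched as follows (non-normative). The sketch uses an un-namespaced body element as a stand-in for c:body, and it makes several assumptions: XML media types are taken to be application/xml, text/xml and any type ending in "+xml"; character content is sent as UTF-8; and serialization options are ignored. The names is_xml_media_type and entity_body are illustrative.

```python
import base64
import xml.etree.ElementTree as ET

def is_xml_media_type(content_type):
    # Assumption: application/xml, text/xml and any "+xml" type count as XML.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in ("application/xml", "text/xml") or media_type.endswith("+xml")

def entity_body(body):
    """Return the bytes to send for one body element (an ElementTree element)."""
    content_type = body.get("content-type", "")
    encoding = body.get("encoding")
    if encoding == "base64" or not is_xml_media_type(content_type):
        if len(body) != 0:
            raise ValueError("err:XC0028: content must consist entirely of characters")
        text = body.text or ""
        # base64 content is decoded to the byte stream it represents.
        return base64.b64decode(text) if encoding == "base64" else text.encode("utf-8")
    children = list(body)
    stray_text = (body.text or "").strip() or (children and (children[0].tail or "").strip())
    if len(children) != 1 or stray_text:
        raise ValueError("err:XC0022: content must be exactly one element")
    # Serialize without an XML declaration, then encode as UTF-8.
    return ET.tostring(children[0], encoding="unicode").encode("utf-8")

xml_body = ET.fromstring('<body content-type="application/xml"><doc>hi</doc></body>')
payload = entity_body(xml_body)
```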

For example, the following input to a p:http-request step will POST a small XML document:

7.1.9.3 Managing the response

The handling of the response to the request and the generation
of the step's result document is controlled by the status-only, override-content-type and detailed attributes on the c:request input.

The override-content-type attribute
controls interpretation of the response's Content-Type header. If this attribute is present, the
response will be treated as if it returned the Content-Type given by its value. The original
Content-Type header will however be reflected
unchanged as a c:header in the result document. It is a dynamic
error (err:XC0030) if the override-content-type value cannot be used (e.g.
text/plain to override image/png).

If the status-only attribute has the value
“true”, the result document will contain
only header information. The entity of the response will not be
processed to produce a c:body or c:multipart element.

The c:response element represents an HTTP
response. The response's status code is encoded in the status attribute and the headers and entity body are
processed into c:header and c:multipart or
c:body
content.

Unless the status-only attribute has a
value “true”, the entity body of the
response is converted into a c:body or c:multipart element via the rules given in
the next section.

Otherwise (the detailed attribute is not
specified or its value is “false”), the
response to the request is handled as follows:

If the media type (as determined by the override-content-type attribute or the Content-Type response header) is an XML media type, the
entity is decoded if necessary, then parsed as an XML document and
produced on the result output port as the
entire output of the step.

Otherwise, the entity body of the response is converted into a
c:body or
c:multipart element via the rules given in
the next section.

In either case the base URI of the output document is the
resolved value of the href attribute from the
input c:request.

7.1.9.4 Converting Response Entity
Bodies

The entity of a response may be multipart per [RFC 1521]. In those
situations, the result document will be a c:multipart
element that contains multiple c:body elements inside.

If the media type of the response is a text type with a
charset parameter that is a Unicode character
encoding, the content of the constructed c:body element is the
translation of the text into a Unicode character sequence.

If the media type of the response is an XML media type, the
content of the constructed c:body element is the result of decoding the
body if necessary, then parsing it with an XML parser. If the
content is not well-formed, the step fails.

For all other media types, the response is encoded as base64
(unless it is encoded already) and then produced as text children
of the c:body
element.

In the case of a multipart response, the same rules apply when
constructing a c:body element for each body part
encountered.

Note

Given the above description, any content identified as
text/html will be base64-encoded in the
c:body element,
as HTML isn't always well-formed XML. A user can attempt to convert
such content into XML using the p:unescape-markup step.

The value of the match option
must be an XSLTMatchPattern. It is a dynamic
error (err:XC0023) if that pattern matches anything
other than element nodes. Multiple matches are allowed, in which
case multiple copies of the insertion
documents will occur. If no elements match, then the document is
unchanged.

The value of the position option
must be an NMTOKEN in the following
list:

“first-child” - the insertion is made
as the first child of the match;

“last-child” - the insertion is made as
the last child of the match;

“before” - the insertion is made as the
immediate preceding sibling of the match;

“after” - the insertion is made as the
immediate following sibling of the match.

It is a dynamic error (err:XC0025) if the match pattern matches the
document element and the value of the position option is “before”
or “after”.

As the inserted elements are part of the output of the step they
are not considered in determining matching elements. If an empty
sequence appears on the insertion port, the
result will be the same as the source.
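
The four positions can be sketched as follows (non-normative). The sketch assumes xml.etree.ElementTree, matches elements by tag name instead of by an XSLTMatchPattern, takes a single insertion element, and modifies the tree in place. The function name insert is illustrative.

```python
import copy
import xml.etree.ElementTree as ET

def insert(root, tag, insertion, position):
    """Insert a copy of `insertion` relative to every element named `tag`."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Snapshot the matches first, so inserted elements are never matched themselves.
    matches = [element for element in root.iter() if element.tag == tag]
    for match in matches:
        new = copy.deepcopy(insertion)
        new.tail = None
        if position == "first-child":
            # Text that used to lead the match now follows the inserted element.
            new.tail, match.text = match.text, None
            match.insert(0, new)
        elif position == "last-child":
            match.append(new)
        elif position in ("before", "after"):
            parent = parent_of.get(match)
            if parent is None:
                raise ValueError("err:XC0025: cannot insert before or after the document element")
            index = list(parent).index(match)
            if position == "after":
                # Text that used to follow the match now follows the inserted element.
                new.tail, match.tail = match.tail, None
                index += 1
            parent.insert(index, new)
        else:
            raise ValueError(f"unknown position: {position}")
    return root

doc = ET.fromstring("<doc><a/><b/></doc>")
insert(doc, "a", ET.fromstring("<x/>"), "after")
```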

7.1.12 p:label-elements

The p:label-elements step generates a
label for each matched element and stores that label in the
specified attribute.

The value of the label option is an
XPath expression used to generate the value of the attribute
label.

The value of the match option
must be an XSLTMatchPattern. It is a dynamic
error (err:XC0024) if that expression matches
anything other than element nodes.

The value of the replace option must be a boolean value and is used to indicate
whether existing attribute values are replaced.

This step operates by generating attribute labels for each
element matched. For every matched element, the expression is
evaluated with the context node set to the matched element. An
attribute is added to the matched element using the attribute name
specified by the attribute option and the
string value of the result of evaluating the expression. If the
attribute already exists on the matched element, the value is
replaced with the string value of the expression evaluation result
only if the replace option has the value of
“true”.

An implementation must bind the variable “p:index” in the static context of each evaluation of
the XPath expression to the position of the element in the sequence
of matched elements. In other words, the first element (in document
order) matched gets the value “1”, the
second gets the value “2”, the third,
“3”, etc.

The result of the p:label-elements step is the input document
with the attribute labels associated with matched elements. All
other non-matching content remains the same.
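
A non-normative sketch of the labelling loop follows. It assumes xml.etree.ElementTree with its limited path syntax in place of an XSLTMatchPattern, and it replaces the XPath label expression with a Python callable that receives the matched element and the value that would be bound to p:index. The function name label_elements is illustrative.

```python
import xml.etree.ElementTree as ET

def label_elements(root, match, attribute, label, replace=True):
    """Label matched elements; `label` is a callable (element, index) -> value."""
    # iterfind yields matches in document order, so index runs 1, 2, 3, ...
    for index, element in enumerate(root.iterfind(match), start=1):
        if replace or attribute not in element.attrib:
            element.set(attribute, str(label(element, index)))
    return root

doc = ET.fromstring('<doc><p/><p id="keep"/><p/></doc>')
label_elements(doc, ".//p", "id", lambda element, index: f"p{index}", replace=False)
```

With replace set to false, the second element keeps its existing id while still counting towards the index, so the labels are p1, keep and p3.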

7.1.13 p:load

The p:load step has no inputs but produces
as its result an XML resource specified by an IRI.

The value of the href option
must be an anyURI. It is interpreted
as an IRI reference.

The value of the dtd-validate option
must be a boolean.

Load attempts to read an XML document from the specified IRI
reference, which may be relative, in which case it will be resolved
relative to the base URI of its p:option
element. It is
a dynamic
error (err:XC0026) if the document does not exist or
is not well-formed. It is a dynamic error (err:XC0011) if the
step is not allowed to retrieve from the specified location.
Otherwise, the retrieved document is produced on the result port. The base URI of the result is the
(absolute) IRI used to retrieve it.

If the value of the dtd-validate option
is “true”, DTD validation is performed on
the retrieved document. It is a dynamic error (err:XC0027) if the
document is not valid or the step doesn't support DTD
validation.

The value of the match option
must be an XSLTMatchPattern. It is a dynamic
error (err:XC0008) if the pattern matches anything
other than element or attribute nodes.

The value of the base-uri option
must be an anyURI. It is interpreted
as an IRI reference.

For every element or attribute in the input document which
matches the specified pattern, its XPath string-value is resolved
against the specified base URI and the resulting absolute IRI is
used as the matched node's entire contents in the output.

The base URI used for resolution defaults to the base URI of the
matched element, or of the element that carries the matched
attribute, unless the
base-uri option is specified. When the
base-uri option is specified, the option
value is used as the base URI regardless of any contextual base URI
value in the document. This option value is resolved against the
base URI of the p:option element used to set the option.

If the IRI reference
specified by the base-uri option on
p:make-absolute-uris is not valid, or
if it is absent and the input document has no base URI, the results
are implementation-dependent.

7.1.15 p:namespace-rename

The p:namespace-rename step renames any
namespace declaration or use of a namespace in a document to a new
IRI value.

The value of the from option
must be an anyURI. It should be either empty or absolute, but will not
be resolved in any case.

The value of the to option must be an anyURI. It should be empty or absolute, but will not be
resolved in any case.

It is a
dynamic
error (err:XC0014) if the XML namespace (http://www.w3.org/XML/1998/namespace) or the XMLNS
namespace (http://www.w3.org/2000/xmlns/) is
the value of either the from option or the
to option.

If the value of the from option is the
same as the value of the to option, the
input is reproduced unchanged on the output. Otherwise, namespace
bindings, namespace attributes and element and attribute names are
changed as follows:

Namespace bindings: If the from option
is present and its value is not the empty string, then every
binding of a prefix (or the default namespace) in the input
document whose value is the same as the value of the from option is

replaced in the output with a binding to the value of the
to option, provided it is present and not
the empty string;

otherwise (the to option is not
specified or has an empty string as its value) absent from the
output.

If the from option is absent, or its
value is the empty string, then no bindings are changed or
removed.

Elements and attributes: If the from
option is present and its value is not the empty string, for every
element (and attribute, unless the value of the elements-only option is “true”) in the input whose namespace name is the same
as the value of the from option, in the
output its namespace name is

replaced with the value of the to
option, provided it is present and not the empty string;

otherwise (the to option is not
specified or has an empty string as its value) changed to have no
value.

If the from option is absent, or its
value is the empty string, then for every element (and attribute,
unless the value of the elements-only
option is “true”) whose namespace name has
no value, in the output its namespace name is set to the value of
the to option.

Namespace attributes: If the from option
is present and its value is not the empty string, for every
namespace attribute in the input whose value is the same as the
value of the from option, in the output

the namespace attribute's value is replaced with the value of
the to option, provided it is present and
not the empty string;

otherwise (the to option is not
specified or has an empty string as its value) the namespace
attribute is absent.

Note

The elements-only option is primarily
intended to make it possible to avoid renaming attributes when the
from option specifies no namespace, since
many attributes are in no namespace.

Care should be taken when specifying no namespace with the
to option. Prefixed names in content, for
example QNames and XPath expressions, may end up with no
appropriate namespace binding.
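
The renaming of element and attribute names can be sketched as follows (non-normative). The sketch assumes xml.etree.ElementTree, which writes expanded names as "{namespace}local" and does not expose namespace bindings or namespace attributes as nodes, so only the rules for element and attribute names are illustrated. An absent option is represented by the empty string, and the names rename_name and namespace_rename are illustrative.

```python
import xml.etree.ElementTree as ET

RESERVED = ("http://www.w3.org/XML/1998/namespace", "http://www.w3.org/2000/xmlns/")

def rename_name(name, from_ns, to_ns):
    """Rename one expanded name written in ElementTree's "{namespace}local" form."""
    if name.startswith("{"):
        namespace, local = name[1:].split("}", 1)
    else:
        namespace, local = "", name          # the name is in no namespace
    if namespace != from_ns:
        return name
    return f"{{{to_ns}}}{local}" if to_ns else local

def namespace_rename(root, from_ns="", to_ns="", elements_only=False):
    if from_ns in RESERVED or to_ns in RESERVED:
        raise ValueError("err:XC0014: the XML and XMLNS namespaces cannot be renamed")
    if from_ns == to_ns:
        return root                          # input is reproduced unchanged
    for element in root.iter():
        if not isinstance(element.tag, str):
            continue                         # skip comments and processing instructions
        element.tag = rename_name(element.tag, from_ns, to_ns)
        if not elements_only:
            renamed = {rename_name(name, from_ns, to_ns): value
                       for name, value in element.attrib.items()}
            element.attrib.clear()
            element.attrib.update(renamed)
    return root

doc = ET.fromstring('<a xmlns="urn:old" xmlns:o="urn:old" o:x="1" y="2"><b/></a>')
namespace_rename(doc, "urn:old", "urn:new")
```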

The step takes each pair of documents, in order, one from the
source port and one from the alternate port, wraps them with a new element node
whose QName is the value specified in the wrapper option, and writes that element to the
result port as a document.

If the step reaches the end of one input sequence before the
other, then it simply wraps each of the remaining documents in the
longer sequence.
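
The pairing of the two sequences can be sketched as follows (non-normative). The sketch assumes xml.etree.ElementTree elements as stand-ins for documents, and the function name pack is illustrative.

```python
import copy
import itertools
import xml.etree.ElementTree as ET

def pack(source, alternate, wrapper):
    """Wrap documents pairwise; leftovers from the longer sequence are wrapped alone."""
    results = []
    for first, second in itertools.zip_longest(source, alternate):
        element = ET.Element(wrapper)
        for document in (first, second):
            if document is not None:     # one sequence may already be exhausted
                element.append(copy.deepcopy(document))
        results.append(element)
    return results

source = [ET.fromstring("<s1/>"), ET.fromstring("<s2/>"), ET.fromstring("<s3/>")]
alternate = [ET.fromstring("<a1/>")]
packed = pack(source, alternate, "pair")
```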

Note

In the common case, where the document element of a document in
the result sequence has two element children,
any comments, processing instructions, or white space text nodes
that occur between them may have come from either of the input
documents; this step does not attempt to distinguish which one.

7.1.17 p:parameters

The p:parameters step exposes a set of
parameters as a c:param-set document.

The value of the match option
must be an XSLTMatchPattern. It is a dynamic
error (err:XC0023) if that pattern matches anything
other than element nodes. Multiple matches are allowed, in which
case multiple copies of the replacement
document will occur.

Every element in the primary input matching the specified
pattern is replaced in the output by the document
element of the replacement document. Only
non-nested matches are replaced. That is, once an element is
replaced, its descendants cannot be matched.

The value of the match option
must be an XSLTMatchPattern. It is a dynamic
error (err:XC0023) if that pattern matches anything
other than element nodes.

Each attribute on the document element of the document that
appears on the attributes port is copied to
each element that matches the match
expression.

If an attribute with the same name as one of the attributes to
be copied already exists, the value specified on the attributes port's document is used. The result port of
this step produces a copy of the source
port's document with the matching elements' attributes
modified.

The matching elements are specified by the match pattern in the
match option. All matching elements are
processed. If no elements match, the step will not change any
elements.

This step must not copy namespace declarations. If the
attributes copied from the attributes port use
namespaces, prefixes, or prefixes bound to different namespaces,
the document produced on the result output
port will require namespace fixup.

7.1.21 p:sink

The p:sink step accepts a sequence
of documents and discards them. It has no output.

The XPath expression in the test option
is applied to each document in the input sequence. If the effective
boolean value of the expression is true, the document is copied to
the matched port; otherwise it is copied to
the not-matched port.

The XPath context for the
test option changes over time. For each
document that appears on the source port, the
expression is evaluated with that document as the context document.
The context position is the position of that document within the
sequence and the context size is the total number of documents in
the sequence.

Note

In principle, this component cannot stream because it must
buffer all of the input sequence in order to find the context size.
In practice, if the test expression does not use the last() function, the implementation can stream and
ignore the context size.
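
The evaluation context can be sketched as follows (non-normative). The XPath test expression is replaced by a Python callable that receives the document, the context position and the context size; the function name split_sequence is illustrative.

```python
def split_sequence(documents, test):
    """Split documents into (matched, not_matched) using test(doc, position, size)."""
    documents = list(documents)      # buffering is what makes the context size known
    size = len(documents)
    matched, not_matched = [], []
    for position, document in enumerate(documents, start=1):
        target = matched if test(document, position, size) else not_matched
        target.append(document)
    return matched, not_matched

# Roughly "position() < last()": every document except the final one matches.
matched, not_matched = split_sequence(
    ["a", "b", "c", "d"],
    lambda document, position, size: position < size,
)
```

A test that ignores the size argument does not need the buffering step, which corresponds to the streaming case mentioned in the note.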

7.1.23 p:store

The p:store step stores a
serialized version of its input to a URI. The URI is either
specified explicitly by the 'href' option or implicitly by the base
URI of the document. This step outputs a reference to the location
of the stored document.

The matched nodes are specified with the match pattern in the
match option. For each matching node, the
XPath expression provided by the replace
option is evaluated and the string value of the result is used in
the output. Nodes that do not match are copied without change.

If the expression given in the match
option matches an attribute, the string value of the
replace expression is used as the new value
of the attribute in the output.

If the expression matches any other kind of node, the entire
node (and not just its contents) is replaced by the string
value of the replace expression.

7.1.25 p:unescape-markup

The p:unescape-markup step takes
the string value of the document element and parses the content as
if it were a Unicode character stream containing serialized XML. The
output consists of the same document element with children that
result from the parse. This is the reverse of the p:escape-markup
step.

The value of the namespace option
must be an anyURI. It should be absolute, but will not be resolved.

When the string value is parsed, the original document element
is preserved so that the result will be well-formed XML even if the
content consists of multiple, sibling elements.

The namespace option specifies the
default namespace. If it is provided, it will be declared as the
default namespace on the document element.

The content-type option may be used to specify an alternate content type
for the string value. An implementation may use a different parser to produce XML content
depending on the specified content-type. For example, an
implementation might provide an HTML to XHTML parser (e.g.
[HTML Tidy] or
[TagSoup]) for
the content type 'text/html'.

All implementations must support
the content type application/xml, and must
use a standard XML parser for it. It is a dynamic error (err:XC0051) if the
content-type specified is not supported by the implementation.

The encoding option specifies how the
data is encoded. All implementations must support the base64
encoding (and the absence of an encoding option, which implies that
the content is plain Unicode text). It is a dynamic error (err:XC0052) if the
encoding specified is not supported by the implementation.

If an encoding is specified, a
charset may also be specified. The
octet-stream that results from decoding the text must be interpreted using the specified charset
to produce a sequence of Unicode characters to parse. If the charset option
is not specified, the value “UTF-8” must be used.

It is a
dynamic
error (err:XC0010) if the charset specified is not supported by the
implementation or if charset is specified
when encoding is not.

For example, with the 'namespace' option set to the XHTML
namespace, the following input:

<description>
&lt;p>This is a chunk.&lt;/p>
&lt;p>This is another chunk.&lt;/p>
</description>

would produce:

<description xmlns="http://www.w3.org/1999/xhtml">
<p>This is a chunk.</p>
<p>This is another chunk.</p>
</description>
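
The processing described for this step can be sketched as follows (non-normative). The sketch assumes xml.etree.ElementTree, supports only the application/xml content type and the base64 encoding, and assumes the input document element is in no namespace. The function name unescape_markup is illustrative.

```python
import base64
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

def unescape_markup(doc, namespace=None, encoding=None, charset=None):
    """Parse the string value of `doc` and return it with the parsed children."""
    if charset is not None and encoding is None:
        raise ValueError("err:XC0010: charset specified without encoding")
    text = "".join(doc.itertext())           # the string value of the element
    if encoding == "base64":
        # Decode to octets, then interpret them using the charset (default UTF-8).
        text = base64.b64decode(text).decode(charset or "UTF-8")
    elif encoding is not None:
        raise ValueError("err:XC0052: unsupported encoding")
    # Keeping the original document element lets sibling elements parse cleanly.
    ns_decl = f" xmlns={quoteattr(namespace)}" if namespace else ""
    return ET.fromstring(f"<{doc.tag}{ns_decl}>{text}</{doc.tag}>")

XHTML = "http://www.w3.org/1999/xhtml"
source = ET.fromstring("<description>&lt;p>This is a chunk.&lt;/p></description>")
result = unescape_markup(source, namespace=XHTML)
```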

The value of the match option
must be an XSLTMatchPattern. It is a dynamic
error (err:XC0023) if that pattern matches anything
other than element nodes.

Every element in the source document that
matches the specified match pattern is
replaced by its children, effectively “unwrapping” the children
from their parent. Non-element nodes and unmatched elements are
passed through unchanged.

Note

The matching applies to the entire document, not just the
“top-most” matches. A pattern of the form h:div will replace all h:div elements, not just the top-most ones.

This step produces a single document; if the document element is
unwrapped, the result might not be well-formed XML.
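
The unwrapping behaviour, including nested matches, can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the hypothetical unwrap function matches on a plain element name rather than a full XSLTMatchPattern, and it does not handle the case where the document element itself is matched.

```python
from xml.dom import minidom


def unwrap(xml_text, tag_name):
    """Replace every element named tag_name by its children (sketch)."""
    doc = minidom.parseString(xml_text)
    # Copy the node list first, because the tree is modified while looping.
    for node in list(doc.getElementsByTagName(tag_name)):
        parent = node.parentNode
        # Move each child in front of the matched element, keeping order.
        while node.firstChild is not None:
            parent.insertBefore(node.firstChild, node)
        # The now-empty matched element is dropped.
        parent.removeChild(node)
    return doc.documentElement.toxml()
```

Because every match is processed, an element nested inside another matching element is unwrapped as well, not only the outermost one.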

7.1.27 p:wrap

The p:wrap step wraps matching
nodes in the source document with a new
parent element.

Every node that matches the specified match pattern is replaced with a new element node
whose QName is the value specified in the wrapper option. The content of that new element is a
copy of the original, matching node.

The match option must only match element, text, processing
instruction, and comment nodes. It is a dynamic error (err:XC0041) if the
match pattern matches any other kind of node.

The group-adjacent option can be used to
group adjacent matching nodes in a single wrapper element. The
specified XPath expression is evaluated for each matching node with
that node as the XPath context node. Whenever two or more adjacent
matching nodes have the same “group adjacent” value, they are
wrapped together in a single wrapper element.

Two matching nodes are considered adjacent if and only if they
are siblings and either there are no nodes between them or all
intervening, non-matching nodes are whitespace text, comment, or
processing instruction nodes.

7.1.28 p:wrap-sequence

The p:wrap-sequence step accepts a
sequence of documents and produces either a single document or a
new sequence of documents.

In its simplest form, p:wrap-sequence takes a sequence of documents
and produces a single, new document by placing each document in the
source sequence inside a new document element
as sequential siblings. The name of the document element is the
value specified in the wrapper option.

The group-adjacent option can be used to
group adjacent documents. The specified XPath expression is
evaluated for each document with that document as the XPath context
node. Whenever two or more sequentially adjacent documents have the
same “group adjacent” value, they are wrapped together in a single
wrapper element.
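
Grouping of adjacent documents can be sketched in Python as shown here. The names wrap_sequence and key are hypothetical; key stands in for evaluating the group-adjacent XPath expression against a document, and the sketch assumes that a document whose value differs from its neighbours is placed in a wrapper of its own.

```python
from itertools import groupby
import xml.etree.ElementTree as ET


def wrap_sequence(docs, wrapper, key):
    """Wrap runs of adjacent documents that share the same key value."""
    results = []
    # groupby only merges neighbours, which mirrors "sequentially adjacent".
    for _, group in groupby(docs, key=key):
        wrapper_element = ET.Element(wrapper)
        wrapper_element.extend(list(group))
        results.append(wrapper_element)
    return results
```

For three documents whose values are 1, 1 and 2, the sketch yields two wrapper elements: the first holds the first two documents and the second holds the third.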

7.1.29 p:xinclude

The p:xinclude step applies
[XInclude]
processing to the source document.

If present, the value of the output-base-uri option must be an anyURI.

If the step specifies a version, then
that version of XSLT must be used to
process the transformation. It is a dynamic error (err:XC0038) if the
specified version of XSLT is not available. If the step does not
specify a version, the implementation may use any version it has
available and may use any means to determine what version to use,
including, but not limited to, examining the version of the
stylesheet.

The XSLT stylesheet provided on the stylesheet port is applied to the document on the
source port. Any parameters passed on the
parameters port are used to define top-level
stylesheet parameters. The primary result document of the
transformation appears on the result port.
All other result documents appear on the secondary port. If XSLT 1.0 is used, an empty sequence
of documents must appear on the
secondary port.

If a sequence of documents is provided on the source port, the first document is assumed to be the
primary input document. This sequence is also the default
collection. It
is a dynamic
error (err:XC0039) if a sequence of documents is
provided to an XSLT 1.0 step.

A dynamic error occurs if the XSLT processor signals a fatal
error. This includes the case where the transformation terminates
due to an xsl:message instruction with
a terminate attribute value of
“yes”. How XSLT message termination errors are reported to
the XProc processor is implementation-dependent.

The invocation of the transformation is controlled by the
initial-mode and template-name options that set the initial mode
and/or named template in the XSLT transformation where processing
begins. It is
a dynamic
error (err:XC0056) if the specified initial mode or
named template cannot be applied to the specified stylesheet.

The output-base-uri option sets the
context's output base URI per the XSLT 2.0 specification; otherwise
the base URI of the result document is the
base URI of the first document in the source
port's sequence. If the value of the output-base-uri option is not absolute, it will be
resolved using the base URI of its p:option element. An XSLT 1.0 step
should use the value of the output-base-uri as the base URI of its output, if the
option is specified.

7.2 Optional
Steps

The following steps are optional. If they are supported by a
processor, they must conform to the semantics outlined here, but a
conformant processor is not required to support all (or any) of
these steps.

7.2.1 p:exec

The p:exec step runs an external
command, passing the input that arrives on its source port as standard input and reading results from standard output and errors from standard error.

The values of the source-is-xml,
result-is-xml, errors-is-xml, and fix-slashes options must
be boolean.

The p:exec step executes the
command passed on command with the
arguments passed on args. The processor
does not interpolate the values of the command or args (for example,
expanding references to environment variables). It is a dynamic
error (err:XC0033) if the command cannot be run.

If cwd is specified, then the current
working directory is changed to the value of that option before
execution begins. It is a dynamic error (err:XC0034) if the
current working directory cannot be changed to the value of the
cwd option. If cwd is not specified, the
current working directory is implementation-defined.

If the command or cwd options contain any “/” or “\” characters, they
will be replaced with the platform-specific path separator
character. If the fix-slashes option is
“true”, this fixup will be applied to args
as well.
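
A minimal Python sketch of the slash fixup is given below. The function name fix_slashes is hypothetical, and the separator parameter exists only so that the behaviour can be shown for a platform other than the one running the sketch.

```python
import os


def fix_slashes(value, separator=os.sep):
    """Replace both "/" and "\\" with the platform path separator."""
    return value.replace("/", separator).replace("\\", separator)
```

With a separator of "\", the value "bin/tools\run" becomes "bin\tools\run"; with "/", it becomes "bin/tools/run".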

The document that arrives on the source
port will be passed to the command as its standard input. If
source-is-xml is true, the serialization
options are used to convert the input into serialized XML which is
passed to the command, otherwise the XPath string-value of the
document is passed.

The standard output of the command is read and returned on
result; the standard error output is read and
returned on errors. In order to assure that
the result will be an XML document, each of the results will be
wrapped in a c:result element.

If result-is-xml is true, the standard
output of the program is assumed to be XML and will be parsed as a
single document. If it is false, the output is assumed not
to be XML and will be returned as escaped text.

If wrap-result-lines is
true, a c:line
element will be wrapped around each line of output.
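
For output that is not XML, the wrapping in c:result and c:line can be sketched in Python as follows. The function name wrap_output is hypothetical, and splitting lines with str.splitlines is an assumption about what counts as a line; the namespace is the step vocabulary namespace http://www.w3.org/ns/xproc-step.

```python
import xml.etree.ElementTree as ET

C_NS = "http://www.w3.org/ns/xproc-step"


def wrap_output(text, wrap_lines=False):
    """Wrap non-XML command output in a c:result element (sketch)."""
    result = ET.Element("{%s}result" % C_NS)
    if wrap_lines:
        # One c:line element per line of output.
        for line in text.splitlines():
            ET.SubElement(result, "{%s}line" % C_NS).text = line
    else:
        result.text = text
    return result
```

Markup characters in the text are escaped when the element is serialized, so the result is always well-formed.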

The same rules apply to the standard error output of the
program, with the errors-is-xml and
wrap-error-lines options, respectively.

If either of the results is XML, it must be parsed with namespaces enabled and
validation turned off, just like p:document.

The single args option is treated as a
series of whitespace-separated values. Values which contain spaces
may be quoted with either single (') or double (") quotes. A
literal quote character may be inserted by doubling it.
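
One possible reading of these quoting rules is sketched below in Python. The function name split_args is hypothetical. The sketch assumes that a quote is doubled inside a value delimited by that same quote character, and it treats an unterminated quote as an error, which the text above does not specify.

```python
def split_args(args):
    """Split an args string into a list of argument values (sketch)."""
    tokens = []
    current = []
    in_token = False
    quote = None  # the quote character currently open, if any
    i = 0
    while i < len(args):
        ch = args[i]
        if quote is not None:
            if ch == quote:
                if i + 1 < len(args) and args[i + 1] == quote:
                    # A doubled quote stands for one literal quote character.
                    current.append(quote)
                    i += 2
                    continue
                quote = None  # closing quote
            else:
                current.append(ch)
        elif ch in ('"', "'"):
            quote = ch
            in_token = True
        elif ch.isspace():
            # Unquoted whitespace ends the current value.
            if in_token:
                tokens.append("".join(current))
                current = []
                in_token = False
        else:
            current.append(ch)
            in_token = True
        i += 1
    if quote is not None:
        raise ValueError("unterminated quote in args")
    if in_token:
        tokens.append("".join(current))
    return tokens
```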

7.2.2 p:hash

The p:hash step generates a hash,
or digital “fingerprint”, for some value and injects it into the
source document.

The value of the algorithm option must
be a QName. If it does not have a prefix, then it must be one of
the following values: “md5”, “sha1”.

A hash is constructed from the string specified in the
value option using the specified
algorithm.

The value of the match option must be an
XSLTMatchPattern.

The hash of the specified value is computed using the algorithm
and parameters specified. It is a dynamic error (err:XC0036) if the
requested hash algorithm is not one that the processor understands
or if the value or parameters are not appropriate for that
algorithm.

Conformant processors
must support the “md5” and “sha1”
algorithms. It is implementation-defined what
other algorithms are supported.

The matched nodes are specified with the match pattern in the
match option. For each matching node, the
string value of the computed hash is used in the output. Nodes that
do not match are copied without change.

If the expression given in the match
option matches an attribute, the hash is used as the new
value of the attribute in the output.

If the expression matches any other kind of node, the entire
node (and not just its contents) is replaced by the
hash.
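
The attribute case can be illustrated with a Python sketch. Several details are assumptions made for the sketch and are not stated above: the value is encoded as UTF-8 before hashing, the hash is rendered as lowercase hexadecimal, and the match pattern is reduced to a plain element name plus attribute name. The function names are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

# The two algorithms that conformant processors must support.
SUPPORTED = {"md5": hashlib.md5, "sha1": hashlib.sha1}


def compute_hash(value, algorithm):
    """Hash a string value with the named algorithm (sketch)."""
    try:
        factory = SUPPORTED[algorithm]
    except KeyError:
        raise ValueError("err:XC0036: unsupported algorithm " + repr(algorithm))
    # Assumption: UTF-8 octets in, hexadecimal text out.
    return factory(value.encode("utf-8")).hexdigest()


def inject_hash(root, element_name, attribute, value, algorithm):
    """Set the attribute on every matching element to the computed hash."""
    digest = compute_hash(value, algorithm)
    for element in root.iter(element_name):
        element.set(attribute, digest)
    return root
```

Elements that do not match are left untouched, which corresponds to copying them without change.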

7.2.3 p:uuid

The p:uuid step generates a UUID
and injects it into the source document.

The value of the match option must be an
XSLTMatchPattern. The value of the version
option must be an integer.

If the version is specified, that
version of UUID must be computed. It is a dynamic error (err:XC0060) if the
processor does not support the specified version of the UUID algorithm. If the version is not
specified, the version of UUID computed is implementation-defined.

The matched nodes are specified with the match pattern in the
match option. For each matching node, the
generated UUID is used in the output. Nodes that do not match are
copied without change.

If the expression given in the match
option matches an attribute, the UUID is used as the new
value of the attribute in the output.

If the expression matches any other kind of node, the entire
node (and not just its contents) is replaced by the
UUID.
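
A Python sketch of the attribute case follows. The function names are hypothetical, the set of supported versions (1 and 4) and the default of version 4 are choices made for the sketch, and the match pattern is reduced to a plain element name plus attribute name.

```python
import uuid
import xml.etree.ElementTree as ET

# Versions this sketch can compute; a real processor may support others.
GENERATORS = {1: uuid.uuid1, 4: uuid.uuid4}


def generate_uuid(version=None):
    """Return a UUID string of the requested version (sketch)."""
    if version is None:
        version = 4  # assumption: the default is implementation-defined
    if version not in GENERATORS:
        raise ValueError("err:XC0060: unsupported UUID version %r" % version)
    return str(GENERATORS[version]())


def inject_uuid(root, element_name, attribute, version=None):
    """Set the attribute on every matching element to one generated UUID."""
    value = generate_uuid(version)
    for element in root.iter(element_name):
        element.set(attribute, value)
    return root
```

The sketch generates a single UUID per invocation and uses it for every match.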

When XML Schema validation assessment is performed, the
processor is invoked in the mode specified by the mode option. It is a dynamic error (err:XC0055) if the
implementation does not support the specified mode.

The result of the assessment is a document
with the Post-Schema-Validation-Infoset (PSVI) ([W3C XML Schema: Part
1]) annotations, if the pipeline implementation supports
such annotations. If not, the input document is reproduced with any
defaulting of attributes and elements performed as specified by the
XML Schema recommendation.

Whether or not the pipeline
processor supports passing PSVI annotations between steps is
implementation-defined.

7.2.7 p:www-form-urldecode

The p:www-form-urldecode step
decodes an x-www-form-urlencoded string
into a set of parameters.

The value option is interpreted as a
string of parameter values encoded using the x-www-form-urlencoded algorithm. It turns each such
encoded name/value pair into a parameter. The entire set of
parameters is written (as a c:param-set) on the result output port.

If any parameter name occurs more than once in the encoded
string, the resulting parameter set will contain a c:param for each
instance. However, only one of these will actually be used if the
parameter set is passed to another step on its parameter input
port.
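
The decoding can be sketched in Python with the standard library. The function name www_form_urldecode is hypothetical, and keeping parameters with blank values is an assumption; the name and value attributes on c:param follow the c:param-set vocabulary.

```python
from urllib.parse import parse_qsl
import xml.etree.ElementTree as ET

C_NS = "http://www.w3.org/ns/xproc-step"


def www_form_urldecode(value):
    """Turn an encoded string into a c:param-set element (sketch)."""
    param_set = ET.Element("{%s}param-set" % C_NS)
    # parse_qsl keeps every name/value pair, including repeated names.
    for name, decoded in parse_qsl(value, keep_blank_values=True):
        ET.SubElement(param_set, "{%s}param" % C_NS,
                      {"name": name, "value": decoded})
    return param_set
```

For the string a=1&b=two+words&a=3 the sketch produces three c:param elements, two of them named a.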

7.2.8 p:www-form-urlencode

The p:www-form-urlencode step
encodes a set of parameter values as an x-www-form-urlencoded string and injects it into the
source document.

The sequence of documents provided on the source port is treated as the default collection. Any
parameters passed on the parameters port are
used to define top-level variables. The query
port must receive a single document whose element is c:query.
As an XQuery is not necessarily well-formed XML, the text
descendants of this element are considered the query.

<c:query>string
</c:query>

The result of the XQuery is a sequence of documents constructed
from an [XPath
2.0] sequence of elements. Each element in the sequence
is assumed to be the document element of a separate document.
It is a
dynamic
error (err:XC0057) if the sequence that results from
an XQuery contains items other than elements.

The base URI of each of the output documents is the base URI of
the first document in the source port's
sequence.

7.2.10 p:xsl-formatter

The p:xsl-formatter step receives
an [XSL 1.1]
document and renders the content. The result of rendering is stored
to the URI provided via the href option. A
reference to that result is produced on the output port.

The value of the href option
must be an anyURI. It may be relative,
in which case it will be resolved against the base URI of its
p:option
element before use.

The content-type of the output is controlled by the content-type option. This option specifies a media
type as defined by [IANA Media Types]. The option may include media
type parameters as well (e.g. "application/someformat;
charset=UTF-8"). The use of
media type parameters on the content-type
option is implementation-defined.

If the content-type option is not specified, the output type
is implementation-defined. The
default should be PDF.

A formatter may take any
number of optional rendering parameters via the step's parameters;
such parameters are defined by the XSL implementation used and are
implementation-defined.

The output of this step is a document containing a single
c:result
element whose content is the absolute URI of the document stored by
the step.

7.3 Serialization Options

Several steps in this step library require serialization options
to control the serialization of XML. These options are used to
control serialization as in the [Serialization]
specification.

The following options may be present on steps that perform
serialization:

byte-order-mark - The value of this
option must be a boolean.

cdata-section-elements - The value of
this option must be a list of
QNames. They are
interpreted as element names.

doctype-public - The value of this
option must be a string. The public
identifier of the doctype.

doctype-system - The value of this
option must be an anyURI. The system
identifier of the doctype. It need not be absolute, and is not
resolved.

encoding - A character set name.

escape-uri-attributes - The value of
this option must be a boolean.

include-content-type - The value of
this option must be a boolean.

indent - The value of this option
must be a boolean.

media-type - The value of this option
must be a string. It specifies the
media type (MIME content type).

method - The value of this option
must be a QName. It specifies the
serialization method.

normalization-form - The value of this
option must be an NMTOKEN, one of the
enumerated values NFC, NFD, NFKC, NFKD, fully-normalized,
none or an implementation-defined value.

omit-xml-declaration - The value of
this option must be a boolean.

standalone - The value of this option
must be an NMTOKEN, one of the
enumerated values true, false, or omit.

undeclare-prefixes - The value of this
option must be a boolean.

version - The value of this option
must be a string.

In order to be consistent with the rest of this specification,
boolean values for the serialization parameters use “true” and
“false” where the serialization specification uses “yes” and “no”.
No change in semantics is implied by this different spelling.

The method option controls the
serialization method used by this component with standard values of
'html', 'xml', 'xhtml', and 'text' but only the 'xml' value is
required to be supported. The interpretation of the remaining
options is as specified in [Serialization].

[Definition: An implementation-dependent feature is one where the
implementation has discretion in how it is performed.
Implementations are not required to document or explain how
implementation-dependent
features are performed.]

[Definition: An implementation-defined feature is one where the
implementation has discretion in how it is performed. Conformant
implementations must document how
implementation-defined
features are performed.]

A.1 Implementation-defined
features

How pipeline outputs are connected to XML documents outside the
pipeline is implementation-defined. See Section 1, “Introduction”.

What additional step types, if any, are provided is
implementation-defined. See Section 2.1, “Steps”.

In Version 1.0 of XProc, how (or if) implementers provide local
resolution mechanisms and how (or if) they provide access to
intermediate results by URI is implementation-defined. See Section 2.2.1,
“External Documents”.

Except for cases which are specifically called out in , the
extent to which namespace fixup, and other checks for outputs which
cannot be serialized, are performed on intermediate outputs is
implementation-defined. See Section 2.4.1, “Namespace Fixup
on Outputs”.

When a declared step is evaluated directly by the XProc
processor (as opposed to occurring as an atomic step in some
container), how the input and output ports are bound to documents
is implementation-defined. See Section 5.8, “p:declare-step”.

The subpipeline may include declarations of additional steps
(e.g., other pipelines or other step types that are provided by a
particular implementation or in some implementation-defined way)
and import other pipelines. See Section 5.8.2, “Declaring
pipelines”.

Conformant processors must support directory paths whose scheme
is file. It is implementation-defined what other schemes are
supported by p:directory-list, and what the interpretation of
'directory', 'file' and 'contents' is for those schemes. See
Section 7.1.6, “p:directory-list”.

Any file or directory determined to be special by the
p:directory-list step may be output using a c:other element but the
criteria for marking a file as special are implementation-defined.
See Section 7.1.6, “p:directory-list”.

A formatter may take any number of optional rendering
parameters via the step's parameters; such parameters are defined
by the XSL implementation used and are implementation-defined. See
Section 7.2.10, “p:xsl-formatter”.

If the IRI reference specified by the base-uri option on
p:make-absolute-uris is not valid, or if it is absent and the input
document has no base URI, the results are implementation-dependent.
See Section 7.1.14,
“p:make-absolute-uris”.

How XSLT message termination errors are reported to the XProc
processor is implementation-dependent. See Section 7.1.30, “p:xslt”.

A.3 Infoset Conformance

This specification conforms to the XML Information Set [Infoset]. The
information corresponding to the following information items and
properties must be available to the processor for the documents
that flow through the pipeline.

The Document Information Item with
[base URI] and [children] properties.

C Glossary

In the context of XProc, a QName is
almost always a QName in the Namespaces in XML sense. Note,
however, that p:option and p:with-param
values can get their namespace declarations in a non-standard way
(with p:namespaces) and QNames that have no prefix
are always in no-namespace, irrespective of the default
namespace.

The steps that occur directly within, or within non-step
wrappers directly within, a step are called that step's contained steps. In other words, “container” and
“contained steps” are inverse relationships.

An element from the XProc namespace may have any attribute not from the XProc
namespace, provided that the expanded-QName of the attribute has a
non-null namespace URI. Such an attribute is called an extension attribute.

An implementation-dependent feature
is one where the implementation has discretion in how it is
performed. Implementations are not required to document or explain
how implementation-dependent
features are performed.

A step matches its signature if and
only if it specifies an input for each declared input, it specifies
no inputs that are not declared, it specifies an option for each
option that is declared to be required, and it specifies no options
that are not declared.

To produce a serializable XML document, the XProc processor must sometimes
add additional namespace nodes, perhaps even renaming prefixes, to
satisfy the constraints of Namespaces in XML. This process is
referred to as namespace fixup.

If a step has a document input port which is explicitly marked
“primary='true'”, or if it has exactly one
document input port and that port is not explicitly marked
“primary='false'”, then that input port is
the primary input port of the step.

If a step has a document output port which is explicitly marked
“primary='true'”, or if it has exactly one
document output port and that port is not explicitly
marked “primary='false'”, then that output
port is the primary output port of the
step.

If a step has a parameter input port which is explicitly marked
“primary='true'”, or if it has exactly one
parameter input port and that port is not explicitly
marked “primary='false'”, then that parameter
input port is the primary parameter input
port of the step.

E Guidance on Namespace Fixup
(Non-Normative)

An XProc processor may find it necessary to add missing
namespace declarations to ensure that a document can be serialized.
While this process is implementation-defined, the purpose of this
appendix is to provide guidance as to what an implementation might
do to either prevent such situations or fix them before
serialization.

When a namespace binding is generated, the prefix associated
with the QName of the element or attribute in question should be
used. From an infoset perspective, this is accomplished by setting
the [prefix] on the element or attribute.
Then when an implementation needs to add a namespace binding, it
can reuse that prefix if possible. If reusing the prefix is not
possible, the implementation must generate a new prefix that is
unique among the in-scope namespaces of the element or owner element of
the attribute.
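
The prefix selection described above might be sketched in Python as follows. The function name choose_prefix, the representation of in-scope namespaces as a dictionary from prefix to namespace name, and the ns1, ns2, ... naming scheme for generated prefixes are all assumptions made for illustration.

```python
def choose_prefix(preferred, namespace_name, in_scope):
    """Pick a prefix for namespace_name and record the binding (sketch).

    in_scope maps prefixes to namespace names and is updated in place.
    """
    bound = in_scope.get(preferred)
    if bound is None or bound == namespace_name:
        # The preferred prefix is free, or already bound correctly: reuse it.
        in_scope[preferred] = namespace_name
        return preferred
    # The prefix is bound to a different namespace: generate a unique one.
    counter = 1
    while "ns%d" % counter in in_scope:
        counter += 1
    generated = "ns%d" % counter
    in_scope[generated] = namespace_name
    return generated
```

The caller would then set the [prefix] of the element or attribute to the returned value.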

An implementation can avoid namespace fixup by making sure that
the standard step library does not output documents that require
fixup. The following list contains suggestions as to how to
accomplish this within the steps:

Any step that outputs an element in the step vocabulary
namespace http://www.w3.org/ns/xproc-step must
ensure that namespace is declared. An implementation should
generate a namespace binding using the prefix “c”.

When attributes are added by p:add-attribute or p:set-attributes, the step must ensure the
namespaces of the added attributes are declared. If the prefix used
by the QName is not in the in-scope namespaces of the element on
which the attribute was added, the step must add a namespace
declaration of the prefix to the in-scope namespaces. If the prefix
is amongst the in-scope namespaces and is not bound to the same
namespace name, a new prefix and namespace binding must be added.
When a new prefix is generated, the prefix associated with the
attribute should be changed to reflect that generated prefix
value.

When an element is renamed by p:rename,
the step must ensure the namespace of the element is declared. If
the prefix used by the QName is not in the in-scope namespaces of
the element being renamed, the step must add a namespace
declaration of the prefix to the in-scope namespaces. If the prefix
is amongst the in-scope namespaces and is not bound to the same
namespace name, a new prefix and namespace binding must be added.
When a new prefix is generated, the prefix associated with the
element should be changed to reflect that generated prefix
value.

If the element does not have a namespace name and there is a
default namespace, the default namespace must be undeclared. For
each of the child elements, the original default namespace
declaration must be preserved by adding a default namespace
declaration unless the child element has a different default
namespace.

When an attribute is renamed by p:rename, the step must ensure the namespace of the
renamed attribute is declared. If the prefix used by the QName is
not in the in-scope namespaces of the element on which the
attribute was added, the step must add a namespace declaration of
the prefix to the in-scope namespaces. If the prefix is amongst the
in-scope namespaces and is not bound to the same namespace name, a
new prefix and namespace binding must be added. When a new prefix
is generated, the prefix associated with the attribute should be
changed to reflect that generated prefix value.

When an element wraps content via p:wrap,
there may be in-scope namespaces coming from ancestor elements of
the new wrapper element. The step must ensure the namespace of the
element is declared properly. By default, the wrapper element will
inherit the in-scope namespaces of the parent element if one
exists. As such, there may be an existing namespace declaration or
default namespace.

If the prefix used by the QName is not in the in-scope
namespaces of the wrapper element, the step must add a namespace
declaration of the prefix to the in-scope namespaces. If the prefix
is amongst the in-scope namespaces and is not bound to the same
namespace name, a new prefix and namespace binding must be added.
When a new prefix is generated, the prefix associated with the
wrapper element should be changed to reflect that generated prefix
value.

If the element does not have a namespace name and there is a
default namespace, the default namespace must be undeclared. For
each of the child elements, the original default namespace
declaration must be preserved by adding a default namespace
declaration unless the child element has a different default
namespace.

When the wrapper element is added for p:wrap-sequence or p:pack, the prefix used by the QName must be added to
the in-scope namespaces.

When an element is removed via p:unwrap,
any in-scope namespaces that are declared on the element must be
copied to any child element except when the child element declares
the same prefix or declares a new default namespace.

In the output from p:xslt, if an element
was generated from the xsl:element or an attribute from
xsl:attribute, the step must guarantee that a namespace
declaration exists for the namespace name used. Depending on the
XSLT implementation, the namespace declaration for the namespace
name of the element or attribute may not be declared. It may also
be the case that the original prefix is available. If the original
prefix is available, the step should attempt to re-use that prefix.
Otherwise, it must generate a prefix for a namespace binding and
change the prefix associated with the element or attribute.

F Handling Circular and Re-entrant
Library Imports (Non-Normative)

When handling imports, an implementation should be able to
detect the following situations:

Circular imports: A imports B, B imports A.

Re-entrant imports: A imports B and C, B imports D, C imports
D.

To accomplish this, an implementation can use the following
strategy:

For a pipeline or library, process all the p:import elements and
record the URI of each resource returned. If the same resource URI
is encountered more than once, do not load and process that
resource after the first time.

For each resource, determine the step types that it exports and that
are declared within the resource itself, not by an
import, by the following rules:

Associate with every resource URI a list of step types declared
within and a list of the "top level" imported resource URI values
(i.e. the base URI of the resolved resource for each p:import
element in the library or pipeline).

If N(U,P) contains duplicates, then the same step name is
declared in different resources (which is an error).

For any resource, the set of types is:

S: {uri} -> {set of step declarations}
S(U) := N(U,{})[0]

If S(U) contains any duplicates, then the same step name is
declared in different resources (which is an error).

An implementation can use the maps N and S as necessary to resolve
imports.
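
The overall strategy can be sketched in Python. This is a simplified illustration, not the N and S maps themselves: the function name resolve_imports and the load callback are hypothetical, with load(uri) assumed to return the step types declared within a resource and the URIs of its top-level imports.

```python
def resolve_imports(root_uri, load):
    """Collect step types reachable from root_uri, loading each URI once.

    load(uri) is assumed to return (declared_step_types, imported_uris).
    Returns a dict that maps each step type to the URI declaring it.
    """
    visited = set()
    declared_in = {}
    pending = [root_uri]
    while pending:
        uri = pending.pop()
        if uri in visited:
            # Circular and re-entrant imports end here: never load twice.
            continue
        visited.add(uri)
        step_types, imported_uris = load(uri)
        for step_type in step_types:
            if step_type in declared_in:
                # The same step name declared in different resources.
                raise ValueError("step type %r declared in both %s and %s"
                                 % (step_type, declared_in[step_type], uri))
            declared_in[step_type] = uri
        pending.extend(imported_uris)
    return declared_in
```

Recording visited URIs covers both situations listed above: a circular import stops at the resource already visited, and a re-entrant import of D through B and C loads D only once.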

G Sequential
steps, parallelism, and side-effects

XProc imposes as few constraints on the order in which steps
must be evaluated as possible and almost no constraints on parallel
execution.

In the simple, and we believe overwhelmingly common case, inputs
flow into the pipeline, through the pipeline from one step to the
next, and results are produced at the end. The order of the steps
is constrained by the input/output connections between them.
Implementations are free to execute them in a purely sequential
fashion or in parallel, as they see fit. The results are the same
in either case.

This is not true for pipelines which rely on side effects, such
as the state of the filesystem or the state of the web. Consider
the following pipeline:

There's no guarantee that the “style” step will execute after the
“save-xslt” step. In this case, the solution is straightforward.
Even if you need the saved stylesheet, you don't need to rely on it
in your pipeline: