XPath 2.0 Expression Syntax

This document is an informal guide to the syntax of XPath 2.0 expressions, which are used in Saxon both within
XSLT stylesheets, and in the Java API. For formal specifications, see the
XPath 2.0 specification,
except where differences are noted here.

This summary has been updated to include brief descriptions of those XPath 2.0 constructs
that are implemented in the Saxon product.

Saxon expressions may be used either in an XSL stylesheet, or as a parameter to various Java
interfaces. The syntax is the same in both cases. In the Java interface, expressions are encapsulated
by the net.sf.saxon.expr.Expression class, and are parsed using a call such as
Expression.make("$a + $b"). To exploit the full power of XPath expressions in the Java API, you will
need to supply some support classes to perform functions such as resolving namespace references:
this cannot be done automatically because there is no stylesheet to use as a reference point.

An important change in XPath 2.0 is that all values are now considered as sequences. A sequence
consists of zero or more items; an item may be a node or a simple-value. Examples of simple-values
are integers, strings, booleans, and dates (not supported yet in Saxon). A single value such as
a number is considered as a sequence of length 1. The empty sequence is written as ();
a singleton sequence may be written as "a" or ("a"), and a general
sequence is written as ("a", "b", "c").

The node-sets of XPath 1.0 are replaced in XPath 2.0 by sequences of nodes. Path expressions
will return node sequences whose nodes are in document order with no duplicates, but other kinds
of expression may return sequences of nodes in any order, with duplicates permitted.

String literals are written as "London" or 'Paris'. In each case you can use the opposite
kind of quotation mark within the string: 'He said "Boo"', or "That's rubbish". In a stylesheet
XSL expressions always appear within XML attributes, so it is usual to use one kind of delimiter for
the attribute and the other kind for the literal. Anything else can be written using XML character
entities. From Saxon 7.1, string delimiters can be doubled within the string to represent the
delimiter itself: for example <xsl:value-of select='"He said, ""Go!"""'/>
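
The doubling rule can be illustrated with a small Python sketch. This is not Saxon code: the function name parse_string_literal is hypothetical, and the sketch only strips the outer delimiters and collapses doubled delimiters, without validating the rest of the literal.

```python
def parse_string_literal(literal):
    """Return the string value denoted by an XPath string literal (sketch)."""
    if len(literal) < 2 or literal[0] not in "\"'" or literal[-1] != literal[0]:
        raise ValueError("not a quoted string literal: " + literal)
    delimiter = literal[0]
    body = literal[1:-1]
    # A doubled delimiter inside the literal stands for one delimiter character.
    return body.replace(delimiter * 2, delimiter)


print(parse_string_literal('"He said, ""Go!"""'))   # He said, "Go!"
print(parse_string_literal("'He said \"Boo\"'"))    # He said "Boo"
```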

Numeric constants follow the Java rules for decimal literals: for example, 12 or 3.05; a
negative number can be written as (say) -93.7, though technically the minus sign is not part of the
literal. (Also, note that you may need a space before the minus sign to avoid it being treated as
a hyphen within a preceding name). The numeric literal is taken as a double precision floating
point number if it uses scientific notation (e.g. 1.0e7), as a fixed point decimal
if it includes a full stop, or as an integer otherwise. Decimal values in Saxon have unlimited
precision; integers are limited to 64 bits.
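
The three-way classification of numeric literals can be sketched in Python as follows. The function name parse_numeric_literal is hypothetical, and the 64-bit check assumes a signed range (as for a Java long); the sketch models the rule described above rather than Saxon's actual parser.

```python
from decimal import Decimal


def parse_numeric_literal(text):
    """Classify an unsigned numeric literal as double, decimal or integer."""
    if "e" in text.lower():
        # Scientific notation: double precision floating point.
        return "double", float(text)
    if "." in text:
        # A full stop without an exponent: fixed point decimal.
        return "decimal", Decimal(text)
    value = int(text)
    # Assumption: "64 bits" means a signed 64-bit range.
    if value >= 2 ** 63:
        raise OverflowError("integer literal does not fit in 64 bits")
    return "integer", value


print(parse_numeric_literal("1.0e7")[0])   # double
print(parse_numeric_literal("3.05")[0])    # decimal
print(parse_numeric_literal("12")[0])      # integer
```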

There are no boolean constants as such: instead use the function calls true() and false().

Constants of other data types can be written using constructors, which look like function calls
but require a string literal as their argument. For example, float("10.7") produces
a single-precision floating point number. Saxon implements constructors for many of the data types
defined in XML Schema Part 2, but most of them are essentially dummy implementations at present.

Two exceptions are for date and dateTime values: for example
you can write constants for these data types as date("2002-04-30")
or dateTime("1966-07-31T15:00:00Z").

The value of a variable (local or global variable, local or global parameter) may be referred to
using the construct $name, where name is the variable name.

The variable is always evaluated at the textual place where the expression containing it appears;
for example a variable used within an xsl:attribute-set must be in scope at the point where the
attribute-set is defined, not the point where it is used.

A variable may take a value of any data type, and in general it is not possible to
determine its data type statically.

It is an error to refer to a variable that has not been declared.

Starting with XPath 2.0, variables (known as range variables) may be declared within
an XPath expression, not only using xsl:variable elements in the stylesheet. The
expressions that declare variables are the for, some, and every
expressions.

There are some constructs that are specifically string expressions, but in addition any other
kind of expression can be used in a context where a string expression is required:

According to the XPath 2.0 specification,
a numeric expression is converted to a string using the canonical lexical representation
for the particular numeric type as defined in XML Schema. For example, the integer 2 is displayed
as "2", but the decimal 2.0 is displayed as "2.0". In Saxon, however, for the time being
the string representation never ends with ".0", to reduce backwards compatibility problems.

A boolean expression is displayed as one of the strings "true" or "false".

When a sequence expression is used in a string context, only the first item of the sequence
is used: the value of this item is converted to a string. The string-value of a text node
is the character content of the node; the string-value of an element node or document (root) node
is the concatenation of all its descendant text nodes.

The specific string expressions are as follows:

Construct

Meaning

string(expression)

This performs an explicit type conversion
to a string, which will always give the same result as the implicit conversion described above.
The main case where explicit conversion is useful is when assigning a value to a variable.

concat(expression1, expression2 {,expression3}*)

This concatenates the
string values of the arguments. There may be any number of arguments (two or more).

substring(expression1, expression2 [,expression3])

This extracts a substring of the string value of expression1. Expression2 gives
the start position (starting at 1), expression 3 gives the length: if omitted, the rest of the
string is used. For example, substring("Michael", 2, 4) is "icha".

substring-before(expression1 ,expression2)

This returns the substring of expression1 that precedes the first occurrence of
expression2. If expression1 does not contain expression2, it returns the empty string. For
example, substring-before("c:\dir", ":\") returns "c".

substring-after(expression1 ,expression2)

This returns the substring of expression1 that follows the first occurrence of
expression2. If expression1 does not contain expression2, it returns the empty string. For
example, substring-after("c:\dir", ":\") returns "dir".

normalize-space(expression1)

This removes leading and trailing white space, and converts all other sequences
of white space to a single space character. For example, normalize-space("  Mike   Kay  ") returns
"Mike Kay".

translate(expression1, expression2, expression3)

This replaces any character in expression1 that also occurs in expression2 with
the corresponding character from expression3. For example, translate ("ABBA", "ABC", "123")
returns "1221". If there is no corresponding character in expression3 (because it is shorter than
expression2), the character is removed from the string.

upper-case(string)

Converts a string to upper case, using the rules for the default Java locale.

lower-case(string)

Converts a string to lower case, using the rules for the default Java locale.

name(node)

Returns the name of the given node, or the current
node if the argument is omitted. The name here is the "display name"; it will use the same
namespace prefix as in the original source document.

local-name(node)

Returns the local part (after the colon) of the name of the
given node, or the current node if the argument is omitted

namespace-uri(node)

Returns the URI of the namespace of the name of the
given node, or the current node if the argument is omitted

unparsed-entity-uri(string-expression)

Returns the URI of the unparsed entity with the given name in the
current document, if there is one; otherwise the empty string

generate-id(node)

Returns a system-generated identifier for the
given node, or the current node if the argument is omitted.
The generated identifiers are always alphanumeric (except
for the document node, where the identifier is the empty string).
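
The behaviour of several of the string functions above can be modelled in a few lines of Python. This is only a sketch of the semantics, not Saxon's implementation: the Python function names are hypothetical, and substring is modelled for integer arguments only.

```python
import re


def substring(s, start, length=None):
    """substring(): positions start at 1; integer arguments only."""
    end = float("inf") if length is None else start + length
    return "".join(ch for pos, ch in enumerate(s, 1) if start <= pos < end)


def substring_before(s, sep):
    index = s.find(sep)
    return s[:index] if index >= 0 else ""


def substring_after(s, sep):
    index = s.find(sep)
    return s[index + len(sep):] if index >= 0 else ""


def normalize_space(s):
    # Collapse runs of XML white space, then trim both ends.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def translate(s, from_chars, to_chars):
    result = []
    for ch in s:
        index = from_chars.find(ch)
        if index < 0:
            result.append(ch)                # not listed: keep unchanged
        elif index < len(to_chars):
            result.append(to_chars[index])   # replace with corresponding char
        # otherwise there is no corresponding character: drop it
    return "".join(result)


print(substring("Michael", 2, 4))            # icha
print(substring_before("c:\\dir", ":\\"))    # c
print(substring_after("c:\\dir", ":\\"))     # dir
print(normalize_space("  Mike   Kay  "))     # Mike Kay
print(translate("ABBA", "ABC", "123"))       # 1221
```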

There are some constructs that are specifically numeric expressions, but in addition any string
whose value is convertible to a number can be used as a number. A string that does not represent
any number is treated as the double value NaN (not a number).

A boolean is converted to a number by treating false as 0 and true as 1.
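
The implicit conversion to a number can be sketched in Python as below. The function name to_number is hypothetical, and the set of strings accepted as numbers (an optional sign, digits, an optional decimal point and an optional exponent) is an assumption made for illustration; the text above does not spell out the exact lexical rules.

```python
import math
import re

# Assumed lexical form of a number: optional sign, digits, optional exponent.
_NUMBER = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")


def to_number(value):
    """Convert a boolean, number or string to a double (sketch)."""
    if isinstance(value, bool):
        return 1.0 if value else 0.0         # false is 0, true is 1
    if isinstance(value, (int, float)):
        return float(value)
    text = value.strip(" \t\r\n")
    if _NUMBER.fullmatch(text):
        return float(text)
    return math.nan                          # not a number


print(to_number("12.5"))    # 12.5
print(to_number("abc"))     # nan
print(to_number(True))      # 1.0
```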

number(expression)

This performs an explicit type conversion
to a number, which will always give the same result as the implicit conversion described above.
Explicit conversion can be useful when assigning a value to a variable. It is also useful when
creating a qualifier in a nodeset expression, since the meaning of a numeric qualifier is different
from a boolean one. At present in Saxon this always converts to a double.

count(sequence)

This returns the number of items in the sequence.

sum(sequence)

This converts the value of each item in the sequence to a number, and totals
the result.

avg(sequence)

This converts the value of each item in the sequence to a number, and returns the average of
the result.

min(sequence)

This converts the value of each item in the sequence to a number, and returns the minimum of
the result.

max(sequence)

This converts the value of each item in the sequence to a number, and returns the maximum of
the result.

string-length(string)

This returns the number of characters in the string value of expression.
Characters are counted using the Java length() function, which does not necessarily give the
same answer as the XPath rules, particularly when combining characters are used.

numeric-expression1 op numeric-expression2

This performs an arithmetic operation on the two values. The operators
are + (plus), - (minus), * (multiply), div (divide), and mod (modulo). Note
that div currently does a floating-point division, and mod also uses floating point.
The other operators are evaluated according to the data type of their operands, for example
adding two integers gives an integer, adding an integer to a decimal gives a decimal.

- number

changes the sign of the number.

floor(number)

This returns the largest integer that is <= the argument

ceiling(number)

This returns the smallest integer that is >= the argument

round(number)

This returns the closest integer to the argument. The rounding rules follow
Java conventions which are not quite the same as the XSL rules.

position()

This returns the position of the current item in the list of items being
processed. Positions
are numbered from one.

last()

This returns the number of items in the list of items being
processed.
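
The typing rules given above for the arithmetic operators can be modelled in Python, using int for integers, Decimal for decimals and float for doubles. The function names are hypothetical and this is a sketch of the described behaviour, not Saxon code. Taking the sign of the dividend in mod is an assumption based on the usual XPath definition of mod as the remainder of a truncating division.

```python
import math
from decimal import Decimal


def add(a, b):
    """'+' keeps the operand types: int + int is int, int + Decimal is Decimal."""
    if isinstance(a, float) or isinstance(b, float):
        return float(a) + float(b)           # any double makes the result a double
    return a + b


def div(a, b):
    """'div' currently does a floating-point division."""
    # Note: Python raises ZeroDivisionError here, which this sketch does not handle.
    return float(a) / float(b)


def mod(a, b):
    """'mod' also uses floating point; the result takes the sign of a."""
    return math.fmod(float(a), float(b))


print(add(2, 3))                  # 5
print(add(2, Decimal("1.5")))     # 3.5
print(div(7, 2))                  # 3.5
print(mod(-7, 2))                 # -1.0
```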

String values: the zero-length string is treated as false, everything else as true.
Note this changes in XPath 2.0 so that "0" and "false" are also treated as false.

Sequences: the empty sequence is treated as false. A sequence consisting of
a single boolean is treated as the value of that boolean.
A sequence containing at least one node is treated as true. Converting any other
sequence throws an error.
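
The conversion rules for strings and sequences can be sketched in Python. The Node class and the function name effective_boolean are hypothetical stand-ins, and a sequence is modelled as a Python list. The sketch follows the zero-length-string rule only (it does not model the note about "0" and "false"), and it assumes that a sequence of one item converts in the same way as the item itself.

```python
class Node:
    """Stand-in for an XML node (hypothetical; not a Saxon class)."""


def effective_boolean(value):
    """Model the conversion of a string or sequence to a boolean."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return len(value) > 0                 # zero-length string is false
    if isinstance(value, Node):
        return True
    if isinstance(value, list):
        if not value:
            return False                      # the empty sequence is false
        if any(isinstance(item, Node) for item in value):
            return True                       # at least one node
        if len(value) == 1:
            return effective_boolean(value[0])  # singleton: same as the item
        raise TypeError("sequence cannot be converted to a boolean")
    raise TypeError("value type is outside the scope of this sketch")


print(effective_boolean([]))                  # False
print(effective_boolean([Node(), "x"]))       # True
```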

The specific boolean expressions are as follows:

Construct

Meaning

boolean(expression)

This performs an explicit type conversion
to a boolean, which will always give the same result as the implicit conversion described above.
The main case where explicit conversion is useful is when assigning a value to a variable.

false(), true()

These function calls return false and true respectively.

A and B, A or B

These operators perform boolean conjunction and disjunction. Saxon currently
implements them with three-valued logic as in SQL, though this is not in sync with the
XPath Working Draft as published.

not(boolean)

This returns the logical negation of the argument.

not3(boolean)

This returns the logical negation of the argument, but adapted to use three-valued
logic as in SQL. The difference is that not(()) is true, whereas not3(()) is (). Note
that the empty sequence, (), plays the same role as the null value in SQL.

expression1 ( "=" | "!=" ) expression2

This tests whether the two values are equal (or not-equal). More strictly,
it considers both operands as sequences, and returns true if there is a value in the first
sequence that is equal (not-equal) to some value in the second sequence.

expression1 ( "is" | "isnot" ) expression2

This tests whether the two nodes are identical (or not-identical). The operands
must be single nodes. If either operand is an empty sequence, the result is an empty sequence;
if either operand is a sequence of two or more items, an error is thrown.

numeric-expression1 op numeric-expression2

This performs a numeric comparison of the two values. If both expressions
are sequences, the result is true if there is a pair of values from the two sequences that
satisfies the comparison. If one expression is a sequence, the result is true if there is
a value in that nodeset that satisfies the comparison with the other operand. The operators
are < (less-than), <= (less-or-equal), > (greater-than), >= (greater-or-equal).
The operators, when used in an XSL stylesheet, will need to be written using XML entities
such as "&lt;". From Saxon 7.1, comparison of nodes or strings performs a lexicographic
comparison using the default collating sequence; it no longer attempts to convert the values to
numbers unless one of the values is already numeric. This means that an expression such as
@price > @discount needs to be rewritten as number(@price) > number(@discount)

if ( condition ) then expr1 else expr2

This returns the value of expr1 or expr2 depending on whether the condition
is true or false.

some $var in sequence satisfies condition

This returns true if there is an item in the sequence for which the condition
is true. For example, some $i in //empl satisfies exists($i/@desc) returns
true if some <empl> element in the source document has an @desc attribute.
Note that the item being tested in the condition does NOT become the context item: you could
not replace the $i/@desc in this example with @desc or ./@desc.

every $var in sequence satisfies condition

This returns true if there is no item in the sequence for which the condition
is false. For example, every $i in //empl satisfies exists($i/@desc) returns
true only if every <empl> element in the source document has an @desc attribute.
Note that the item being tested in the condition does NOT become the context item: you could
not replace the $i/@desc in this example with @desc or ./@desc.

exists(sequence)

This returns true if the sequence contains at least one item.

empty(sequence)

This returns true if the sequence is empty.

lang(string-expression)

This returns true if the xml:lang attribute on (or inherited by) the current node
is equal to the argument, or if it contains a suffix starting with "-" and ending with the argument,
ignoring case.

expression instance of type

This returns true if the value of the expression is an instance of the given type.
For example (3,4,5) instance of xsd:integer* returns true. The type must be a built-in simple type
defined in XML Schema (and any namespace prefix must be declared), or it may be one of the keywords item,
node, element (etc). It is not at present possible to refer to named complex
types defined in an XML Schema.
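
The "there is a value" rule for = and !=, and the some and every expressions, can all be expressed with Python's any() and all(). The sketch below uses hypothetical function names and plain Python lists and dictionaries in place of sequences and elements; it is a model of the semantics rather than Saxon code.

```python
def general_equal(seq1, seq2):
    """'=' is true if some item in seq1 equals some item in seq2."""
    return any(a == b for a in seq1 for b in seq2)


def general_not_equal(seq1, seq2):
    """'!=' is true if some item in seq1 differs from some item in seq2."""
    return any(a != b for a in seq1 for b in seq2)


def some(sequence, condition):
    return any(condition(item) for item in sequence)


def every(sequence, condition):
    return all(condition(item) for item in sequence)


# Both comparisons can be true at once when the operands are sequences.
print(general_equal([1, 2], [2, 3]))        # True
print(general_not_equal([1, 2], [2, 3]))    # True

# Stand-ins for <empl> elements; "desc" plays the part of the @desc attribute.
employees = [{"name": "Ann", "desc": "manager"}, {"name": "Bob"}]
print(some(employees, lambda e: "desc" in e))    # True
print(every(employees, lambda e: "desc" in e))   # False
```

Note that in this model every() over an empty sequence is true and some() over an empty sequence is false, which matches the wording "no item for which the condition is false" above.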

expression1 | expression2

This forms the union of the two sequences. In Saxon the two operands must currently
be sequences of nodes. The result is in document order with duplicates eliminated. The operator
"union" is a synonym of "|".

expression1 intersect expression2

This forms the intersection of the two sequences. These
must currently consist entirely of nodes. The result is in document order, with
no duplicates.

expression1 except expression2

This forms the difference of the two sequences: the nodes that are in the first
sequence but not in the second. These
must currently consist entirely of nodes. The result is in document order, with
no duplicates. For example, @* except @name gives you all
the attributes of the context element, except for the @name attribute.
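
The three operators on node sequences (union, intersect and except, the last of which returns the nodes of the first operand that are not in the second) can be modelled in Python. In this sketch a node is represented by a hypothetical (document-order, name) tuple, so that sorting restores document order; the function names are likewise made up for illustration.

```python
def union(seq1, seq2):
    """Nodes in either operand, in document order, without duplicates."""
    return sorted(set(seq1) | set(seq2))


def intersect(seq1, seq2):
    """Nodes present in both operands."""
    return sorted(set(seq1) & set(seq2))


def except_(seq1, seq2):
    """Nodes in the first operand that are not in the second."""
    return sorted(set(seq1) - set(seq2))


# Stand-ins for the attributes of one element, as (document order, name).
all_attributes = [(1, "id"), (2, "name"), (3, "class")]
name_attribute = [(2, "name")]

# Corresponds to: @* except @name
print(except_(all_attributes, name_attribute))   # [(1, 'id'), (3, 'class')]
```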

sequence [ predicate ]

This returns the set of all items in the supplied sequence that satisfy the
predicate. The predicate may be a boolean expression (which is evaluated with the particular
item as context item); or it may be a numeric expression,
which is a shorthand for the boolean expression position()=predicate. The sequence
may of course itself have one or more predicates, so a chain of filters can be set up.
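
The two kinds of predicate can be modelled in Python as follows. The function name filter_sequence is hypothetical. As a simplification, a numeric predicate is passed as an integer, and a boolean predicate is passed as a function that receives the context item, its position and the size of the sequence (standing in for position() and last()).

```python
def filter_sequence(sequence, predicate):
    """Model of: sequence [ predicate ]."""
    size = len(sequence)
    if isinstance(predicate, int) and not isinstance(predicate, bool):
        # Numeric predicate: shorthand for position() = predicate.
        return [item for pos, item in enumerate(sequence, 1) if pos == predicate]
    # Boolean predicate: evaluated once per item, with that item as context.
    return [item for pos, item in enumerate(sequence, 1)
            if predicate(item, pos, size)]


prices = [5, 12, 8, 20]
over_seven = filter_sequence(prices, lambda item, pos, last: item > 7)
print(over_seven)                          # [12, 8, 20]

# Predicates can be chained: the second one sees the filtered sequence.
print(filter_sequence(over_seven, 1))      # [12]

# The equivalent of [position() = last()].
print(filter_sequence(prices, lambda item, pos, last: pos == last))   # [20]
```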

for $var in sequence return expression

This evaluates the given expression once for each item in the supplied
sequence, and returns a new sequence containing the concatenation of the results, in order.
For example, sum(for $i in ./orderline return $i/@qty * $i/@price) returns the total
value of an order by summing over the individual order lines.
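
The order-line example can be mirrored in Python to show how the results of each evaluation are concatenated. The helper name for_each and the sample data are hypothetical; each dictionary stands in for an orderline element with @qty and @price attributes.

```python
def for_each(sequence, expression):
    """Model of: for $var in sequence return expression."""
    result = []
    for item in sequence:
        value = expression(item)
        # Each result is itself a sequence; the results are concatenated in order.
        result.extend(value if isinstance(value, list) else [value])
    return result


orderlines = [
    {"qty": 2, "price": 9.5},
    {"qty": 1, "price": 20.0},
    {"qty": 3, "price": 1.5},
]

line_totals = for_each(orderlines, lambda line: line["qty"] * line["price"])
print(line_totals)         # [19.0, 20.0, 4.5]
print(sum(line_totals))    # 43.5
```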

nodeset-expression1 / nodeset-expression2

This follows the given path for each node in nodeset-expression1
(the "original nodes"), and returns
all the nodes reached (the "target nodes"). The two operands may be any expression
that returns a sequence of nodes. For example, at XPath 2.0 it is permitted to write
document('abc.xml')/key('k', 123), or book/(chapter|appendix)/section.
Most commonly, the second expression will be a step that navigates from the context node
to other nodes by following axes.
Examples include:

name - Select all the element children of the original nodes
with the given element name

prefix:* - Select all the element children of the original nodes
with the given namespace prefix

* - Select all the element children of the original nodes
regardless of element name

*:local-name - Select all the element children of the original nodes
with the given local-name, irrespective of namespace

@name - Select all the attributes of the original nodes
with the given attribute name

@prefix:* - Select all the attributes of the original nodes
with the given namespace prefix

@* - Select all the attributes of the original nodes
regardless of attribute name

text() - Select all the text node children of the original nodes

.. - Select the parents of the original nodes

node() - Select all the children of the original nodes

axis-name :: node-test optional-predicates - a generalised construct
for navigating in any direction. The axis-name may be any of the following:

ancestor

Selects ancestor nodes, starting with the parent of the current node and ending
with the document node

ancestor-or-self

Selects the current node plus all ancestor nodes

attribute

Selects all attributes of the current node (if it is an element)

child

Selects the children of the current node, in document order

descendant

Selects the children of the current node and their children, recursively
(in document order)

descendant-or-self

Selects the current node plus all descendant nodes

following

Selects the nodes that follow the current node in document order,
other than its descendants

following-sibling

Selects all subsequent child nodes of the same
parent node

parent

Selects the parent of the current node

preceding

Selects the nodes that precede the current node in document order,
other than its ancestors

preceding-sibling

Selects all preceding child nodes of the same
parent node

self

Selects the current node

The node-test may be:

a node name

"prefix:*" to select nodes with a given namespace prefix

"text()" (to select text nodes)

"node()" (to select any node)

"processing-instruction()" (to select any processing instruction)

"processing-instruction('literal')" to select processing instructions with the given name
(target)

"comment()" (to select comment nodes)

The optional-predicates is a sequence
of zero-or-more predicates, each enclosed in square brackets, each being either a boolean
expression or a numeric expression (as a shorthand for testing position()).

nodeset-expression1 // relative-path

This is a shorthand for
nodeset-expression1/descendant-or-self::node()/relative-path
In effect "//" selects descendants, where "/" selects immediate children: but where predicates
are used, the expansion above defines the precise meaning.

.

This selects the context item (which at XPath 2.0 is not necessarily a node)

insert(sequence, integer, sequence)

This creates a new sequence by inserting the second supplied sequence at the
given position of the first supplied sequence.

remove(sequence, integer)

This creates a new sequence by removing the item at the
given position of the supplied sequence.

index-of(sequence, item)

This returns a sequence of integers giving the positions within the supplied
sequence where the given item occurs.

sublist(sequence, start, length)

This returns part of the given sequence, starting at the given start position,
and of the given length.
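
The four sequence functions above can be modelled with Python lists. The function names are hypothetical. Positions are counted from 1, as elsewhere in XPath, and the sketch assumes that insert places the new items before the item currently at the given position; the text above does not say which side is used.

```python
def insert(target, position, inserts):
    """Model of insert(sequence, integer, sequence)."""
    index = max(position - 1, 0)     # assumption: insert before that position
    return target[:index] + inserts + target[index:]


def remove(sequence, position):
    """Model of remove(sequence, integer)."""
    return [item for pos, item in enumerate(sequence, 1) if pos != position]


def index_of(sequence, search):
    """Model of index-of(sequence, item): all matching positions."""
    return [pos for pos, item in enumerate(sequence, 1) if item == search]


def sublist(sequence, start, length):
    """Model of sublist(sequence, start, length)."""
    return [item for pos, item in enumerate(sequence, 1)
            if start <= pos < start + length]


print(insert(["a", "b", "c"], 2, ["x", "y"]))   # ['a', 'x', 'y', 'b', 'c']
print(remove(["a", "b", "c"], 2))               # ['a', 'c']
print(index_of(["a", "b", "a"], "a"))           # [1, 3]
print(sublist(["a", "b", "c", "d"], 2, 2))      # ['b', 'c']
```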

/

This selects the document root node. Note that this nodeset-expression cannot be
followed by the "/" or "//" operator or by a predicate.

/ relative-path

This is a shorthand for "root()/relative-path" where root() is an imaginary
designation of the document root node.

// relative-path

This is a shorthand for
"root()//relative-path" where root() is an imaginary
designation of the document root node.

document(expression1, expression2?)

The first string expression is a URL, or a nodeset containing a set of URLs;
the function returns the nodeset consisting of the root nodes of the documents referenced
(which must be XML documents). The optional second argument is a node-set used to provide a base URL for resolving
relative URLs: the default is the URL of the document containing the relative URL, which
may be either a source document or a stylesheet document. Saxon allows the first argument
to contain a fragment identifier, e.g. "my.xml#xyz", or simply "#xyz", in which case
"xyz" must be the value of an ID attribute of an element within the referenced document.
The effect is to retrieve a tree rooted at this element.

id(expression)

This returns the node, if any, that has an ID attribute equal to the given
value, and which is in the same document as the current node.
To use ID attributes, there must be a DTD that defines the attribute as being of
type ID, and you must use a SAX parser that notifies ID attributes to the application.
If the argument is a nodeset, the function returns the set of nodes that have an id
attribute equal to a value held in any of the nodes in the nodeset-expression: each node
in the nodeset expression
is converted to a string and treated as a white-space-separated list of id values.
If the argument is of any other type, its value is converted to a string and
treated as a white-space-separated list of id values.

key(string-expression1, expression2)

The first string expression is a key name; the function returns the set
of nodes in the current document that have a key with this name, with the key value given
by the second expression. If this is a nodeset, the key values are the values of the nodes
in the nodeset; otherwise, the key value is the string value of the argument.
Note that keys must be registered using the xsl:key element.

Some examples of NodeSet Expressions are listed below:

Expression

Meaning

XXX

Selects all immediate child elements with tag XXX

*

Selects all immediate child elements
(but not character data within the element)

../TITLE

Selects the TITLE children of the parent element

XXX[@AAA]

Selects all XXX child elements having
an attribute named AAA

*[last()]

Selects the last child of the current element

*/ZZZ

Selects all grandchild ZZZ elements

XXX[ZZZ]

Selects all child XXX elements that have a child ZZZ

XXX[@WIDTH and not(@WIDTH="20")]

Selects all child XXX elements that have a WIDTH attribute whose
value is not "20"

/*

Selects the outermost element of the document

//TITLE

Selects all TITLE elements anywhere in the document

ancestor::SECTION

Selects the innermost containing SECTION element

ancestor::SECTION/@TITLE

Selects the TITLE attribute of the innermost containing SECTION element
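
Some of the simpler path expressions in the table above can be tried out with Python's standard xml.etree.ElementTree module. This is only an illustration: ElementTree supports a small subset of XPath, it is unrelated to Saxon, and the sample document is invented for the purpose. The BOOK element acts as the context node, so //TITLE is written as .//TITLE.

```python
import xml.etree.ElementTree as ET

# Invented sample document; BOOK is used as the context node.
book = ET.fromstring(
    "<BOOK>"
    "<TITLE>Main</TITLE>"
    "<SECTION WIDTH='20'><TITLE>One</TITLE><ZZZ/></SECTION>"
    "<SECTION WIDTH='30'><TITLE>Two</TITLE></SECTION>"
    "<SECTION><TITLE>Three</TITLE></SECTION>"
    "</BOOK>"
)

children = [e.tag for e in book.findall("*")]                   # *
sections = book.findall("SECTION")                              # XXX
with_width = book.findall("SECTION[@WIDTH]")                    # XXX[@AAA]
with_child = book.findall("SECTION[ZZZ]")                       # XXX[ZZZ]
last_section = book.findall("SECTION[last()]")                  # XXX[last()]
grandchild_titles = [e.text for e in book.findall("*/TITLE")]   # */ZZZ
all_titles = [e.text for e in book.findall(".//TITLE")]         # //TITLE

print(children)             # ['TITLE', 'SECTION', 'SECTION', 'SECTION']
print(grandchild_titles)    # ['One', 'Two', 'Three']
print(all_titles)           # ['Main', 'One', 'Two', 'Three']
```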

cast as performs a conversion of the supplied value to the specified type, for
example it can be used to convert a date to a string.

treat as is a way of asserting that the value is already of the required type;
if it is not, an error is reported. For example, treat as xsd:integer (my:func())
is a way of telling the system that my:func is expected to return an integer.

Saxon also implements the instance of operator which tests whether a value
is of a specified type.
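
The difference between the three operators can be illustrated with a Python sketch: cast as converts, treat as only checks, and instance of tests without raising an error. The function names are hypothetical and Python types stand in for the schema types; this is not how Saxon represents values.

```python
from datetime import date


def cast_as_string(value):
    """Model of 'cast as': convert the value (here, to a string)."""
    if isinstance(value, date):
        return value.isoformat()
    return str(value)


def instance_of_integer(value):
    """Model of 'instance of xsd:integer': a test that returns true or false."""
    return isinstance(value, int) and not isinstance(value, bool)


def treat_as_integer(value):
    """Model of 'treat as xsd:integer': no conversion, only an assertion."""
    if not instance_of_integer(value):
        raise TypeError("value is not an integer: %r" % (value,))
    return value


print(cast_as_string(date(2002, 4, 30)))   # 2002-04-30
print(instance_of_integer("42"))           # False
print(treat_as_integer(42))                # 42
```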