Abstract

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a public W3C Working Draft for review by W3C Members and other interested parties. Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This version reflects changes to the XPath/XQuery language that were implemented in the working drafts published on 23 July 2004 and 29 October 2004. It also implements changes to the XPath/XQuery language that are implemented in the 11 February 2005 draft, with the exception of the following major technical items:

This document does not reflect changes to the semantics of Effective Boolean Value (EBV).

This document does not reflect changes to the type terminology.

This document does not reflect changes to the semantics of function declarations with no namespace prefixes.

This document does not reflect changes to URI promotion.

This document does not reflect the latest changes to the copy-namespace semantics in constructors.

This version does NOT take into account any of the XPath/XQuery Formal Semantics last call comments (http://www.w3.org/2005/02/formal-semantics-issues.html), which will be processed in a future version of this document.

The scope and goals for the [XPath/XQuery] language are discussed in the charter of the W3C [XSL/XML Query] Working Group and in the [XPath/XQuery] requirements [XML Query 1.0 Requirements].

This document defines the semantics of [XPath/XQuery] by giving a precise formal meaning to each of the expressions of the [XPath/XQuery] specification in terms of the [XPath/XQuery] data model. This document assumes that the reader is already familiar with the [XPath/XQuery] language.

Two important design aspects of [XPath/XQuery] are that it is functional and that it is typed. These two aspects play an important role in the [XPath/XQuery] Formal Semantics.

[XPath/XQuery] is a functional language. [XPath/XQuery] is built from expressions, rather than statements. Every construct in the language (except for the XQuery query prolog) is an expression and expressions can be composed arbitrarily. The result of one expression can be used as the input to any other expression, as long as the type of the result of the former expression is compatible with the input type of the latter expression with which it is composed. Another characteristic of a functional language is that variables are always passed by value, and a variable's value cannot be modified through side effects.

[XPath/XQuery] is a typed language. Types can be imported from one or more XML Schemas that describe the input documents and the output document, and the [XPath/XQuery] language can then perform operations based on these types. In addition, [XPath/XQuery] supports static type analysis. Static type analysis infers the output type of an expression based on the type of its input expressions. Static typing allows early detection of type errors, and can be used as the basis for certain forms of optimization. The [XPath/XQuery] type system captures most of the features of [Schema Part 1], including global and local element and attribute declarations, complex and simple type definitions, named and anonymous types, derivation by restriction, extension, list and union, substitution groups, and wildcard types. It does not model uniqueness constraints and facet constraints on simple types.

1.1 Normative and Informative Sections

Certain aspects of language processing are described in this specification as implementation-defined or implementation-dependent.

[Definition: Implementation-defined indicates an aspect that may differ between implementations, but must be specified by the implementor for each particular implementation.]

[Definition: Implementation-dependent indicates an aspect that may differ between implementations, is not specified by this or any W3C specification, and is not required to be specified by the implementor for any particular implementation.]

A language aspect described in this specification as implementation-defined or implementation-dependent may be further constrained by the specifications of a host language in which XPath is embedded.

A rigorous formal semantics clarifies the intended meaning of the English specification, ensures that no corner cases are left out, and provides a reference for implementation.

Why use formal notations? Rigor is achieved by the use of formal notations to represent [XPath/XQuery] objects such as expressions, XML values, and XML Schema types, and by the systematic definition of the relationships between those objects to reflect the meaning of the language. In particular, the dynamic semantics relates [XPath/XQuery] expressions to the XML value to which they evaluate, and the static semantics relates [XPath/XQuery] expressions to the XML Schema type that is inferred for that expression.

The Formal Semantics uses several kinds of formal notations to define the relationships between [XPath/XQuery] expressions, XML values, and XML Schema types. This section introduces the notations for judgments, inference rules, and mapping rules as well as the notation for environments, which implement the dynamic and static contexts. The reader already familiar with these notations can skip this section and continue with [2.2 XML Values].

2.1.1 Notations from grammar productions

Grammar productions are used to describe "objects" (values, types, [XPath/XQuery] expressions, etc.) manipulated by the Formal Semantics. The Formal Semantics makes use of several kinds of grammar productions.

XQuery grammar productions describe the XQuery language and expressions. XQuery productions are identified by a number, which corresponds to their number in the [XQuery 1.0: A Query Language for XML] document, and are annotated with "(XQuery)". For instance, the following production describes FLWOR expressions in XQuery.

[For/FLWR] Expressions

For the purpose of this document, the differences between the XQuery 1.0 and the XPath 2.0 grammars are mostly irrelevant. By default, this document uses XQuery 1.0 grammar productions. Whenever the grammar for XPath 2.0 differs from the one for XQuery 1.0, the corresponding XPath 2.0 productions are also given. XPath productions are identified by a number, which corresponds to their number in [XML Path Language (XPath) 2.0], and are annotated with "(XPath)". For instance, the following production describes for expressions in XPath.

[For/FLWR] Expressions

XQuery Core grammar productions describe the XQuery Core. The Core grammar is given in [A Normalized core grammar]. Core productions are identified by a number, which corresponds to their number in [A Normalized core grammar], and are annotated by "(Core)". For instance, the following production describes the simpler form of the "FLWOR" expression in the XQuery Core.

Core FLWOR Expressions

The Formal Semantics manipulates "objects" (values, types, expressions, etc.) for which there is no existing grammar production in the [XQuery 1.0: A Query Language for XML] document. In these cases, specific grammar productions are introduced. Notably, additional productions are used to describe values in the [Data Model], and to describe the [XPath/XQuery] type system. Formal Semantics productions are identified by a number, and are annotated by "(Formal)". For instance, the following production describes global type definitions in the [XPath/XQuery] type system.

Type Definitions

Note that grammar productions that are specific to the Formal Semantics (i.e., with the "(Formal)" annotation) are not part of [XPath/XQuery]. They are not accessible to the user and are only used in the course of defining the language's semantics.

2.1.2 Notations for judgments

The basic building block of the formal specification is called a judgment. A judgment expresses whether a property holds or not.

Symbols are purely syntactic and are used to write the judgment itself. In general, symbols in a judgment are chosen to reflect its meaning. For example, 'is beautiful', '=>' and ':' are symbols, the second and third of which should be read "yields", and "has type" respectively.

Patterns are written with italicized words. The name of a pattern is significant: each pattern name corresponds to an "object" (a value, a type, an expression, etc.) that can be substituted legally for the pattern. By convention, all patterns in the Formal Semantics correspond to grammar non-terminals, and are used to represent entities that can be constructed through application of the corresponding grammar production. For example, Expr represents any [XPath/XQuery] expression, and Value represents any value in the [XPath/XQuery] data model.

When applying the judgment, each pattern must be instantiated to an appropriate sort of "object" (value, type, expression, etc.). For example, '3 => 3' and '$x+0 => 3' are both instances of the judgment 'Expr => Value'. Note that in the first judgment, '3' corresponds to both the expression '3' (on the left-hand side of the => symbol) and to the value '3' (on the right-hand side of the => symbol).

Patterns may appear with subscripts (e.g. Expr1, Expr2) to distinguish different instances of the same sort of pattern. Each distinct pattern must be instantiated to a single "object" (value, type, expression, etc.). If the same pattern occurs twice in a judgment description then it should be instantiated with the same "object". For example, '3 => 3' is an instance of the judgment 'Expr1 => Expr1' but '$x+0 => 3' is not, since the two expressions '$x+0' and '3' cannot both be instances of the pattern Expr1. The judgment '$x+0 => 3' is an instance of the judgment 'Expr1 => Expr2'.

In a few cases, patterns may have a name that is not exactly the name of a grammar production but is based on it. For instance, a BaseTypeName is a pattern that stands for a type name, as would TypeName, or TypeName2. This usage is limited, and only occurs to improve the readability of some of the inference rules.

2.1.3 Notations for inference rules

Inference rules are used to specify whether a judgment holds or not. Inference rules express the logical relation between judgments and describe how complex judgments can be concluded from simpler premise judgments.

A logical inference rule is written as a collection of premises and a conclusion, written respectively above and below a dividing line:

premise1 ... premisen
---------------------
conclusion

All premises and the conclusion are judgments. The interpretation of an inference rule is: if all the premise judgments above the line hold, then the conclusion judgment below the line must also hold.

Here is a simple example of an inference rule, which uses the example judgment 'Expr => Value' from above:

$x => 0    3 => 3
-----------------
$x + 3 => 3

This inference rule expresses the following property of the judgment 'Expr => Value': if the variable expression '$x' yields the value '0', and the literal expression '3' yields the value '3', then the expression '$x + 3' yields the value '3'.

An inference rule may have no premises above the line, which means that the judgment below the line always holds:

------
3 => 3

This inference rule expresses the following property of the judgment 'Expr => Value': evaluating the literal expression '3' always yields the value '3'.

The two above rules are expressed in terms of specific variables and values, but usually rules are more abstract. That is, the judgments they relate contain patterns. Here is a rule that says that for any variable Variable that yields the integer value Integer, adding '0' yields the same integer value:

Variable => Integer
-----------------------
Variable + 0 => Integer

As in a judgment, each pattern in a particular inference rule must be instantiated to the same "object" within the entire rule. This means that one can talk about "the value of Variable" instead of the more precise "what Variable is instantiated to in (this particular instantiation of) the inference rule".

Note

In effect, inference rules are just a notation that describes a bottom-up algorithm. In the examples above, the rules describe an evaluation algorithm where the result of an expression depends on the result for its sub-expressions.
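The bottom-up reading described in this note can be sketched in Python. This is only an illustration and is not part of the Formal Semantics: the tuple encoding of expressions and the names `evaluate` and `bindings` are assumptions made for the example. Each branch plays the role of one inference rule for the judgment 'Expr => Value', where the recursive calls correspond to the premises and the returned value to the conclusion.

```python
# Hypothetical encoding of expressions (not taken from the specification):
#   ("lit", 3)        the literal expression 3
#   ("var", "x")      the variable expression $x
#   ("add", e1, e2)   the expression e1 + e2

def evaluate(expr, bindings):
    """Return the value v such that 'expr => v' holds under the given bindings."""
    kind = expr[0]
    if kind == "lit":
        # Rule with no premises: a literal always yields itself.
        return expr[1]
    if kind == "var":
        # The value of a variable is looked up in the supplied bindings.
        return bindings[expr[1]]
    if kind == "add":
        # Premises: each operand yields a value. Conclusion: their sum.
        left = evaluate(expr[1], bindings)
        right = evaluate(expr[2], bindings)
        return left + right
    raise ValueError(f"no rule applies to {expr!r}")

# Instance of the example rule: $x => 0 and 3 => 3, hence $x + 3 => 3.
result = evaluate(("add", ("var", "x"), ("lit", 3)), {"x": 0})
```

The result for the compound expression is obtained only after the results for its sub-expressions, which is the sense in which the rules describe a bottom-up algorithm.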

2.1.4 Notations for environments

Logical inference rules use environments to record information computed during static type analysis or dynamic evaluation so that this information can be used by other logical inference rules. For example, the type signature of a user-defined function in a [expression/query] prolog can be recorded in an environment and used by subsequent rules. Similarly, the value assigned to a variable within a "let" expression can be captured in an environment and used for further evaluations.

An environment is a dictionary that maps a symbol (e.g., a function name or a variable name) to an "object" (e.g., a function body, a type, a value). One can access information in an environment or update the environment.

If "env" is an environment, then "env(symbol)" denotes the "object" to which symbol is mapped. The notation is intentionally similar to function application, because an environment can be considered a function from the argument symbol to the "object" to which the symbol is mapped.

This document uses environment groups that group related environments. If "env" is an environment group with the member "mem", then that environment is denoted "env.mem" and the value that it maps symbol to is denoted "env.mem(symbol)".

The two main environment groups used in the Formal Semantics are: a dynamic environment (dynEnv), which captures the [XPath/XQuery]'s dynamic context, and a static environment (statEnv), which captures the [XPath/XQuery]'s static context. Both are defined in [3.1 Expression Context].

For example, dynEnv.varValue denotes the dynamic environment that maps variables to values and dynEnv.varValue(Variable) denotes the value of the variable Variable in the dynamic environment.

Updating is only defined on environment groups:

"env + mem(symbol => object) " denotes the new environment group that is identical to env except that the mem environment has been updated to map symbol to object. The notation symbol => object indicates that symbol is mapped to object in the new environment.

The following shorthand is also allowed: "env + mem( symbol1 => object1 ; ... ; symboln => objectn ) " in which each symbol is mapped to a corresponding object in the new environment.

If the "object" is a type then the following notation relates a symbol to a type: "env + mem(symbol : object) ".

Updating the environment overrides any previous binding that might exist for the same name. Updating the environment captures the scope of a symbol (e.g., a variable, a namespace prefix, etc.). Also, note that there are no operations to remove entries from environments: this is never necessary because updating the environment group effectively creates a new extended copy of the original environment group, and the original environment group remains accessible along with the updated copy.

Environments are used in a judgment to capture some of the context in which the judgment is computed, and most judgments are computed assuming that some environment is given. This assumption is denoted by prefixing the judgment with "env |-". The "|-" symbol is called a "turnstile" and is used in almost all inference rules.
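The access and update notations for environment groups can be approximated by the following Python sketch. It is an illustration under stated assumptions, not a definition: the function name `update`, the use of nested read-only mappings, and the sample variable names are all hypothetical. The point it shows is that "env + mem(symbol => object)" produces a new environment group and leaves the original one untouched.

```python
from types import MappingProxyType

def update(env, mem, bindings):
    """Sketch of 'env + mem(symbol1 => object1 ; ... ; symboln => objectn)'.

    Returns a new environment group; 'env' itself is never modified.
    """
    new_member = dict(env.get(mem, {}))
    new_member.update(bindings)  # a new binding overrides any previous one
    new_env = dict(env)
    new_env[mem] = MappingProxyType(new_member)
    return MappingProxyType(new_env)

# dynEnv with a varValue member that maps the variable x to 1.
dyn_env = MappingProxyType({"varValue": MappingProxyType({"x": 1})})

# dynEnv + varValue(x => 2 ; y => 3)
dyn_env2 = update(dyn_env, "varValue", {"x": 2, "y": 3})

# dynEnv.varValue(x) in the original and in the updated environment group.
old_x = dyn_env["varValue"]["x"]
new_x = dyn_env2["varValue"]["x"]
```

Because the original group remains accessible after the update, leaving the scope of a symbol amounts to continuing with the earlier environment group, so no removal operation is needed.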

2.1.5 Putting it together

This rule is read as follows: if two expressions Expr1 and Expr2 are known to have the static types Type1 and Type2 (the two premises above the line), then the expression below the line, "Expr1 , Expr2", must have the static type "Type1, Type2", which is the sequence of types Type1 and Type2. The above inference rule does not modify the (static) environment.

The following rule defines the static semantics of a "let" expression. The binding of the new variable is captured by an update to the varType component of the original static environment.

This rule is read as follows: First, because the variable is a QName, it is expanded into an expanded QName. Second, the type Type1 of the "let" input expression Expr1 is computed. Then the "let" variable, with expanded name expanded-QName and type Type1, is added to the varType component of the static environment group statEnv. Finally, the type Type2 of Expr2 is computed in that new environment.
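The two readings above, for the sequence expression and for the "let" expression, can be mirrored by a small Python sketch. It rests on several assumptions that are not in this document: expressions are encoded as tuples, types are represented as plain strings, literals are typed as xs:integer or xs:string only, and QName expansion is omitted. The name `static_type` is hypothetical.

```python
def static_type(expr, stat_env):
    """Sketch of the judgment 'statEnv |- Expr : Type' for four expression forms."""
    kind = expr[0]
    if kind == "lit":
        # Simplifying assumption: only integer and string literals exist.
        return "xs:integer" if isinstance(expr[1], int) else "xs:string"
    if kind == "var":
        # statEnv.varType(Variable)
        return stat_env["varType"][expr[1]]
    if kind == "seq":
        # Premises: Expr1 : Type1 and Expr2 : Type2, in the same environment.
        type1 = static_type(expr[1], stat_env)
        type2 = static_type(expr[2], stat_env)
        return f"{type1}, {type2}"
    if kind == "let":
        _, name, bound, body = expr
        type1 = static_type(bound, stat_env)
        # statEnv + varType(name : Type1): a new group, the original is kept.
        extended = {**stat_env, "varType": {**stat_env["varType"], name: type1}}
        return static_type(body, extended)
    raise ValueError(f"no typing rule applies to {expr!r}")

stat_env = {"varType": {}}
# let $x := 1 return ($x, "a")
let_expr = ("let", "x", ("lit", 1), ("seq", ("var", "x"), ("lit", "a")))
result = static_type(let_expr, stat_env)
```

Note that the sequence branch passes the environment through unchanged, while the "let" branch types its body in an extended environment, matching the difference between the two rules.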

Editorial note

Jonathan suggests that we should explain 'chain' inference rules. I.e., how several inference rules are applied recursively.

2.2 XML Values

[Data Model] specifies normatively the data model of [XPath/XQuery]. The [XPath/XQuery] language is formally defined by operations on this data model.

This section defines formal notations to denote values in [Data Model]. These notations are used to describe and manipulate values in inference rules, but are not exposed to the [XPath/XQuery] user.

2.2.1 Formal values

For reference, a summary of the data model is given below, followed by the formal notation for data model values. Although not specified in this document, all the normative constraints specified in [Data Model] apply to the formal notation for data model values.

A value is a sequence of zero or more items. An item is either an atomic value or a node.

An atomic value is a value in the value space of an atomic type and is labeled with the name of that atomic type. An XML Schema atomic type [Schema Part 2] may be primitive or derived, or xdt:untypedAtomic.

A node is either an element, an attribute, a document, a text, a comment, or a processing-instruction node. Elements have a type annotation and contain a value. Attributes have a type annotation and contain a simple value, which is a sequence of atomic values. Text nodes always contain one string value of type xdt:untypedAtomic, therefore the corresponding type annotation is omitted.

A type annotation can be either the QName of a declared type or an anonymous type. An anonymous type corresponds to an XML Schema type for which the schema writer did not provide a name. Anonymous type names are not visible to the user, but are generated during schema validation and used to annotate nodes in the data model. By convention, anonymous type names are written in the Formal Semantics as: [Anon0], [Anon1], etc.

Untyped elements (e.g., from well-formed documents) are annotated with xdt:untyped, untyped attributes are annotated with xdt:untypedAtomic, and untyped atomic values (i.e., text content or attribute content in well-formed documents) are annotated with xdt:untypedAtomic.

An element has an optional "nilled" marker. This marker can only be present if the element has been validated against an element type in the schema which is "nillable", and the element has no content and an attribute xsi:nil set to "true".

An element also has a sequence of namespace annotations, which are the set of active namespace declarations that are in-scope for the element. Each namespace annotation is a prefix, URI pair. Namespace annotations are not values, i.e., they are never the result of evaluating an expression, and they only occur as annotations on elements. In examples, we omit the namespace annotations when they are empty. For example, the following two values are equivalent:

In the above grammar, "String" indicates the value space of xs:string, "Decimal" indicates the value space of xs:decimal, etc.

Note that the same rules about constructing sequences apply to the values described by that grammar. Notably, sequences cannot be nested. For example, the sequence (10, (1, 2), (), (3, 4)) is equivalent to the sequence (10, 1, 2, 3, 4).
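The flattening of nested sequences can be illustrated in Python, under the assumption (made only for this sketch) that sequences are modelled as Python lists and items as any non-list value. The function name `flatten` is hypothetical.

```python
def flatten(value):
    """Return the flat sequence of items equivalent to a nested list."""
    flat = []
    for item in value:
        if isinstance(item, list):
            # A nested sequence contributes its items, not itself.
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat

# (10, (1, 2), (), (3, 4)) is equivalent to (10, 1, 2, 3, 4).
result = flatten([10, [1, 2], [], [3, 4]])
```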

Use of formal notations for types. The Formal Semantics uses formal notations for types instead of XML Schema syntax. These notations are used extensively to describe and manipulate types in the inference rules. The formal notations for types introduced here are not exposed to the [XPath/XQuery] user.

Representation of content models. For the purpose of static typing, the [XPath/XQuery] type system only describes minOccurs, maxOccurs combinations that correspond to the DTD operators +, *, and ?. Choices are represented using the DTD operator |. All groups are represented using the & notation.

Representation of anonymous types. To clarify the semantics, the [XPath/XQuery] type system makes all anonymous types explicit.

Representation of XML Schema simple type facets and identity constraints. For simplicity, XML Schema simple type facets and identity constraints are not formally represented in the [XPath/XQuery] type system. However, an [XPath/XQuery] implementation supporting XML Schema import and validation must take simple type facets and identity constraints into account.

2.3.2 Item types

An item type is either an atomic type, an element type, an attribute type, a document node type, a text node type, a comment node type, or a processing instruction type. We distinguish between document nodes, attribute nodes, and nodes that can occur in element content (elements, comments, processing instructions, and text nodes), as we need to refer to element content nodes later in the formal semantics.

An element or attribute type has an optional name and an optional type reference. A name alone corresponds to a reference to a global element or attribute declaration. A name with a type reference corresponds to a local element or attribute declaration. The word "element" or "attribute" alone refers to the wildcard types for any element or any attribute. In addition, an element type has an optional nillable flag that indicates whether the element can be nilled or not.

A document type has an optional content type. If no content type is given, then it refers to the wildcard type describing any document.

Note

Generic node types (e.g., node()), are interpreted in the type system as union types (e.g., element | attribute | text | comment | processing-instruction) and therefore do not appear here. The semantics of sequence types is described in [3.5.4 SequenceType Matching].

Examples

The following is a text node type

text

The following is a type for all elements

element

The following is a type for all elements of type string

element of type xs:string

The following is a type for a nillable element of type string

element size nillable of type xs:string

The following is a reference to a global attribute declaration

attribute sizes

The following is a type for elements with anonymous type [Anon1]:

element sizes of type [Anon1]

2.3.3 Content models

Following XML Schema, types in [XPath/XQuery] are composed from item types by optional, one or more, zero or more, all group, sequence, choice, empty sequence, or empty choice (written none).

The type empty matches the empty sequence. The type none matches no values. It is called the empty choice because it is the identity for choice, that is, (Type | none) = Type. The type none is the static type for [7.2.6 The fn:error function].

Types

The [XPath/XQuery] type system includes three binary operators on types: ",", "|" and "&", corresponding respectively to sequence, choice and all groups in Schema. The [XPath/XQuery] type system includes three unary operators on types: "*", "+", and "?", corresponding respectively to zero or more instances of the type, one or more instances of the type, or an optional instance of the type.

The "&" operator builds the "interleaved product" of two types. The type Type1&Type2 matches any sequence that is an interleaving of a sequence that matches Type1 and a sequence that matches Type2. The interleaved product represents all groups in XML Schema. All groups in XML Schema are restricted to apply only on global or local element declarations with lower bound 0 or 1, and upper bound 1.

Examples

A sequence of elements

The "," operator builds the "sequence" of two types. For example,

element title of type xs:string, element year of type xs:integer

is a sequence of an element title of type string followed by an element year of type integer.

The union of two element types

The "|" operator builds the "union" of two types. For example,

element editor of type xs:string | element bib:author

means either an element editor of type string, or a reference to the global element bib:author.

An all group of two elements

The "&" operator builds the "interleaved product" of two types. For example,

(element a & element b) =
element a, element b
| element b, element a

which specifies that the a and b elements can occur in any order.
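The notion of interleaving that underlies the "&" operator can be illustrated with a short Python sketch. It works on concrete sequences rather than on types, and the function name `interleavings` and the list encoding are assumptions made for the example: it enumerates every way of merging two sequences while preserving the relative order within each.

```python
def interleavings(left, right):
    """Return all interleavings of two sequences, each given as a list."""
    if not left:
        return [list(right)]
    if not right:
        return [list(left)]
    # Either the first item comes from 'left' ...
    from_left = [[left[0]] + rest for rest in interleavings(left[1:], right)]
    # ... or it comes from 'right'.
    from_right = [[right[0]] + rest for rest in interleavings(left, right[1:])]
    return from_left + from_right

# A sequence matching 'element a' interleaved with one matching 'element b'.
result = interleavings(["a"], ["b"])
```

For the single-item sequences used here, the two results correspond to the two alternatives "element a, element b" and "element b, element a" in the expansion shown above.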

An empty type

The following type matches the empty sequence.

empty

A sequence of zero or more elements

The following type matches zero or more elements each of which can be a surgeon or a plumber.

A type definition has a name (possibly anonymous) and a type derivation. In the case of a complex type, or a simple type derived by list or union, derivation indicates if the type is derived by extension or restriction from its base type, gives its content model, and an optional flag indicating if it has mixed content. In the case of an atomic type, it just indicates from which type it is derived. When the type derivation is omitted, the type derives by restriction from xs:anyType, as in:

Empty content can be indicated with the explicit empty sequence, or omitted, as in:

define type Bib { } =
define type Bib { empty }

Global element and attribute declarations always have a name and a reference to a (possibly anonymous) type. A global element declaration also may declare a substitution group for the element and whether the element is nillable.

Note that the two anonymous types in the item element declarations are mapped to types with names [Anon1] and [Anon2].

The following additional definitions illustrate how more advanced XML Schema features (a complex type derived by extension, an anonymous simple type derived by restriction, and substitution groups) are represented in the [XPath/XQuery] type system.

2.4 Processing model and main judgments

This section defines a processing model for [XPath/XQuery], then defines formal judgments for each key phase in that processing model (normalization, static type analysis and dynamic evaluation).

2.4.1 Processing model

The [XPath/XQuery] processing model is defined in [XQuery 1.0: A Query Language for XML], Section 2.2 Processing Model, which contains the following figure depicting the processing model.

Figure 1: Processing Model Overview

This processing model is not intended to describe an actual implementation, although a naive implementation might be based upon it. It does not prescribe an implementation technique, but any implementation should produce the same results as obtained by following this processing model and applying the rest of the Formal Semantics specification.

Query processing consists of two phases: a static analysis phase and a dynamic evaluation phase. Static analysis is further divided into four sub-phases. Each phase consumes the result of the previous phase and generates output for the next phase. For each processing phase, we point to the relevant notations introduced later in the document.

[Definition: The static analysis phase depends on the expression itself and on the static context. The static analysis phase does not depend on input data (other than schemas).]

The purpose of the static analysis phase is to detect errors, e.g., syntax errors or type errors, at compile time rather than at run-time. If no error occurs, the result of static analysis could be some compiled form of [expression/query], suitable for execution by a compiled-[expression/query] processor. Static analysis consists of the following sub-phases:

Parsing. (Step SQ1 in Figure 1). The grammar for the [XPath/XQuery] syntax is defined in [XQuery 1.0: A Query Language for XML]. Parsing may generate syntax errors. If no error occurs, an internal operation tree of the parsed query is created.

Static Context Processing. (Steps SQ2, SQ3, and SQ4 in Figure 1). The static semantics of [expression/query] depends on the static input context. The static input context needs to be generated before the [expression/query] can be analysed. In XQuery, the static input context may be defined by the processing environment and by declarations in the Query Prolog (See [5 Modules and Prologs]). In XPath, the static input context is defined by the processing environment. The static input context is denoted by statEnv.

Normalization. (Step SQ5 in Figure 1). To simplify the semantics specification, some normalization is performed on the [expression/query]. The [XPath/XQuery] language provides many powerful features that make [expression/query]s simpler to write and use, but are also redundant. For instance, a complex for expression might be rewritten as a composition of several simple for expressions. The language composed of these simpler [expression/query]s is called the [XPath/XQuery] Core language and is described by a grammar which is a subset of the XQuery grammar. The grammar of the [XPath/XQuery] Core language is given in [A Normalized core grammar].

During the normalization phase, each [XPath/XQuery] [expression/query] is mapped into its equivalent [expression/query] in the core. (Note that this has nothing to do with Unicode Normalization, which works on character strings.) Normalization works by bottom-up application of normalization rules over expressions, starting with normalization of literal expressions and variables.

Specifically the normalization phase is defined in terms of the static part of the context (statEnv) and a [expression/query] (Expr) abstract syntax tree. Formal notations for the normalization phase are introduced in [2.4.2 Normalization judgment].

After normalization, the full semantics is obtained by giving a semantics to the normalized Core [expression/query]. This is done during the last two phases.
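As an informal illustration of the kind of rewriting that normalization performs, the following Python sketch turns a for expression with several binding clauses into a nesting of single-variable for expressions. It is not the normalization rule of this document: the tuple encoding, the function name `normalize_for`, and the placeholder strings standing for sub-expressions are all assumptions made for the example.

```python
def normalize_for(clauses, body):
    """Rewrite a multi-clause 'for' into nested single-clause 'for' expressions.

    clauses: list of (variable name, input expression) pairs, in source order.
    body:    the expression in the 'return' clause.
    """
    result = body
    # Build from the innermost clause outwards so the first clause ends up outermost.
    for variable, source in reversed(clauses):
        result = ("for", variable, source, result)
    return result

# for $i in Expr1, $j in Expr2 return Expr3
core = normalize_for([("i", "Expr1"), ("j", "Expr2")], "Expr3")
```

The resulting tree has the shape of "for $i in Expr1 return (for $j in Expr2 return Expr3)", so later phases only need to handle the single-variable form.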

Static type analysis. (Step SQ6 in Figure 1). Static type analysis is optional. If this phase is not supported, then normalization is followed directly by dynamic evaluation.

Static type analysis checks whether each [expression/query] is type safe, and if so, determines its static type. Static type analysis is defined only for Core [expression/query]. Static type analysis works by bottom-up application of type inference rules over expressions, taking the type of literals and of input documents into account.

If the [expression/query] is not type-safe, static type analysis yields a type error. For instance, a comparison between an integer value and a string value might be detected as a type error during the static type analysis. If static type analysis succeeds, it yields an abstract syntax tree where each sub-expression is "annotated" with its static type.

More precisely, the static analysis phase is defined in terms of the static context (statEnv) and a core [expression/query] (CoreExpr). Formal notations for the static analysis phase are introduced in [2.4.3 Static typing judgment].

Static typing does not imply that the content of XML documents must be rigidly fixed or even known in advance. The [XPath/XQuery] type system accommodates "flexible" types, such as elements that can contain any content. Schema-less documents are handled in [XPath/XQuery] by associating a standard type with the document, such that it may include any legal XML content.

The dynamic evaluation phase (sometimes also called "execution") evaluates a query on input document(s).

Dynamic Context Processing. (Steps DQ2 and DQ3 in Figure 1). The dynamic semantics of [expression/query] depends on the dynamic input context. The dynamic input context needs to be generated before the [expression/query] can be evaluated. The dynamic input context may be defined by the processing environment and by statements in the Query Prolog (See [5 Modules and Prologs]). In XPath, the dynamic input context is defined by the processing environment. The dynamic input context is denoted by dynEnv.

Dynamic Evaluation. (Steps DQ4 and DQ5 in Figure 1). This phase computes the value of an [expression/query]. The semantics of evaluation is defined only for Core [expression/query] terms. Evaluation works by bottom-up application of evaluation rules over expressions, starting with evaluation of literals and variables. Evaluation may result in a value or a dynamic error, which may be a non-type error or a type error. If static typing of an expression does not raise a type error, then dynamic evaluation of the same expression will not raise a type error, although dynamic evaluation may raise some non-type error.

The dynamic evaluation phase is defined in terms of the static context (statEnv) and evaluation context (dynEnv), and a core [expression/query] (CoreExpr). Formal notations for the dynamic evaluation phase are introduced in [2.4.4 Dynamic evaluation judgment].

Static type analysis catches only certain classes of errors. For instance, it can detect a comparison operation applied between incompatible types (e.g., xs:int and xs:date). Some other classes of errors cannot be detected by static analysis and are only detected at evaluation time. For instance, whether an arithmetic expression on 32-bit integers (xs:int) yields an out-of-bound value can only be detected at run-time by looking at the data.
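The run-time check mentioned above can be sketched as follows. This is a minimal illustration in Python, not part of the Formal Semantics; the names add_xs_int and DynamicError are hypothetical. Both operands have static type xs:int, so static analysis has nothing to report, and the error only shows up once the actual values are known.

```python
# Bounds of xs:int, the 32-bit signed integer type of XML Schema.
XS_INT_MIN = -(2 ** 31)
XS_INT_MAX = 2 ** 31 - 1


class DynamicError(Exception):
    """Hypothetical stand-in for a dynamic (non-type) error."""


def add_xs_int(left: int, right: int) -> int:
    """Add two xs:int values, checking the result against the xs:int range."""
    result = left + right
    if not XS_INT_MIN <= result <= XS_INT_MAX:
        raise DynamicError(f"{result} is outside the xs:int value space")
    return result
```

Calling add_xs_int(3, 5) succeeds, while add_xs_int(XS_INT_MAX, 1) raises the error, even though both calls look identical to a static type checker.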

While implementations are free to implement different processing models, the [XPath/XQuery] static semantics relies on the existence of a static type analysis phase that precedes any access to the input data. Statically typed implementations are required to find and report type errors during static analysis, as specified in this document. Dynamically typed implementations are required to find and report type errors during evaluation, but are permitted to report them during static analysis.

Notice that the separation of logical processing into phases is not meant to imply that implementations must separate the static analysis phase from the dynamic evaluation phase; processors may choose to perform all phases simultaneously at evaluation-time and may even mix the phases in their internal implementations. The processing model simply defines the final result.

The above processing phases are all internal to the [XPath/XQuery] processor. They do not deal with how the [XPath/XQuery] processor interacts with the outside world, notably how it accesses actual documents and types. A typical [expression/query] engine would support at least three other important processing phases:

Schema Import Processing. The [XPath/XQuery] type system is based on XML Schema. In order to perform dynamic or static typing, the [XPath/XQuery] processor needs to build type descriptions that correspond to the schema(s) of the input documents. This phase is achieved by mapping all schemas required by the [expression/query] into the [XPath/XQuery] type system. The XML Schema import phase is described in [Missing Reference :
sec_importing_schema].

Data Model Generation. Expressions are evaluated on values in the [Data Model]. XML documents must be loaded into the [Data Model] before the evaluation phase. This is described in the [Data Model] and is not discussed further here.

Serialization. Once the [expression/query] is evaluated, processors might want to serialize the result of the [expression/query] as actual XML documents. Serialization of data model instances is described in [Data Model Serialization] and is not discussed further here.

The parsing phase is not specified formally; the formal semantics does not define a formal model for the syntax trees, but uses the [XPath/XQuery] concrete syntax directly. More details about parsing for XQuery 1.0 can be found in the [XQuery 1.0: A Query Language for XML] document and more details about parsing for XPath 2.0 can be found in the [XML Path Language (XPath) 2.0] document. No further discussion of parsing is included here.

2.4.2 Normalization judgment

Normalization is specified using mapping rules, which describe how an [XPath/XQuery] expression is rewritten into an expression in the [XPath/XQuery] Core. Mapping rules are also used in [Missing Reference : sec_importing_schema] to specify how XML Schemas are imported into the [XPath/XQuery] type system.

Notation

Mapping rules are written using a square bracket notation, as follows:

[Object]Subscript

==

Mapped Object

The original "object" is written above the == sign. The rewritten "object" is written beneath the == sign. The subscript is used to indicate what kind of "object" is mapped, and sometimes to pass some information between mapping rules.

Since normalization is always applied in the presence of a static context, the above rule is a shorthand for:

The static environment is used in certain normalization rules (e.g. for normalization of function calls). To keep the notation simpler, the static environment is not written in the normalization rules, but it is assumed to be available.

The normalization rule that is used to map "top-level" expressions in the [XPath/XQuery] syntax into expressions in the [XPath/XQuery] Core is:

The static typing judgment statEnv |- Expr : Type holds when, in the static environment statEnv, the expression Expr has type Type.

Example

The result of static type inference is to associate a static type with every [expression/query], such that any evaluation of that [expression/query] is guaranteed to yield a value that belongs to that type.

For instance, the following expression

let $v := 3 return $v+5

has type xs:integer. This can be inferred as follows: the input literals '3' and '5' have type integer, so the variable $v also has type integer. Since the sum of two integers is an integer, the complete expression has type integer.
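The inference just described can be sketched as a small recursive function. This is only an illustration under simplifying assumptions: expressions are modelled as Python tuples, the static environment is reduced to a dictionary from variable names to types, and the names infer_type and StaticTypeError are hypothetical.

```python
class StaticTypeError(Exception):
    """Hypothetical stand-in for a static type error."""


def infer_type(expr, var_types):
    """Compute the static type of a tiny expression language, bottom-up."""
    kind = expr[0]
    if kind == "int":                      # integer literal
        return "xs:integer"
    if kind == "var":                      # variable reference
        return var_types[expr[1]]
    if kind == "plus":                     # Expr1 + Expr2, integers only here
        left = infer_type(expr[1], var_types)
        right = infer_type(expr[2], var_types)
        if left == right == "xs:integer":
            return "xs:integer"
        raise StaticTypeError(f"cannot add {left} and {right}")
    if kind == "let":                      # let $name := bound return body
        _, name, bound, body = expr
        bound_type = infer_type(bound, var_types)
        # The body is typed in the environment extended with the new variable.
        return infer_type(body, {**var_types, name: bound_type})
    raise StaticTypeError(f"no inference rule applies to {kind!r}")


# let $v := 3 return $v+5
example = ("let", "v", ("int", 3), ("plus", ("var", "v"), ("int", 5)))
```

Here infer_type(example, {}) yields "xs:integer" without ever computing the value 8.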

Note

The type of an expression is computed by inference. Static type inference rules define for each kind of expression how to compute the type of the expression given the types of its sub-expressions. Here is a simple example:

This rule states that if the conditional expression of an "if" expression has type boolean, then the type of the entire expression is one of the two types of its "then" and "else" clauses. Note that the resulting type is represented as a union: '(Type2|Type3)'.

The "left half" (the part before the :) of the expression below the line corresponds to some [expression/query], for which a type is computed. If the [expression/query] has been parsed into an internal abstract syntax tree, this usually corresponds to some node in that tree. The expression usually has patterns in it (here Expr1, Expr2, and Expr3) that
need to be matched against the children of the node in the abstract syntax tree. The expressions above the line indicate things that need to be computed to use this rule; in this case, the types of the condition expression and the two branches of the if-then-else expression. Once those types are computed (by further applying static inference rules recursively to the expressions on each side), then the type of the expression below the line can be computed. This example illustrates a general feature
of the [XPath/XQuery] type system: the type of an expression depends only on the type of its sub-expressions. The overall static type inference algorithm is recursive, following the abstract syntax of the [expression/query]. At each point in the recursion, an appropriate matching inference rule is sought; if at any point there is no applicable rule, then static type inference has failed and the [expression/query] is not type correct.
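The rule for conditional expressions described above can be sketched in the same style. This is a simplified illustration, not the normative rule: types are plain strings, only three kinds of literal are supported, and the names infer_type and StaticTypeError are hypothetical.

```python
class StaticTypeError(Exception):
    """Hypothetical stand-in for a static type error."""


# Types of the literal kinds supported by this sketch.
LITERAL_TYPES = {"bool": "xs:boolean", "int": "xs:integer", "str": "xs:string"}


def infer_type(expr):
    """Match the node kind against the available rules and recurse on children."""
    kind = expr[0]
    if kind in LITERAL_TYPES:
        return LITERAL_TYPES[kind]
    if kind == "if":
        _, cond, then_branch, else_branch = expr
        # Premises above the line: the types of Expr1, Expr2 and Expr3.
        if infer_type(cond) != "xs:boolean":
            raise StaticTypeError("the condition must have type xs:boolean")
        type2 = infer_type(then_branch)
        type3 = infer_type(else_branch)
        # Conclusion below the line: the union of the two branch types.
        return f"({type2}|{type3})"
    # No applicable rule: static type inference fails.
    raise StaticTypeError(f"no inference rule applies to {kind!r}")
```

For the tree ("if", ("bool", True), ("int", 1), ("str", "a")) the sketch returns "(xs:integer|xs:string)", and a node kind without a rule is rejected.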

2.4.4 Dynamic evaluation judgment

The dynamic, or operational, semantics is specified using value inference rules, which relate [XPath/XQuery] expressions to values, and in some cases specify the order in which an [XPath/XQuery] expression is evaluated.

The static environment is used in certain cases (e.g. for type matching) during evaluation. To keep the notation simpler, the static environment is not written in the dynamic inference rules, but it is assumed to be available.

Example

For instance, the following expression

let $v := 3 return $v+5

yields the integer value 8. This can be inferred as follows: the input literals '3' and '5' denote the values 3 and 5, respectively, so the variable $v has the value 3. Since the sum of 3 and 5 is 8, the complete expression has the value 8.
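The evaluation just described can be sketched as a recursive function over the same kind of tree. This is an illustration only: expressions are modelled as Python tuples, the dynamic environment is reduced to a dictionary from variable names to values, and the name evaluate is hypothetical.

```python
def evaluate(expr, var_values):
    """Compute the value of a tiny expression language, bottom-up."""
    kind = expr[0]
    if kind == "int":                      # a literal denotes its own value
        return expr[1]
    if kind == "var":                      # look the variable up in the environment
        return var_values[expr[1]]
    if kind == "plus":
        return evaluate(expr[1], var_values) + evaluate(expr[2], var_values)
    if kind == "let":                      # let $name := bound return body
        _, name, bound, body = expr
        bound_value = evaluate(bound, var_values)
        # The body is evaluated in the environment extended with the binding.
        return evaluate(body, {**var_values, name: bound_value})
    raise ValueError(f"no evaluation rule applies to {kind!r}")


# let $v := 3 return $v+5
example = ("let", "v", ("int", 3), ("plus", ("var", "v"), ("int", 5)))
```

Evaluating example in the empty environment binds v to 3 and returns 8.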

Note

As with static type inference, logical inference rules are used to determine the value of each expression, given the dynamic environment and the values of its sub-expressions.

The inference rules used for dynamic evaluation, like those for static type inference, follow a bottom-up recursive structure, computing the value of expressions from the values of their sub-expressions.

These prefixes are always italicized to emphasize that the corresponding functions, variables, and types are abstract: they are not and cannot be made accessible in the host language. None of these special prefixes are given a URI.

Many functions in the [Functions and Operators] document are generic: they perform operations on arbitrary components of the data model, e.g., any kind of node, or any sequence of items. For instance, the fn:distinct-nodes function removes duplicates in any sequence of nodes. As a result, the signature given in the [Functions and Operators] document is also generic. For instance, the signature of the
fn:distinct-nodes function is:

fn:distinct-nodes(node*) as node*

As defined, this signature provides little useful type information. For such functions, better type information can often be obtained by having the output type depend on the type of input parameters. For instance, if the function fn:distinct-nodes is applied on a parameter of type element a*, element b, one can easily deduce that the resulting sequence is a collection of either a or b elements.
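The deduction in the fn:distinct-nodes example can be sketched as follows. This is a rough illustration and not the normative typing rule: a sequence type is modelled as a list of (item type, occurrence indicator) pairs, and the function name distinct_nodes_type is hypothetical.

```python
def distinct_nodes_type(sequence_type):
    """Derive a result type from the item types that occur in the argument type."""
    item_types = []
    for item_type, _occurrence in sequence_type:
        if item_type not in item_types:
            item_types.append(item_type)
    if not item_types:
        return "empty"
    if len(item_types) == 1:
        return item_types[0] + "*"
    # Any number of items, each of which has one of the collected item types.
    return "(" + " | ".join(item_types) + ")*"


# The argument type "element a*, element b" from the text.
argument_type = [("element a", "*"), ("element b", "1")]
```

For argument_type the sketch returns "(element a | element b)*", which is more informative than node*.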

3 Basics

The organization of this section parallels the organization of Section 2 BasicsXQ.

3.1 Expression Context

Introduction

The expression context for a given expression consists of all the information that can affect the result of the expression. This information is organized into the static context and the dynamic context. This section specifies the environments that represent the context information used by [XPath/XQuery] expressions.

3.1.1 Static Context

The environment group statEnv denotes the environments that are available during static analysis. Static analysis may extend parts of the static environment. The static environment is also available during dynamic evaluation.

If analysis of an expression relies on some component of the static context that has not been assigned a value, a static error is raised. This constraint is formalized in [3.3 Error Handling].

The namespace environment corresponds to statically known namespaces in the [XPath/XQuery] static context.

The namespace environment maps a namespace prefix (NCName) onto a namespace kind and a namespace URI (URI) or the empty namespace (#EMPTY-NAMESPACE). The namespace kind is either passive or active. The namespace kind determines whether a namespace node is created for an element during element construction.

The default element namespace corresponds to the default namespace for element and type names in the [XPath/XQuery] static context.

The default element namespace contains a namespace URI (a URI) or the empty namespace (#EMPTY-NAMESPACE) and is used for any unprefixed QName appearing in a position where an element or type name is expected.

The function declaration environment corresponds to the function signatures part of the [XPath/XQuery] static context.

The function declaration environment stores the static type signatures of functions. Because [XPath/XQuery] allows multiple functions with the same name differing only in the number and signature of the parameters, this environment maps an expanded QName to the set of all function declaration signatures of the form "define functionQName (Type1, ..., Typen) returnType" each corresponding to a declaration of the function.

The collations environment corresponds to the statically known collations in the [XPath/XQuery] static context.

The collations environment maps a unique namespace URI (a URI) to a pair of functions: the first function takes a set of strings and returns a sequence containing those strings in sorted order; and the second function takes two strings, returns true if they are considered equal, and false if not.

The default empty sequence order corresponds to the default order for empty sequences in the [XPath/XQuery] static context.

This component controls whether an empty sequence is interpreted as the greatest value or as the least value during processing of an order by clause in a FLWOR expression. Its value may be greatest or least.
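The effect of this component can be sketched as follows. This is an illustration only: sort keys are modelled as Python values, an empty sequence is modelled as None, and the function name order_by is hypothetical.

```python
def order_by(keys, empty_order="least"):
    """Sort keys in ascending order, placing empty keys as directed by empty_order."""
    if empty_order not in ("greatest", "least"):
        raise ValueError("empty_order must be 'greatest' or 'least'")

    def sort_key(key):
        if key is None:
            # An empty key sorts after everything or before everything.
            return (1, 0) if empty_order == "greatest" else (-1, 0)
        return (0, key)

    return sorted(keys, key=sort_key)
```

With the keys [3, None, 1], the setting least places the empty key first and the setting greatest places it last.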

The statEnv.copyNamespacesMode environment component corresponds to the copy-namespaces mode in the [XPath/XQuery] static context.

This component controls the namespace bindings that are assigned when an existing element node is copied by an element constructor. Its value consists of two parts: preserve or no-preserve, and inherit or no-inherit.

The doc types environment corresponds to the statically known documents in the [XPath/XQuery] static context. It contains the static type for the input documents, and is used to provide the static type to the fn:doc function.

The collection types environment corresponds to the statically known collections in the [XPath/XQuery] static context. It contains the static type for the input collections, and is used to provide the static type to the fn:collection function.

This rule reads as follows: "the phrase on the bottom (a namespace declaration in the query prolog followed by a sequence of expressions) is well-typed (accepted by the static type inference rules) within an environment statEnv if the sequence of expressions above the line is well-typed in the environment obtained from statEnv by adding the namespace declaration".

This is a common idiom for adding new information to an environment and passing that environment for use in sub-expressions. If the environment must be updated with a completely new component, the following notation is used:

3.1.1.1 Resolving QNames to Expanded QNames

A common use of the static environment is to expand a QName by looking up the URI that corresponds to the QName's namespace prefix in the statEnv.namespace environment and by constructing an expanded-QNameDM, which contains the URI and the QName's local part. Element and type QNames may be in the empty namespace, that is,
there is no URI associated with their namespace prefix. The empty namespace is denoted by the special value #EMPTY-NAMESPACE.

The auxiliary judgments below expand an element, type, attribute, variable, or function QName by looking up the namespace prefix in statEnv.namespace or, if the QName is unqualified, by using the appropriate default namespace.
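The expansion of an element QName can be sketched as follows. This is a minimal illustration, not the auxiliary judgments themselves: the namespace environment is modelled as a dictionary from prefix to a (namespace kind, URI) pair, and the names expand_element_qname and StaticError, the prefix ex and the URI http://example.org/ns are hypothetical.

```python
EMPTY_NAMESPACE = "#EMPTY-NAMESPACE"


class StaticError(Exception):
    """Hypothetical stand-in for a static error."""


def expand_element_qname(qname, namespace_env, default_elem_namespace):
    """Return a (URI, local part) pair for an element QName."""
    prefix, colon, local = qname.partition(":")
    if not colon:
        # Unqualified name: use the default element namespace.
        return (default_elem_namespace, qname)
    if prefix not in namespace_env:
        raise StaticError(f"prefix {prefix!r} is not bound in statEnv.namespace")
    _kind, uri = namespace_env[prefix]
    return (uri, local)


namespace_env = {"ex": ("passive", "http://example.org/ns")}
```

With this environment, ex:title expands to the pair (http://example.org/ns, title), and an unprefixed title falls back to the default element namespace, which may be #EMPTY-NAMESPACE.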

The dynamic function environment corresponds to the function implementations (or definitions) part of the [XPath/XQuery] dynamic context.

The dynamic function environment maps an expanded function name and parameter signature of the form "expanded-QName (Type1, ..., Typen)" to the remainder of the corresponding function definition, which is either the value #BUILT-IN for functions defined in [Functions and
Operators]; the value #EXTERNAL for externally defined functions; the value #IMPORTED(URI), if the function is defined in the imported module with namespace URI; or, if the function is locally declared, the function's body and a list of variables, which are the function's formal parameters, of the form "(Expr, Variable1,..., Variablen)".

The dynamic value environment corresponds to the variable values in the [XPath/XQuery] evaluation context.

The dynamic value environment maps an expanded variable name (expanded Variable) to the variable's value (Value) or to the value #IMPORTED(URI), if the variable is defined in the imported module with namespace URI.

The doc values environment corresponds to the available documents in the [XPath/XQuery] dynamic context. It contains the document nodes corresponding to input documents, and is used to provide the dynamic value of the fn:doc function.

The collection value environment corresponds to the available collections in the [XPath/XQuery] dynamic context. It contains the root nodes corresponding to the input collections, and is used to provide the dynamic value of the fn:collection function.

3.3.1 Kinds of Errors

Notation

The symbol Error denotes any error. We distinguish between a static errorXQ (denoted by statError), a type errorXQ (denoted by typeError), and a generic dynamic errorXQ (denoted by dynError), which represents all dynamic errors. A static error is raised during static analysis. A type error may be raised during static analysis or dynamic evaluation. A dynamic error is raised during dynamic evaluation. Non-type static errors are not formalized in this document.

3.3.2 Identifying and Reporting Errors

3.3.3 Handling Dynamic Errors

In general, when an error is raised during evaluation of some expression Expr, the error is propagated to the expression Expr1 in which Expr is evaluated. The expression Expr1, in turn, propagates the error to the expression in which Expr1 is evaluated, and so on, until the error is returned to the query environment.

Since most expressions propagate errors as described, we use one inference rule to specify this default behavior. The rule below states that if any sub-expression Expri of expression Expr raises an error dynError then Expr also raises dynError.
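The default propagation behavior can be sketched with an evaluator in which errors are modelled as exceptions. This is an illustration only; the tuple-based expression encoding and the names evaluate and DynError are hypothetical.

```python
class DynError(Exception):
    """Hypothetical stand-in for a dynamic error (dynError)."""


def evaluate(expr):
    """Evaluate a tiny expression language; errors in sub-expressions propagate."""
    kind = expr[0]
    if kind == "int":
        return expr[1]
    if kind == "plus":
        # No error handling here: an error raised by either operand
        # becomes the result of the whole addition.
        return evaluate(expr[1]) + evaluate(expr[2])
    if kind == "idiv":
        left = evaluate(expr[1])
        right = evaluate(expr[2])
        if right == 0:
            raise DynError("division by zero")
        return left // right
    raise DynError(f"no evaluation rule applies to {kind!r}")


# 1 + (1 idiv 0): the error raised by the division reaches the top level.
failing = ("plus", ("int", 1), ("idiv", ("int", 1), ("int", 0)))
```

Evaluating failing raises DynError from the nested division, and the enclosing addition passes it on unchanged.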

If analysis (evaluation) of an expression relies on some component of the static (dynamic) context that has not been assigned a value, a static (dynamic) error is raised. The following two rules handle all those cases when a component of an environment is accessed but not defined.

3.3.4 Errors and Optimization

In [XPath/XQuery], the detection and reporting of dynamic errors is implementation dependent. This permits different implementations to choose to evaluate or optimize an expression in different ways. When an implementation is able to evaluate an expression without evaluating some subexpression, the implementation is never required to evaluate that subexpression solely to determine whether it raises a dynamic error. For example, if a function parameter is never used in the body of the function, an
implementation may choose whether to evaluate the expression bound to that parameter in a function call. Similarly, if the variable bound by a let expression is never used in the corresponding return expression, the implementation is not required to evaluate the expression to which the variable is bound.

For simplicity, the dynamic inference rules in Formal Semantics define an eager evaluation semantics for all expressions, i.e., all sub-expressions are evaluated regardless of whether their values are necessary to evaluate the containing expression. For example, every function parameter is evaluated before the body of the function is evaluated, and the expression bound to a let variable is always evaluated. The dynamic semantics rules in the Formal Semantics do not formalize the more flexible evaluation
strategy above.

For example, in the following expression, the dynamic semantics rules of the Formal Semantics would raise a dynamic error because a path expression may not be applied to an atomic value. An implementation, however, is permitted not to raise an error, because the value of the path expression is not needed to evaluate the containing let expression.

let $x := 1/foobar return 1

However, the static semantics rules are, by definition, conservative, and therefore a type is computed for every subexpression. In the example above, a static type error would be raised because a path expression may not be applied to an atomic value.
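The difference between the eager semantics of the formal rules and the more flexible strategy permitted to implementations can be sketched as follows. This is an illustration only: the bound expression is modelled as a zero-argument Python function, the body as a function that receives an accessor for the variable, and all names are hypothetical.

```python
class DynError(Exception):
    """Hypothetical stand-in for a dynamic error."""


def eval_let_eager(bound, body):
    """Evaluate the bound expression first, whether or not the body uses it."""
    value = bound()
    return body(lambda: value)


def eval_let_lazy(bound, body):
    """Hand the unevaluated expression to the body, which may never ask for it."""
    return body(bound)


def path_on_atomic_value():
    # Models 1/foobar: a path step applied to an atomic value.
    raise DynError("a path expression may not be applied to an atomic value")


def return_one(get_x):
    # Models "return 1": the variable $x is never used.
    return 1
```

Here eval_let_eager(path_on_atomic_value, return_one) raises the error, whereas eval_let_lazy(path_on_atomic_value, return_one) returns 1. Both outcomes are acceptable for a conforming implementation.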

3.4 Concepts

[XPath/XQuery] is most generally used to process documents. The representation of a document is normatively defined in [Data Model]. The functions used to access documents and collections are normatively defined in [Functions and Operators].

3.4.1 Document Order

3.4.2 Atomization

Atomization converts an item sequence into a sequence of atomic values and is implemented by the fn:data function. Atomization is applied in contexts where an arbitrary sequence of items is used where a sequence of atomic values is expected.
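Atomization can be sketched as follows. This is a rough illustration and not the definition of fn:data: a node is modelled as an object that simply stores its typed value as a list of atomic values, and the names Node and atomize are hypothetical.

```python
class Node:
    """Hypothetical node whose typed value is a list of atomic values."""

    def __init__(self, typed_value):
        self.typed_value = list(typed_value)


def atomize(items):
    """Replace each node by its typed value and keep atomic values as they are."""
    atoms = []
    for item in items:
        if isinstance(item, Node):
            atoms.extend(item.typed_value)
        else:
            atoms.append(item)
    return atoms
```

For example, atomize([Node([1, 2]), "a", Node(["b"])]) gives the flat sequence [1, 2, "a", "b"].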

3.4.3 Effective Boolean Value

If a sequence of items is encountered where a boolean value is expected, the item sequence's effective boolean value is used. The fn:boolean function returns the effective boolean value of an item sequence.

3.4.4 Input Sources

[XPath/XQuery] has a set of functions that provide access to input data. These functions are of particular importance because they provide a way in which an expression can reference a document or a collection of documents. The dynamic semantics of these three input functions are described in more detail in [Functions and Operators].

3.4.5 URI Literals

3.5 Types

3.5.1 Predefined Schema Types

All the built-in types of XML Schema are recognized by [XPath/XQuery]. In addition, [XPath/XQuery] recognizes the predefined types xdt:anyAtomicType, xdt:untypedAtomic and xdt:untyped and the duration subtypes xdt:yearMonthDuration and xdt:dayTimeDuration. The representation of those types in the [XPath/XQuery] type system is given below.

[Definition: The following type definition of xs:anyType reflects the semantics of the Ur type from Schema in the [XPath/XQuery] type system.]

All of those primitive types derive from xdt:anyAtomicType. Note that the value space of each atomic type (such as xs:string) does not appear. The value space for each type is built-in and is as defined in [Schema Part 2].

The following example shows two atomic values. The first one is a value of type string containing "Database". The second one is an untyped value containing "Database". Both are using a string as content, but they have different type annotations.

Three XML Schema built-in derived types are derived by list, as follows. Note that those derive directly from xs:anySimpleType, since they are derived by list, and that their value space is defined using a "one or more" occurrence indicator.

Note that the type name xs:IDREFS derives from xs:anySimpleType, but not from xs:IDREF. As a consequence, calling the following three XQuery functions with the element a as a parameter succeeds for f1 and f2, but raises a type error for f3.

3.5.2 Typed Value and String Value

The typed value of a node is computed by the fn:data function, and the string value of a node is computed by the fn:string function, defined in [Functions and Operators]. The normative definitions of typed value and string value are given in [Data Model].

3.5.3 SequenceType Syntax

Introduction

SequenceTypes can be used in [XPath/XQuery] to refer to a type imported from a schema (see [5 Modules and Prologs]). SequenceTypes are used to declare the types of function parameters and in several kinds of [XPath/XQuery] expressions.

The syntax of SequenceTypes is described by the following grammar productions.

The semantics of SequenceTypes is defined by means of normalization rules from SequenceTypes into types in the [XPath/XQuery] type system (See [2.3 The [XPath/XQuery] Type System]).

However, because the [XPath/XQuery] type system is not part of the [XPath/XQuery] syntax, the SequenceType syntax remains part of the [XPath/XQuery] Core. Normalization from SequenceTypes to types is not applied during the normalization phase but whenever a dynamic or static rule requires it. Normalization of SequenceTypes is the only example of normalization that does not yield an expression in the [XPath/XQuery] Core and that occurs on demand in dynamic or static rules.

3.5.4 SequenceType Matching

Introduction

During processing of a query, it is sometimes necessary to determine whether a given value matches a type that was declared using the SequenceType syntax. This process is known as SequenceType matching, and is formally specified in [8.3 Judgments for type matching].

Notation

To define normalization of SequenceTypes to the [XPath/XQuery] type system, the following auxiliary mapping rule is used.

An "element" SequenceType with a wildcard and a type name is normalized into a wildcard element type with a corresponding type name. The presence of a "?" after the type name indicates a nillable element.

An "element" SequenceType with only a name is normalized into a nillable element type with a corresponding name. The normalization allows nillable elements because the semantics of SequenceTypes in that case allows it to match every possible element with that name, regardless of its type or nilled property.

A "document-node" sequence type with an element test (resp. a schema element test) is normalized into the corresponding document type, whose content is the normalization of the element test (resp. schema element test), interleaved with an arbitrary sequence of processing instruction, comment, and text nodes.

The [XPath/XQuery] type system does not model the target of a processing-instruction, which is treated as a dynamic property. Therefore a "processing-instruction" SequenceType with a string parameter is normalized into an optional processing-instruction type.

For each expression, a short description and the relevant grammar productions are given. The semantics of an expression includes the normalization, static analysis, and dynamic evaluation phases. Recall that normalization rules translate [XPath/XQuery] syntax into Core syntax. In the sections that contain normalization rules, the Core grammar productions into which the expression is normalized are also provided. After normalization, sections on static type inference and dynamic evaluation define the
static type and dynamic value for the Core expression.

It is a static type error for most (but not all) expressions to have the empty type. The exceptions to this rule are the following expressions and functions:

The literal empty-sequence expression ()

The fn:data function and all functions in the fs namespace applied to the literal empty-sequence expression ()

Any function with return type empty

The reason the above expressions and functions are excluded is that they are typically part of the result of normalizing a larger user-level expression and are used to capture the semantics of the user-level expression when applied to the empty sequence.

The rule is written in this way (i.e., in the double negative), because for any expression such that no static type rule applies to the expression, a static type error is raised. That is, the absence of an applicable static rule indicates an error. For example, if an expression is not the empty-sequence expression but has the empty type, the above rule does not apply and therefore a static error is raised.

Example

The above rule can catch common mistakes, such as the misspelling of an element or attribute name or referencing of an element or attribute that does not exist. For instance, the following path expression

$x/title

raises a static error if the type of variable $x does not include any title children elements.
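The check in this example can be sketched as follows. This is a strong simplification, not the normative rule for path expressions: the static type of $x is reduced to a dictionary from child element names to their types, and the names child_step_type and StaticTypeError are hypothetical.

```python
class StaticTypeError(Exception):
    """Hypothetical stand-in for a static type error."""


def child_step_type(content_model, child_name):
    """Type a child step against the declared children of the context type."""
    if child_name not in content_model:
        # The step could only ever select the empty sequence, so the
        # expression would have the empty type.
        raise StaticTypeError(f"no {child_name!r} children: the empty type is not allowed")
    return content_model[child_name]


# Hypothetical type of $x: an element with author and title children.
book_children = {"author": "element author*", "title": "element title"}
```

With book_children, the step title is accepted, while a misspelling such as titel is rejected during static analysis, before any document is read.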

4.1 Primary Expressions

Primary expressions are the basic primitives of the language. They include literals, variables, function calls, and parenthesized expressions.

All literals are Core expressions, therefore no normalization rules are required for literals. Predefined entity references and character references in strings are resolved to characters as part of parsing and therefore they do not occur in the Core grammar.

4.1.5 Function Calls

Introduction

A function call consists of a QName followed by a parenthesized list of zero or more expressions. In [XPath/XQuery], the actual argument to a function is called an argument and the formal argument of a function is called a parameter. We use the same terminology here.

Function Calls

Because [XPath/XQuery] implicitly converts the values of function arguments, a normalization step is required.

Notation

Normalization of function calls uses an auxiliary mapping []FunctionArgument(SequenceType) used to insert conversions of function arguments that depend only on the expected SequenceType of the corresponding parameters. It is defined as follows:

where PrototypicalValue has the following values for each possible SequenceType:

Editorial note

Todo: insert dummy prototypical values for each of the 44+4 types...

Note

The fs:convert-simple-operand function takes a PrototypicalValue, which is a value of the target type, to ensure that conversion to base types is possible even though types are not first class objects in [XPath/XQuery].

Core Grammar

The Core grammar production for function calls is:

Function Calls

Each argument expression in a function call is normalized to its corresponding Core expression by applying []FunctionArgument(SequenceType) for each argument with the expected SequenceType for the argument inserted.

Note that this normalization rule depends on the static environment containing function signatures and is the only normalization rule that depends on statEnv. Furthermore notice that the normalization is only well-defined when it is guaranteed that overloading is restricted to atomic types with the same quantifier. This is presently the case.

To typecheck a Core function call we first check in Section [7 Additional Semantics of Functions] if there is a specialized typing rule for the function, and, if so, use it. Otherwise, the function signatures matching the function name and arity are retrieved from the static environment. The type of each argument to the function must be a subtype of a type that is promotable to the corresponding function parameter type of the function; if the expected type is
a union of atomic types then this check is performed separately for each possibility.

The first rule bootstraps type checking of a function call: It expands the function's QName and then applies the function call rule for the expanded function call:

For a function call in which the static type of one of the expressions passed as argument is a union of atomic types, the function call is type checked once separately for each atomic type in that union. The static type of the entire function call expression is then the union of the types computed in each case, as follows:
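The case analysis over a union of atomic types can be sketched as follows. This is an illustration under simplifying assumptions: an argument type is modelled as a list of atomic alternatives, a signature matches only when the types are equal (subtyping and promotion are left out), and the function my:negate and all Python names are hypothetical.

```python
from itertools import product


class StaticTypeError(Exception):
    """Hypothetical stand-in for a static type error."""


def check_simple_call(name, arg_types, func_types):
    """Type check a call whose argument types contain no union."""
    for param_types, return_type in func_types.get(name, []):
        if list(param_types) == list(arg_types):
            return return_type
    raise StaticTypeError(f"no signature of {name} accepts {list(arg_types)}")


def check_call_with_unions(name, arg_alternatives, func_types):
    """Check the call once per combination of alternatives; collect the results."""
    result_union = []
    for combination in product(*arg_alternatives):
        return_type = check_simple_call(name, combination, func_types)
        if return_type not in result_union:
            result_union.append(return_type)
    return result_union


func_types = {
    "my:negate": [
        (("xs:integer",), "xs:integer"),
        (("xs:double",), "xs:double"),
    ]
}
```

A call to my:negate whose argument has type (xs:integer | xs:double) is checked twice, and its static type is the union of xs:integer and xs:double. If any alternative fails to match, the whole call is rejected.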

Notice that this semantics makes sense since the type declared for a function parameter, which uses the sequence types syntax, cannot itself be a union.

Finally, the following auxiliary rule type checks a function call in which none of the actual arguments has a type that is a union of atomic types. The rule looks up the function in the static environment and checks that some signature for the function satisfies the following constraint: the type of each actual argument is a subtype of some type that can be promoted to the type of the corresponding function parameter. In this case, the function call is well typed and the result type is the return type specified in the function's signature.
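The constraint on argument types can be sketched as follows. This is a simplified illustration: only a small fragment of the type hierarchy is modelled (xs:integer derives from xs:decimal, and numeric promotion goes from xs:decimal to xs:float or xs:double and from xs:float to xs:double), and the function my:half and all Python names are hypothetical.

```python
class StaticTypeError(Exception):
    """Hypothetical stand-in for a static type error."""


# Fragment of the derivation hierarchy and of numeric type promotion.
DERIVES_FROM = {"xs:integer": "xs:decimal"}
PROMOTIONS = {
    ("xs:decimal", "xs:float"),
    ("xs:decimal", "xs:double"),
    ("xs:float", "xs:double"),
}


def supertypes(type_name):
    """Return the type itself followed by the types it derives from."""
    chain = [type_name]
    while chain[-1] in DERIVES_FROM:
        chain.append(DERIVES_FROM[chain[-1]])
    return chain


def argument_fits(arg_type, param_type):
    """True if arg_type is a subtype of some type promotable to param_type."""
    return any(
        candidate == param_type or (candidate, param_type) in PROMOTIONS
        for candidate in supertypes(arg_type)
    )


def check_call(name, arg_types, func_types):
    """Return the declared return type of the first signature that fits."""
    for param_types, return_type in func_types.get(name, []):
        if len(param_types) == len(arg_types) and all(
            argument_fits(arg, param) for arg, param in zip(arg_types, param_types)
        ):
            return return_type
    raise StaticTypeError(f"no signature of {name} accepts {list(arg_types)}")


func_types = {"my:half": [(("xs:double",), "xs:double")]}
```

An xs:integer argument is accepted for the xs:double parameter because xs:integer is a subtype of xs:decimal, which can be promoted to xs:double. A call with an xs:string argument or with two arguments is rejected by the same lookup.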

The function body itself is not analyzed for each invocation: static typing of the function definition itself guarantees that the function body always returns a value of the declared return type.

Notice that the static typing rule checks the function signature in order to determine whether a function exists rather than just the function arity: this is consistent because it will reject function calls with the wrong arity in addition to function calls with the right arity but incompatible parameter types.

Based on a function's name and parameter types, the function body is retrieved from the dynamic environment.

If the function is a locally-declared, user-defined function then it is evaluated as follows. First, the rule evaluates each actual function argument expression. Next, a match is searched for among the function's possible declaration signatures, retrieved from statEnv.funcType. If the function is not present in the environment, or there is no matching declaration signature, a type error is raised. Otherwise, the function
body and formal variables are obtained from dynEnv.funcDefn for the matching declaration signature. The rule then extends dynEnv.varValue by binding each formal variable to its corresponding value (converted by the normalization as required for the expected type and backwards compatibility flag), and evaluates the body of the function in the new
environment. The resulting value is the value of the function call.

Note that the function body is evaluated in the default (top-level) environment extended with just the parameter bindings. Note also that input values and output values are matched against the types declared for the function. If static analysis was performed, all these checks are guaranteed to be true and may be omitted.
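A minimal Python sketch of this evaluation strategy follows. It assumes that environments are dictionaries, that func_defn is a hypothetical stand-in for dynEnv.funcDefn mapping a function name to its formal variables and body, and that the body is a Python callable. Signature matching and type checks are omitted.

```python
def call_function(func_defn, top_level_env, name, arg_values):
    """Evaluate a call to a user-defined function.

    func_defn maps a function name to (formal variable names, body), where
    body is a callable that takes an environment dictionary.
    """
    formals, body = func_defn[name]
    if len(formals) != len(arg_values):
        raise TypeError(f"{name}: wrong number of arguments")
    # The body sees the top-level bindings plus the parameters only;
    # variables that are local to the caller are not visible.
    env = dict(top_level_env)
    env.update(zip(formals, arg_values))
    return body(env)
```

Copying the top-level environment, rather than the caller's environment, is what keeps caller-local variables out of scope in the function body.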

The rule for evaluating an imported function is similar to that for evaluating a locally declared function, except that the function call is evaluated in the dynamic context of the module in which it is declared.

If the evaluation of any actual argument raises an error, the function call can raise an error. This rule applies to both user-defined and built-in functions. Note that if more than one expression may raise an error, the function call may raise any one of the errors.

If, for all possible function signatures, the evaluation of some actual argument yields a value that cannot be promoted to the corresponding formal type of the parameter, the function call raises a type error. This rule applies to both user-defined and built-in functions.

If the evaluation of the function call to a built-in or external function yields a value that cannot be promoted to the corresponding return type of the function, the built-in or external function call raises a type error.

4.2 Path Expressions

Introduction

Path expressions are used to locate nodes within a tree. There are two kinds of path expressions, absolute path expressions and relative path expressions. An absolute path expression is a rooted relative path expression. A relative path expression is composed of a sequence of steps.

Absolute path expressions are path expressions starting with the / or // symbols, indicating that the expression must be applied on the root node in the current context. The root node in the current context is the greatest ancestor of the context node. The following two rules normalize absolute path expressions to relative ones. They use the fn:root function, which returns the greatest ancestor of its argument node. The treat expressions guarantee that the value
bound to the context variable $fs:dot is a document node.

A composite relative path expression (using /) is normalized into a for expression by concatenating the sequences obtained by mapping each node of the left-hand side in document order to the sequence it generates on the right-hand side. The call to the fs:distinct-doc-order function ensures that the result is in document order without duplicates. The dynamic context is defined by binding the
$fs:dot, $fs:sequence, $fs:position and $fs:last variables.

Note that sorting by document order enforces the restriction that input and output sequences contain only nodes, and that the last step in a path expression may actually return atomic values.
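The mapping of a composite path expression onto an iteration followed by fs:distinct-doc-order can be sketched in Python. The Node class, its precomputed order field, and the child step are assumptions made for illustration; they are not part of the data model definition.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    order: int                      # position in document order (assumed precomputed)
    children: list = field(default_factory=list)

def distinct_doc_order(nodes):
    """Stand-in for fs:distinct-doc-order: sort by document order, drop duplicates."""
    unique = {n.order: n for n in nodes}
    return [unique[k] for k in sorted(unique)]

def path(left_nodes, step):
    """Evaluate Left/Step: map each left node to step(node), then sort and dedupe."""
    result = []
    for dot in distinct_doc_order(left_nodes):   # the context node is bound in turn
        result.extend(step(dot))
    return distinct_doc_order(result)

def child(node):
    """A simple step: the children of the context node."""
    return node.children
```

Because the final call removes duplicates, a step that reaches the same node from several context nodes still contributes that node only once.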

Steps

Step expressions can be followed by predicates. Normalization of predicates uses the following auxiliary mapping rule: []Predicates, which is specified in [4.2.2 Predicates]. Normalization for step expressions also uses the following auxiliary mapping rule: []Axis, which is specified in [4.2.1.1 Axes].

As explained in the [XPath/XQuery] document, applying a step in XPath changes the focus (or context). The change of focus is made explicit by the normalization rule below, which binds the variable $fs:dot to the node currently being processed, and the variable $fs:position to the position (i.e., the position within the input sequence) of that node.

There are two sets of normalization rules for Predicates. The first set of rules applies when the predicate is a numeric literal or the expression last(). The second set of rules applies to all predicate expressions other than numeric literals and the expression last(). In the first case, the normalization rules provide a more precise static type than if the general rules were applied.

When the predicate expression is a numeric literal or the fn:last function, the following normalization rules apply.

The normalization rules above all use the function fn:subsequence to select a particular item. The static typing rules for this function are defined in [7.2.10 The fn:subsequence function].

When predicates are applied on a forward step, the input sequence is first sorted in document order and duplicates are removed. The context is changed by binding the $fs:dot variable to each node in document order.

When predicates are applied on a reverse step, the input sequence is first sorted in document order and duplicates are removed. The context is changed by binding the $fs:dot variable to each node in document order.

The static semantics of an Axis NodeTest pair is obtained by retrieving the type of the context node, and applying the two filters (the Axis, and then the NodeTest with a PrincipalNodeKind) on the result.

The dynamic semantics of an Axis NodeTest pair is obtained by retrieving the context node, and applying the two filters (Axis, then NodeTest) on the result. The application of each filter is expressed through the filter judgment as follows.

The semantics of the following(-sibling) and preceding(-sibling) axes are expressed by mapping them to Core expressions; all other axes are part of the Core and are therefore left unchanged by normalization.

Note that the semantics of predicates whose parameter is a numeric value also works for numeric values other than integers, in which case op:numeric-equal returns false when the value is compared to a position. For example, the expression //a[3.4] is allowed and always returns the empty sequence.
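A small Python sketch illustrates why a non-integer numeric predicate selects nothing. It assumes that the input sequence is a list and that positions are 1-based integers compared to the predicate value by numeric equality.

```python
def numeric_predicate(sequence, number):
    """Keep the items whose 1-based position is numerically equal to number."""
    return [
        item
        for position, item in enumerate(sequence, start=1)
        if position == number
    ]
```

With this model, a predicate value of 3 selects the third item, whereas 3.4 equals no integer position and therefore yields the empty sequence.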

4.2.3 Unabbreviated Syntax

The corresponding Section in the [XPath/XQuery] document just contains examples.

Filter Expression

When predicates are applied on a primary expression, the input sequence is processed in sequence order and the context is bound as in the case of forward axes. In that case, the sequence can contain both nodes and atomic values.

When predicates are applied on a primary expression, the input sequence is processed in sequence order and the context variable is bound to each item in the input sequence, which may contain both nodes and atomic values.

The normalization rules for all the arithmetic operators except idiv first atomize each argument by applying fn:data and then apply the internal function fs:convert-operand to each argument. If the first argument to this function has type xdt:untypedAtomic, then the first argument is cast to a double, otherwise it is returned unchanged. The overloaded internal function
corresponding to the arithmetic operator is then applied to the two converted arguments. The table above maps the operators to the corresponding internal function. The mapping from the overloaded internal functions to the corresponding monomorphic function is given in [B.2 Mapping of Overloaded Internal Functions].

4.5 Comparison Expressions

Introduction

Comparison expressions allow two values to be compared. [XPath/XQuery] provides four kinds of comparison expressions, called value comparisons, general comparisons, node comparisons, and order comparisons.

The normalization rules for the value comparison operators first atomize each argument by applying fn:data and then apply the internal function fs:convert-operand defined in [7.1.3 The fs:convert-operand function]. If the first argument to this function has type xdt:untypedAtomic, then the first argument is cast to a string, otherwise it is
returned unchanged. The overloaded internal function corresponding to the value comparison operator is then applied to the two converted arguments. The table above maps the value operators to the corresponding internal function. The mapping from the overloaded internal functions to the corresponding monomorphic function is given in [B.2 Mapping of Overloaded Internal Functions].

4.5.2 General Comparisons

Introduction

General comparisons are defined by adding existential semantics to value comparisons. The operands of a general comparison may be sequences of any length. The result of a general comparison is always true or false.

The normalization rule for a general comparison expression first atomizes each argument by applying fn:data and then applies the existentially quantified some expression to each sequence. The internal function fs:convert-operand is applied to each pair of atomic values. If the first argument to this function has type xdt:untypedAtomic, then the first argument is cast to the type of the second argument. If the second argument also has type xdt:untypedAtomic, the first argument is cast to a string. The overloaded internal function corresponding to the general comparison operator is then applied to the two converted values.
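The existential semantics and the operand conversion can be sketched in Python. The UntypedAtomic class is a hypothetical marker for xdt:untypedAtomic, Python types stand in for atomic types, and only the equality operator is modeled; this is a simplification rather than the definition of fs:convert-operand.

```python
class UntypedAtomic(str):
    """Marker type standing in for xdt:untypedAtomic (assumption)."""

def convert_operand(first, second):
    """Simplified stand-in for fs:convert-operand."""
    if not isinstance(first, UntypedAtomic):
        return first                     # typed values are returned unchanged
    if isinstance(second, UntypedAtomic):
        return str(first)                # both untyped: compare as strings
    return type(second)(first)           # cast to the type of the other operand

def general_eq(left_seq, right_seq):
    """Existential semantics: true if some pair of atomic values is equal."""
    return any(
        convert_operand(a, b) == convert_operand(b, a)
        for a in left_seq
        for b in right_seq
    )
```

Since the quantification is existential, an empty operand on either side makes the comparison false.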

4.5.3 Node Comparisons

The normalization rules for node comparisons map each argument expression and then apply the internal function corresponding to the node comparison operator. The internal functions are defined in [B.2 Mapping of Overloaded Internal Functions].

The dynamic semantics of logical expressions is non-deterministic. This non-determinism permits implementations to use short-circuit evaluation strategies when evaluating logical expressions. In the expression Expr1 and Expr2, if either expression raises an error or evaluates to false, the entire expression may raise an error or evaluate to false. In the expression Expr1 or Expr2, if either expression raises an error or evaluates to true, the entire expression may raise an error or evaluate to true.

4.7 Constructors

[XPath/XQuery] supports two forms of constructors: a direct constructor, which supports literal XML syntax for elements, attributes, and text nodes, and a computed constructor, which can be used to construct element and attribute nodes, possibly with computed names, and also document and text nodes. All direct constructors are normalized into computed constructors, i.e., there are no direct-constructor expressions in the Core.

4.7.1 Direct Element Constructors

Introduction

The static and dynamic semantics of the direct forms of element and attribute constructors are specified on the equivalent computed element and attribute constructors.

We start with the rules for normalizing a direct element constructor's content. We distinguish between direct element constructors that contain only one element-content unit and those that contain more than one element-content unit. An element-content unit is a contiguous sequence of literal characters (character references, escaped braces, and predefined entity references), one enclosed expression, one direct element constructor, one XML comment, or one XML processing instruction. Here are three direct
element constructors that each contain one element-content unit:

It contains one XML comment, followed by one enclosed expression that contains the integer 123, one contiguous sequence of characters ("-0A "), one direct XML element constructor, one contiguous sequence of characters (" Flushing, NY"), and one enclosed expression that contains the integer 11368. Evaluating this expression yields this element value:

Adjacent element-content units are convenient because they permit arbitrary interleaving of text and atomic data. During evaluation, atomic values are converted to text nodes containing the string representations of the atomic values, and then adjacent text nodes are concatenated together. In the example above, the integer 123 is converted to a string and concatenated with "-0A" and the result is a single text node containing "123-0A".

In general, we do not want to convert all atomic values to text nodes, especially when performing static-type analysis, because we lose useful type information. For example, if we normalize the first example above as follows, we lose the important information that the user constructed a date value, not just a text node containing an arbitrary string:

So to preserve useful type information, we distinguish between direct element constructors that contain one element-content unit and those that contain more than one (because multiple element-content units commonly denote concatenation of atomic data and text). Here is the normalization of the first and fourth examples above:

Given the distinction between direct element constructors that we made above, we give two normalization rules for a direct element constructor's content. If the direct element constructor contains exactly one element-content unit, we simply normalize that unit by applying the normalization rule for the element content:

If the direct element constructor contains more than one element-content unit, we normalize each unit individually and construct a sequence of the normalized results interleaved with empty text nodes. The empty text nodes guarantee that the results of evaluating consecutive element-content units can be distinguished. Then we apply the function fs:item-sequence-to-node-sequence. Section 3.7.1 Direct Element Constructors of the [XPath/XQuery] document specifies the rules for converting a sequence of atomic values and nodes into a sequence of nodes before element construction. The Formal Semantics function fs:item-sequence-to-node-sequence implements these conversion rules.

We need to distinguish between multiple element-content units, because the rules for converting sequences of atomic values into strings apply to sequences within distinct enclosed expressions. The empty text nodes are eliminated during evaluation of fs:item-sequence-to-node-sequence when consecutive text nodes are coalesced into a single text node. The empty text nodes guarantee that a whitespace character will not be inserted between atomic values computed by distinct enclosed expressions. For example, here is an expression, its normalization, and the resulting XML value:
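The effect of the interleaved empty text nodes can be sketched in Python. The sketch covers only text nodes and atomic values; the Text class and the helper interleave are assumptions made for illustration, and the function is a simplified stand-in for fs:item-sequence-to-node-sequence rather than its definition.

```python
from dataclasses import dataclass

@dataclass
class Text:
    content: str          # stands in for a text node

def item_sequence_to_node_sequence(items):
    """Simplified conversion for sequences of text nodes and atomic values."""
    # Step 1: each run of adjacent atomic values becomes one text node,
    # with a single space between adjacent values.
    nodes, run = [], []
    for item in items:
        if isinstance(item, Text):
            if run:
                nodes.append(Text(" ".join(run)))
                run = []
            nodes.append(item)
        else:
            run.append(str(item))
    if run:
        nodes.append(Text(" ".join(run)))
    # Step 2: adjacent text nodes are coalesced; an empty result is dropped.
    merged = "".join(node.content for node in nodes)
    return [Text(merged)] if merged else []

def interleave(units):
    """Join the values of consecutive content units with empty text nodes."""
    result = []
    for index, unit in enumerate(units):
        if index > 0:
            result.append(Text(""))
        result.extend(unit)
    return result
```

In this model, two values produced by a single enclosed expression are separated by a space, whereas values produced by two consecutive enclosed expressions are split into separate runs by the empty text node and end up concatenated without a space.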

Now that we have explained the normalization rules for direct element content, we give the rules for the two forms of direct XML element constructors. Note that the direct attribute constructors are normalized twice: the []NamespaceAttrs rule normalizes the namespace-declaration attributes and the []Attribute rule normalizes all other attributes.

that character references have been resolved to individual characters and predefined entity references have been resolved to sequences of characters, and

that the rule is applied to the longest contiguous sequence of characters.

The following normalization rule takes the longest consecutive sequence of individual characters that include literal characters, escaped curly braces, character references, and predefined entity references and normalizes the character sequence as a text node containing the string of characters.

As with literal XML elements, we need to distinguish between direct attribute constructors that contain one attribute-content unit and those that contain multiple attribute-content units, because the rules for converting sequences of atomic values into strings are applied to sequences within distinct enclosed expressions. If the direct attribute constructor contains exactly one attribute-content unit, we simply normalize that unit by applying the normalization rule for the attribute content:

If the direct attribute constructor contains more than one attribute-content unit, we normalize each unit individually and construct a sequence of the normalized results interleaved with empty text nodes. The empty text nodes guarantee that the results of evaluating consecutive attribute-content units can be distinguished. Then we apply the function fs:item-sequence-to-untypedAtomic, which applies the appropriate
conversion rules to the normalized attribute content:

Literal characters, escaped curly braces, character references, and predefined entity references in attribute content are treated as in element content. In addition, the normalization rule for characters in attributes assumes:

that an escaped single or double quote is converted to an individual single or double quote.

The following normalization rules take the longest consecutive sequence of individual characters that include literal characters, escaped curly braces, escaped single and double quotes, character references, and predefined entity references, and normalize the character sequence as a string.

4.7.1.2 Namespace Declaration Attributes

Direct attributes may contain namespace-declaration attributes. The normalization rules for namespace-declaration attributes ignore all non-namespace attributes -- they are handled by the normalization rules in [4.7.1.1 Attributes].

An AttributeList containing namespace-declaration attributes is normalized by the following rule, which maps each of the individual namespace-declaration attributes in the attribute list and constructs a sequence of the normalized namespace attribute values.

4.7.1.3 Content

4.7.1.4 Whitespace in Element Content

Section 3.7.1.4 Whitespace in Element Content of the [XPath/XQuery] document describes how whitespace in element and attribute constructors is processed depending on the value of the xmlspace declaration in the query prolog. The Formal Semantics assumes that the rules for handling whitespace are applied prior to normalization rules, for example, during parsing of a query. Therefore, there are no formal rules for handling whitespace.

4.7.3.1 Computed Element Constructors

Local namespace declarations may occur explicitly in a computed element constructor or may be the result of normalizing namespace-declaration attributes contained in direct element constructors. For local namespace declarations that occur explicitly in a query, the immediately enclosing expression of the local namespace declaration (CompElemNamespace) must be a computed element constructor; otherwise, as specified in [XPath/XQuery], a static error is raised.

When the name of an element is computed, the normalization rule applies atomization, and checks that the result of atomization is a single atomic value of type xs:QName, xs:string, or xdt:untypedAtomic. If the name expression returns a value of type xs:string or xdt:untypedAtomic, that value is cast to a QName. The resulting expanded QName is used as the name for the constructed element.

The normalization rules of direct element and attribute constructors leave us with only the computed forms of constructors. The static and dynamic semantic rules are defined on all the computed forms. The computed element constructor itself has two forms: one in which the element name is a literal QName, and the other in which the element name is a computed expression.

We start with the static rule for an element constructor with a computed name expression, because it is the simplest rule. Because the element's name cannot be known until runtime, the element is given the wildcard type, element of type xs:anyType. The computed name expression must have type xs:QName and the content expression must have a type of zero-or-more attributes followed by zero-or-more element, text, comment, or processing-instruction nodes. Note that a local namespace declaration has the empty type and therefore does not affect the type of the element's content.

The following rules take a computed element constructor expression and construct an element node. The dynamic semantics for computed element constructors is the most complex of all expressions in XQuery. Here is how to read the rule below.

First, the element's content expression is partitioned into the local namespace declarations and all other expressions, and the local namespace declarations are evaluated, yielding a sequence of namespace annotations. The static environment is extended to include the new namespace annotations, which are all active. According to Section 3.7.1.2 Namespace Declaration Attributes of the [XPath/XQuery] document, it is implementation-defined whether undeclaration of namespace prefixes (by setting the namespace prefix to the empty string) in an element constructor is supported. In the dynamic semantics below, we assume all local namespace declarations declare a binding of a prefix to a URI.

Second, the function fs:item-sequence-to-node-sequence is applied to the element's content expression (excluding local namespace declarations); this function call is evaluated in the new static and dynamic environment. Recall from [4.7.1 Direct Element Constructors] that during normalization, we do not convert the content of direct element constructors that contain one element-content unit. This
guarantees that useful type information is preserved for static analysis. Since the conversion function fs:item-sequence-to-node-sequence was not applied to all element constructors during normalization, we have to apply it at evaluation time. (Obviously, it is possible to elide the application of fs:item-sequence-to-node-sequence injected during normalization and the application injected during
evaluation.) The resulting value Value0 must match zero-or-more attributes followed by zero-or-more element, text, processing-instruction or comment nodes.

Third, the namespace annotations are concatenated with the list of active namespaces in the namespace environment statEnv.namespace and the namespaces corresponding to the element's name and all attribute names. The resulting sequence is the sequence of namespace annotations for the element.

The default rules for propagating errors, described in [3.3 Error Handling], apply to element constructors. In addition, a computed element constructor with a computed name raises a type error if the name value is not an xs:QName.

Both forms of computed element constructors raise a type error if the element's content is not a sequence of attributes followed by a sequence of element, text, comment, and processing-instruction nodes, or a sequence of atomic values.

The normalization rules for direct attribute constructors leave us with only the computed form of the attribute constructors. Like a computed element constructor, a computed attribute constructor has two forms: one in which the attribute name is a literal QName, and the other in which the attribute name is a computed expression.

We start with the static rule for an attribute constructor with a computed name expression, because it is the simplest rule. The computed name expression must have type xs:QName. The result type is an attribute of type xs:anySimpleType.

As in element constructors, the static rules are liberal when a single xdt:untypedAtomic content expression is provided as an argument and conservative otherwise.

If the content expression is a sequence of expressions all of which are xdt:untypedAtomic, we apply a liberal static rule (i.e., assume the validation will succeed). Note that the static type of an attribute expression is always attribute QName of type xdt:untypedAtomic, even though more precise static typing information might be available.

The following rules take a computed attribute constructor expression and construct an attribute node. The rules are similar to those for element constructors. First, the attribute's name is expanded into a qualified name. Second, the function fs:item-sequence-to-untypedAtomic is applied to the content expression and this function call is evaluated in the dynamic environment. Recall from [4.7.3.2 Computed Attribute Constructors] that during normalization, we do not convert the content of direct attribute constructors that contain one attribute-content unit. This guarantees that useful type information is preserved for static analysis. Since the conversion function fs:item-sequence-to-untypedAtomic was not applied to all attribute constructors during normalization, we have to apply it at evaluation time. (As before, it is possible to elide the application of fs:item-sequence-to-untypedAtomic injected during normalization and the application injected during evaluation.)

The default rules for propagating errors, described in [3.3 Error Handling], apply to attribute constructors. In addition, an attribute constructor with a computed name raises a type error if the name value is not an xs:QName. the xmlns namespace.

The static semantics checks that the type of the argument expression is a sequence of element, text, processing-instruction, and comment nodes. The type of the entire expression is the most general document type, because the document constructor erases all type annotations on its content nodes.

The dynamic semantics checks that the argument expression evaluates to a value that is a sequence of element, text, processing-instruction, or comment nodes. The entire expression evaluates to a new document node value. Note that the type annotations for all the nodes in content of a document node are eliminated; the erases to judgment performs this erasure.

The default rules for propagating errors, described in [3.3 Error Handling], apply to document node constructors. In addition, if the argument expression evaluates to a value that is not a sequence of element, text, processing-instruction, or comment nodes, a type error is raised.

4.7.3.4 Text Node Constructors

A text node constructor contains an expression, which must evaluate to an xs:string value. Section 3.7.3.4 Text Node Constructors of the [XPath/XQuery] document specifies the rules for converting a sequence of atomic values into a string prior to construction of a text node. Each node is replaced by its string value. For each adjacent sequence of one or more atomic values returned by an enclosed expression, an untyped atomic value is constructed, containing the canonical lexical representation of all the atomic values, with a single blank character inserted between adjacent values. As a formal specification of these conversion rules is not instructive, [7.1.8 The fs:item-sequence-to-untypedAtomic function] implements this conversion.

The static semantics checks that the argument expression has type xs:string. The type of the entire expression is a zero-or-one text type. The type is zero-or-one, because no text node is constructed if the argument of the text node constructor is the empty string.

The default rules for propagating errors, described in [3.3 Error Handling], apply to text node constructors. In addition, if the argument expression evaluates to a value that is not a string, a type error is raised.

The default rules for propagating errors, described in [3.3 Error Handling], apply to computed processing-instruction constructors. The normalization rules guarantee that a dynamic error is raised if the target expression is not a string or a QName.

A local namespace declaration may only occur within a computed element constructor. The result of evaluating a local namespace declaration is a namespace annotation, which annotates the element that results from evaluating the containing computed element constructor. Because the local namespace declaration has no effect on the type of the element that it annotates, it is given the empty-sequence type.

The default rules for propagating errors, described in [3.3 Error Handling], apply to computed element namespaces.

4.8 [For/FLWR] Expressions

Introduction

[XPath/XQuery] provides [For/FLWR] expressions for iteration, for binding variables to intermediate results, and filtering bound variables according to a predicate.

A FLWORExpr in XQuery 1.0 consists of a sequence of ForClauses and LetClauses, followed by an optional WhereClause, followed by the , as described by the following grammar productions. Each variable binding is preceded by an optional type declaration, which specifies the type expected for the variable.

The dynamic semantics of the ordering mode in FLWOR expressions is not specified formally, as it would require the introduction of tuples, which are not supported in the [XPath/XQuery] data model.

Normalized FLWOR expressions restrict a For and Let clause to bind only one variable. Otherwise, the Core FLWOR expression is the same as the XQuery FLWOR expression.

Notation

The auxiliary rule []FLWOR(Expr) normalizes a For, Let, or Where clause in a FLWORExpr expression. Note that the rule takes the remainder of the FLWOR expression (other For, Let, or Where clauses and the Return clause) as a parameter in Expr.

The [For/FLWR] expressions include the FLWORExpr of XQuery and the ForExpr of XPath. The normalization rule for ForExpr is simple: it unrolls a ForExpr that binds multiple variables into nested ForExprs, each of which binds one variable.

Full FLWORExpr expressions are normalized to nested core expressions using two sets of normalization rules. Note that some of the rules also accept ungrammatical FLWORExprs such as "where Expr1 return Expr2". This does not matter, as normalization is always applied on parsed [XPath/XQuery] expressions, and ungrammatical FLWORExprs would be rejected by the parser beforehand.

The first set of rules is applied on a full [For/FLWR] expression, splitting it at the clause level, then applying further normalization on each separate clause.

Then each [For/FLWR] clause is normalized separately. A ForClause may bind more than one variable, whereas a For expression in the [XPath/XQuery] Core binds and iterates over only one variable. Therefore, a ForClause is normalized to nested for expressions:

The following simple example illustrates how a FLWORExpr is normalized. The for expression in the example below is used to iterate over two collections, binding variables $i and $j to items in these collections. It uses a let clause to bind the local variable $k to the sum of both numbers, and a where clause to select only those numbers that have a sum equal to or greater than the integer 5.

For each binding of $i to an item in the sequence (1 , 2) the inner for expression iterates over the sequence (3 , 4) to produce tuples ordered by the ordering of the outer sequence and then by the ordering of the inner sequence. This core expression eventually results in the following document fragment:
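The iteration order can be sketched with nested Python loops. The sketch uses the sequences (1, 2) and (3, 4) and the where condition described in the text; returning each selected binding as an (i, j, k) tuple is an assumption made for illustration and does not reproduce the return clause of the example.

```python
def flwor():
    """Nested iteration that mirrors the normalized for/let/where clauses."""
    result = []
    for i in (1, 2):                 # for $i in (1, 2)
        for j in (3, 4):             #   for $j in (3, 4)
            k = i + j                #     let $k := $i + $j
            if k >= 5:               #     where $k >= 5
                result.append((i, j, k))
    return result
```

The selected bindings come out ordered first by the outer sequence and then by the inner sequence.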

4.8.2 For expression

A single for expression is typed as follows. First, the type Type1 of the iteration expression Expr1 is inferred. Then the prime type of Type1, prime(Type1), is computed. This is a union over all item types in Type1 (see [8.4 Judgments for FLWOR and other expressions on sequences]). With the variable component of the static environment statEnv extended with VarRef1 as type prime(Type1), the type Type2 of Expr2 is inferred. Because the for expression iterates over the result of Expr1, the final type of the iteration is Type2 multiplied with the possible number of items in Type1 (one, ?, *, or +). This number is determined by the auxiliary type-function quantifier(Type1).
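The multiplication of a type by a quantifier can be sketched in Python. The sketch assumes a simplified representation in which a type is a pair of a set of item types (its prime type) and an occurrence indicator, where "1" means exactly one; the function names are hypothetical, and the special treatment of the empty type is omitted.

```python
def quantifier_product(q1, q2):
    """Combine two occurrence indicators among "1", "?", "+" and "*"."""
    if q1 == "1":
        return q2
    if q2 == "1":
        return q1
    if q1 == q2:
        return q1
    return "*"          # any mix of ?, + and * may yield zero or many items

def type_for(type1, infer_body):
    """Infer the type of a for expression.

    type1 is (set of item types, quantifier). infer_body maps the type of the
    bound variable, prime(Type1), to the body type Type2 in the same form.
    """
    prime1, quant1 = type1
    prime2, quant2 = infer_body(prime1)      # the variable is typed as prime(Type1)
    return prime2, quantifier_product(quant2, quant1)
```

For instance, a body that yields an optional item, iterated over a one-or-more input, is typed with the * quantifier, since every iteration may produce nothing.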

When a type declaration is present, the static semantics also checks that the type of the input expression is a subtype of the declared type and extends the static environment by typing VarRef1 with type Type0. This semantics is specified by the following typing rule.

The last rule contains a For expression that contains a type declaration and a positional variable. When the positional variable is present, the static environment is also extended with the positional variable typed as an integer.

This result type is not the most specific type possible. It does not take into account the order of elements in the input type, and it ignores the individual and overall number of elements in the input type. The most specific type possible is: element out {element one {}}, element out {element two {}}, element out {element three {}}. However, inferring such a specific type for arbitrary input types and arbitrary return clauses requires significantly more complex type inference rules. In addition, if put into the context of an element, the specific type violates the "unique particle attribution" restriction of XML Schema, which requires that an element must have a unique content model within a particular context.

Otherwise, the iteration expression Expr1 is evaluated to produce the sequence Item1, ..., Itemn. For each item Itemi in this sequence, the body of the for expression Expr2 is evaluated in the environment dynEnv extended with VarRef1 bound to Itemi. This produces values Value1, ..., Valuen, which are concatenated to produce the result sequence.

The following rule is the same as the rule above, but includes the optional positional variable VarRefpos. If present, VarRefpos is bound to the position of the item in the input sequence, i.e., the value i.
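The dynamic rules for the for expression, with and without a positional variable, can be sketched in Python as follows. This is an informal illustration, not a normative definition: the dynamic environment is assumed to be a plain dictionary, sequences are Python lists, and the names eval_for, body, and pos_var are hypothetical.

```python
def eval_for(dyn_env, var, items, body, pos_var=None):
    """Evaluate `for $var (at $pos_var) in items return body`.

    `items` is the already evaluated value of Expr1, and `body` is a
    function from an environment to a list (the value of Expr2).
    """
    result = []
    for i, item in enumerate(items, start=1):
        extended = dict(dyn_env)       # extend dynEnv without mutating it
        extended[var] = item           # VarRef1 bound to Item_i
        if pos_var is not None:
            extended[pos_var] = i      # VarRefpos bound to the position i
        result.extend(body(extended))  # concatenate Value_i onto the result
    return result

# Nested iteration over (1, 2) and (3, 4): tuples are ordered by the
# outer sequence first, then by the inner sequence.
pairs = eval_for({}, "i", [1, 2],
                 lambda env: eval_for(env, "j", [3, 4],
                                      lambda inner: [(inner["i"], inner["j"])]))
```

An empty input sequence produces an empty result, because the loop body is never evaluated.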

When a type declaration is present, the dynamic semantics also checks that each item in the result of evaluating Expr1 matches the declared type. This semantics is specified by the following dynamic rule.

If evaluation of the first expression raises an error, the entire expression raises an error. This rule applies to all forms of a For expression, i.e., those with or without a type declaration or positional variable.

When a type declaration is present, the static semantics also checks that the type of the input expression is a subtype of the declared type and extends the static environment by typing Variable1 with type Type0. This semantics is specified by the following static rule.

The default rules for propagating errors, described in [3.3 Error Handling], apply to let expressions. In addition, when a type declaration is present, a type error is raised if the result of evaluating Expr1 does not match the declared type.

Note the use of the environment discipline to define the scope of each variable. For instance, in the following nested let expression:

let $k := 5 return
let $k := $k + 1 return
$k+1

the outermost let expression binds variable $k to the integer 5 in the environment, then the expression $k+1 is computed, yielding value 6, to which the second variable $k is bound. The expression then results in the final integer 7.
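The environment discipline used in this example can be mimicked in Python. The sketch below is only an illustration under the assumption that the dynamic environment is an immutable-style dictionary; eval_let is a hypothetical helper, not a function defined by this specification.

```python
def eval_let(dyn_env, var, bound_expr, body):
    """Evaluate `let $var := bound_expr return body`."""
    value = bound_expr(dyn_env)          # Expr1 sees only the outer bindings
    extended = {**dyn_env, var: value}   # the new binding shadows any outer one
    return body(extended)                # Expr2 is evaluated in the extension

# let $k := 5 return let $k := $k + 1 return $k + 1
result = eval_let(
    {}, "k", lambda env: 5,
    lambda outer: eval_let(outer, "k", lambda env: env["k"] + 1,
                           lambda inner: inner["k"] + 1))
```

Because each let builds a new environment instead of updating the old one, the inner binding of $k is computed from the outer $k and then hides it.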

4.8.4 Order By and Return Clauses

Introduction

The dynamic semantics of the OrderByClause is not specified formally, as doing so would require the introduction of tuples, which are not supported in the [XPath/XQuery] data model. The dynamic semantics of the order-by clause can be found in Section 3.8.3 Order By and Return Clauses of [XQuery 1.0: A Query Language for XML].

Because an OrderByClause does not affect the type of a FLWORExpr expression, the static semantics of a FLWORExpr expression with an OrderByClause is equivalent to the static semantics of an equivalent FLWORExpr in which the OrderByClause is omitted but a gt comparison is applied.

Notation

To define normalization of OrderBy, the following auxiliary mapping rule is used.

An OrderByClause is normalized to a Let clause, nested For expressions, and atomization, which guarantees that the OrderSpecList is well typed. Note that if evaluated dynamically, the normalization of OrderByClause given here does not express the required sorting semantics, but this normalization does provide the correct static type.

4.9 Ordered and Unordered Expressions

Introduction

The purpose of ordered and unordered expressions is to set the ordering mode in the static context to ordered or unordered for a certain region in a query. The specified ordering mode applies to the expression nested inside the curly braces.

OrderedExpr and UnorderedExpr expressions only have an effect on the static context. The effect on the evaluation of their subexpressions is captured using the fs:apply-ordering-mode function, which is introduced during normalization of axis steps, union, intersect, and except expressions, and FLWOR expressions that have no order by clause.

If the conditional's boolean expression Expr1 evaluates to true, Expr2 is evaluated and its value is produced. If the conditional's boolean expression evaluates to false, Expr3 is evaluated and its value is produced. Note that the existence of two separate evaluation rules ensures that only one branch of the conditional is evaluated.

If the conditional's boolean expression Expr1 evaluates to true, and Expr2 raises an error, then the conditional expression raises an error. Conversely, if the conditional's boolean expression evaluates to false, and Expr3 raises an error, then the conditional raises an error.

The existentially quantified "some" expression yields true if any evaluation of the satisfies expression yields true. The existentially quantified "some" expression yields false if every evaluation of the satisfies expression is false. A quantified expression may raise an error if any evaluation of the satisfies expression raises an error. The dynamic semantics of quantified expressions is non-deterministic. This non-determinism permits implementations to use short-circuit evaluation strategies when evaluating quantified expressions.

The universally quantified "every" expression yields false if any evaluation of the satisfies expression yields false. The universally quantified "every" expression yields true if every evaluation of the satisfies expression is true.
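One evaluation strategy permitted by these rules is left-to-right short-circuit evaluation. The Python sketch below illustrates it under the assumption that the input sequence is a list and the satisfies expression is a function; the names some and every are hypothetical helpers. Other strategies are equally valid, which is why the same query may yield a value under one strategy and raise an error under another.

```python
def some(items, satisfies):
    """True as soon as one evaluation of `satisfies` yields true."""
    return any(satisfies(item) for item in items)

def every(items, satisfies):
    """False as soon as one evaluation of `satisfies` yields false."""
    return all(satisfies(item) for item in items)

# With left-to-right short-circuiting, the division by zero for the
# second item is never attempted because the first item already satisfies.
found = some([1, 0], lambda x: 10 // x == 10)
```

Evaluating the same predicate over the items in the opposite order would raise an error before a satisfying item is found. Both outcomes are allowed by the non-deterministic semantics.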

4.12 Expressions on SequenceTypes

Introduction

Expressions on SequenceTypes are expressions whose semantics depends on the type of some of the sub-expressions to which they are applied. The syntax of SequenceType expressions is described in [3.5.3 SequenceType Syntax].

An InstanceofExpr expression is normalized into a TypeswitchExpr expression. Note that the following normalization rule uses a variable $fs:new, which is a newly created variable which must not conflict with any variables already in scope. This variable is necessary to comply with the syntax of typeswitch expressions in the core [XPath/XQuery], but is never used.

The typeswitch expression chooses one of several expressions to evaluate based on the dynamic type of an input value.

Each branch of a typeswitch expression may have an optional VarRef, which is bound to the value of the input expression. This variable is optional in [XPath/XQuery] but mandatory in the [XPath/XQuery] core. One of the reasons for having this variable is that it is assigned a specific type for the corresponding branch.

Normalization of a typeswitch expression guarantees that every branch has an associated VarRef. The following normalization rule adds a newly created variable that does not appear in the rest of the query. Note that $fs:new is a newly generated variable that must not conflict with any variables already in scope and that is not used in any of the sub-expressions.

The static typing rules for the typeswitch expression are simple. Each case clause and the default clause of the typeswitch is typed independently. The type of the entire typeswitch expression is the union of the types of all the clauses.

To type one case clause, the case variable is assigned the type of the case clause CaseType and the body of the clause is typed in the extended environment. Thus, the type of a case clause is independent of the type of the input expression.

The evaluation of a typeswitch proceeds as follows. First, the input expression is evaluated, yielding an input value. The effective case is the first case clause such that the input value matches the SequenceType in the case clause. The return clause of the effective case is evaluated and the value of the return expression is the value of the typeswitch expression.
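The selection of the effective case can be sketched in Python. This is an informal illustration only: SequenceType matching is approximated by arbitrary predicate functions, and eval_typeswitch and its parameters are hypothetical names.

```python
def eval_typeswitch(value, cases, default):
    """Evaluate a typeswitch over an already evaluated input value.

    `cases` is an ordered list of (matches, body) pairs and `default`
    is the body of the default clause. Each body receives the input
    value, which models the binding of the case variable.
    """
    for matches, body in cases:
        if matches(value):
            return body(value)   # the first matching case is the effective case
    return default(value)

# Python classes stand in for SequenceTypes in this sketch.
cases = [
    (lambda v: isinstance(v, int), lambda v: "integer"),
    (lambda v: isinstance(v, (int, str)), lambda v: "integer or string"),
]
```

Only the body of the effective case is evaluated, so an integer input selects the first clause even though it would also match the second.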

The default rules for propagating errors, described in [3.3 Error Handling], apply to the typeswitch expression. In particular, if evaluation of the input expression or evaluation of any case rule raises an error, then the typeswitch expression raises an error.

The expression "( Expr ) cast as AtomicType " is used to change the atomic type of an atomic value from one atomic type to another. It changes both the type and value of the result of an expression, and can only be applied to an atomic value.

The semantics of cast expressions depends on the specification given in Section 17 Casting of [Functions and Operators]. For any source and target primitive types, the casting table in that section indicates whether the cast from the source type to the target type is permitted. When a cast is permitted, the detailed dynamic rules for cast in that section are applied. These rules are not specified further here.

The normalization of cast applies atomization to its argument. The type declaration asserts that the result is a single atomic value. The second normalization rule applies when the target type is optional.

Note that when the casting table indicates "M", the casting operation is allowed but might fail during evaluation if the input value does not satisfy the lexical and value constraints of the target atomic type (e.g., attempting to cast the string "VRAI" into xs:boolean). In that case, the dynamic evaluation raises a dynamic error.

The default rules for propagating errors, described in [3.3 Error Handling], apply to the cast expression. In particular, if Expr1 raises an error, then the cast expression raises an error. In addition, if the casting table returns "N", the cast is not allowed, and the dynamic semantics raises a type error.
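The three possible entries of the casting table can be illustrated with a small Python sketch. It is an assumption-laden illustration, not the normative table: CAST_TABLE contains only three example entries, and the exception classes, CONVERTERS, and cast are hypothetical names.

```python
class XQueryTypeError(Exception):
    """Stands in for an [XPath/XQuery] type error."""

class XQueryDynamicError(Exception):
    """Stands in for an [XPath/XQuery] dynamic error."""

# A tiny excerpt of a casting table: "Y" (always allowed),
# "M" (allowed, may fail on the value), "N" (never allowed).
CAST_TABLE = {
    ("xs:integer", "xs:string"): "Y",
    ("xs:string", "xs:boolean"): "M",
    ("xs:boolean", "xs:date"): "N",
}

def string_to_boolean(lexical):
    text = lexical.strip(" \t\r\n")
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise XQueryDynamicError(f"{lexical!r} is not a valid xs:boolean")

CONVERTERS = {
    ("xs:integer", "xs:string"): str,
    ("xs:string", "xs:boolean"): string_to_boolean,
}

def cast(value, source, target):
    """Cast an atomic value, consulting the table excerpt first."""
    if CAST_TABLE[(source, target)] == "N":
        raise XQueryTypeError(f"cannot cast {source} to {target}")
    return CONVERTERS[(source, target)](value)
```

In this sketch, an "N" entry is rejected before the value is inspected, whereas an "M" entry is rejected only when the particular value cannot be converted.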

4.12.6 Treat

The expression "Expr treat as SequenceType" can be used to change the static type of the result of an expression without changing its value. The treat-as expression raises a dynamic error if the dynamic type of the input value does not match the specified type.

Treat as expressions are normalized to typeswitch expressions. Note that the following normalization rule uses a variable $fs:new, which is a newly created variable that does not conflict with any variables already in scope.

A validate expression validates its argument with respect to the in-scope schema definitions, using the schema validation process described in [Schema Part 1]. The argument to a validate expression must be either an element or a document node. Validation replaces all nodes with new nodes that have their own identity and contain type annotations and defaults created by the validation process.

Static typing of the validate operation is defined by the following rule. Note the use of a subtyping check to ensure that the type of the expression to validate is either an element or a well-formed document node (i.e., with only one root element and no text nodes). The type of the expression to validate may be a union of more than one element type. We apply the with mode judgment to each element type to determine the meaning of that element type with the given validation mode, which yields a new element type. The result type is the union over all new element types.

The above steps are expressed formally by the "erasure" and "annotation" judgments. Formally, validation removes existing type annotations from nodes ("erasure"), and it re-validates the corresponding data model instance, possibly adding new type annotations to nodes ("annotation"). Both erasure and annotation are described formally in [Missing Reference : sec_validation_judgments]. Indeed, the conjunction of erasure and annotation provides a formal model for a large part of actual schema validation. The semantics of the validate expression is specified as follows.

In the first premise below, the expression to validate is evaluated. The resulting value must be an element or document node. The second premise constructs a new value in which all existing type annotations have been erased. The third premise determines the element type that corresponds to the element node's name in the given validation mode. The last premise validates the erased element node against that type, using the annotate as judgment, yielding the final validated element.

The QName of a pragma must resolve to a namespace URI and local name, using the statically known namespaces. If at least one of the pragmas is recognized, the dynamic semantics is implementation-defined.

If at least one of the pragmas is recognized, the static semantics are implementation-defined.

If none of the pragmas are recognized, the static semantics are the same as for the input expression. In both cases, the static typing must be applied on the input expression, possibly raising corresponding static type errors.

The Prolog is a sequence of declarations and definitions that affect query processing. The Prolog can be used, for example, to define namespace prefixes, import type definitions from XML Schemas, and define functions and variables. Namespace declarations and schema imports always precede function definitions, as specified by the following grammar productions.

Function declarations are globally scoped, that is, the use of a function name in a function call may precede declaration of the function. Variable declarations are lexically scoped, i.e., variable declarations must precede variable uses.

The XQuery Prolog requires that declarations appear in a particular order. In the Formal Semantics, it is simpler to assume the declarations can appear in any order, as it does not change their semantics -- we simply assume that an XQuery parser has enforced the required order.

The Prolog contains a variety of declarations that specify the initial static and dynamic context of the query. The following formal grammar productions represent any Prolog declaration.

The following auxiliary judgments are applied when processing the declarations in the prolog. The effect of the judgment is to process each prolog declaration in order, constructing a new static environment from the static environment constructed from previous prolog declarations.

Prolog declarations are processed in the order they are encountered. The normalization of a prolog declaration PrologDecl depends on the static context processing of all previous prolog declarations. In turn, static context processing of PrologDecl depends on the normalization of the PrologDecl. For example, because variables are lexically scoped, the normalization and static context processing of a variable declaration depends on the normalization and static context processing of all previous variable declarations. Therefore, the normalization phase and static context processing are interleaved, with normalization preceding static context processing for each prolog declaration.

The following inference rules express this dependency. The first rule specifies that for an empty sequence of prolog declarations, the initial static environment is the default static context.
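The interleaving of normalization and static context processing can be pictured as a left-to-right fold over the prolog declarations. The Python sketch below is a deliberately simplified illustration: it handles only variable declarations, models normalization as a scope check, and uses hypothetical names (process_prolog, and triples of name, referenced variables, and static type).

```python
def process_prolog(declarations, default_static_env=None):
    """Fold over variable declarations, threading the static environment.

    Each declaration is a (name, referenced_variables, static_type) triple.
    """
    stat_env = dict(default_static_env or {})
    for name, referenced, static_type in declarations:
        # Normalization of this declaration uses the environment built
        # from all previous declarations: every variable it mentions
        # must already be in scope.
        for ref in referenced:
            if ref not in stat_env:
                raise LookupError(f"variable ${ref} is not in scope")
        # Static context processing then extends the environment.
        stat_env = {**stat_env, name: static_type}
    return stat_env
```

For an empty list of declarations the function returns the default environment unchanged, which corresponds to the base case described above.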

Static typing of a main module follows context processing and normalization. Context processing and normalization of a main module applies the rules above to the prolog, then using the resulting static environment statEnv, the query body is normalized into a Core expression, and the static typing rules are applied to this Core expression.

Dynamic evaluation of a main module applies the rules for dynamic-context processing to the prolog declarations, then using the resulting dynamic environment dynEnv, the dynamic evaluation rules are applied to the normalized query body.

A version declaration specifies the applicable XQuery syntax and semantics for a module. An XQuery implementation must raise a static error when processing a query labeled with a version that the implementation does not support. The Formal Semantics is specified for XQuery 1.0 and does not specify this static error formally.

5.2 Module Declaration

We assume that the static-context processing and dynamic-context processing described in [5 Modules and Prologs] are applied to all library modules before the normalization, static context processing, and dynamic context processing of the main module. That is, at the time an "import module" declaration is processed, we assume that the static and dynamic context of the imported module is already available. This assumption does not require or assume separate compilation of modules. An implementation might process all or some imported modules statically (i.e., before the importing module is identified) or dynamically (i.e., when the importing module is identified and processed).

Notation

We define a new judgment that maps a module's URI to the corresponding module's static environment:

The effect of a module declaration is to apply the static processing rules defined in [5 Modules and Prologs] to the module's prolog. The resulting static context is then available to any importing module.

The module declaration extends the prolog with a namespace declaration that binds the module's prefix to its URI, then computes the static context for the complete module.

The dynamic context processing of a module declaration is similar to that of static processing. The module declaration extends the prolog with a namespace declaration that binds the module's prefix to its URI, then computes the dynamic context for the complete module.

5.3 Boundary-space Declaration

The xmlspace declaration is not specified formally, as the Formal Semantics is defined on the Core language, which is an abstract, not concrete, syntax and is typically the result of the parsing phase described in [2.4.1 Processing model].

The default collation declaration updates the collations environment of the static context. The collations environment is used by several functions in [Functions and Operators], but is otherwise not used in the Formal Semantics.

A base URI declaration specifies the base URI property of the static context, which is used when resolving relative URIs within a module. A static error is raised if more than one base URI declaration is declared in a query prolog.

Schema Imports

The semantics of Schema Import is described in terms of the [XPath/XQuery] type system. The process of converting an XML Schema into a sequence of type declarations is described in Section [Missing Reference : sec_importing_schema]. This section describes how the resulting sequence of type declarations is added into the static context when the Prolog is processed.

A schema imported into a query is first mapped into the [XPath/XQuery] type system, which yields a sequence of XQuery type definitions. The rules for mapping the imported schema begin in [E.2 Schemas as a whole]. Each type definition in an imported schema is then added to the static environment.

Note that it is a static error to import two schemas that both define the same name in the same symbol space and in the same scope; that is, multiple top-level definitions of the same type, element, or attribute name raise a static error. For instance, a query may not import two schemas that include top-level element declarations for two elements with the same expanded name.

The function fs:local-functions(statEnv, URI) returns all the function signatures in statEnv.funcType such that the URI part of the function's expanded-QName equals the given URI, that is, the function signatures that are declared locally in the module with the given namespace URI.

During static context processing, the effect of an "import module" declaration is to extend the importing module's static context with the global variables (and their types) and the function signatures of the imported module. Module import is not transitive; therefore, only the global variables and functions declared explicitly in the imported module are available in the importing module. Also, module import does not import schemas; therefore, the importing module must explicitly import any schemas on which the imported global variables or functions depend.

The first premise below "looks up" the static context of the imported module, as defined in [5.2 Module Declaration], then extends the input static context with the global variables and function signatures declared in the imported static context.

During dynamic context processing, the effect of an "import module" declaration is to extend the importing module's dynamic context with the global variables and the function definitions of the imported module. Each variable and function name is mapped to the special value #IMPORTED(URI) to indicate that the variable or function is defined in the imported module with the given URI.

Notation

The rules below depend on the following auxiliary judgments.

This judgment adds each variable explicitly declared in the imported module to the importing module's dynamic variable environment.
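The effect of an "import module" declaration on the dynamic context can be sketched in Python. This is an illustration under several assumptions: the dynamic environment is modelled as a dictionary with two hypothetical components, varValue and funcDefn, functions are keyed by (name, arity), and import_module is a hypothetical helper.

```python
def import_module(dyn_env, uri, variables, functions):
    """Extend the importing module's dynamic environment.

    `variables` lists the variables and `functions` lists the
    (name, arity) keys declared explicitly in the imported module.
    """
    marker = f"#IMPORTED({uri})"   # special value pointing at the defining module
    return {
        "varValue": {**dyn_env["varValue"], **{name: marker for name in variables}},
        "funcDefn": {**dyn_env["funcDefn"], **{key: marker for key in functions}},
    }

importing_env = {"varValue": {"local:x": 1}, "funcDefn": {}}
extended_env = import_module(importing_env, "http://example.org/lib",
                             ["lib:limit"], [("lib:area", 2)])
```

The marker records where the definition lives, so that a later lookup can be redirected to the dynamic context of the imported module.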

A namespace declaration adds a new (prefix,uri) binding in the namespace component of the static environment. All namespace declarations in the prolog are passive declarations. Namespace declaration attributes of element constructors are active declarations.

A default element namespace declaration changes the default element namespace prefix binding in the namespace component of the static environment. If the string literal is the empty string, the default element namespace is set to the empty namespace, otherwise it is set to the string literal value.

Note that multiple declarations of the same namespace prefix in the Prolog result in a static error. However, a declaration of a namespace in the Prolog can override a prefix that has been predeclared in the static context.

A variable declaration updates the variable component of the static context by associating the given variable with a static type. If the variable declaration has a type declaration, the static type of the variable is simply the specified type.

5.15 Function Declaration

Introduction

User-defined functions specify the name of the function, the names and types of the parameters, and the type of the result. The function body defines how the result of the function is computed from its parameters.

Because functions are mutually referential, all function signatures must be defined in the static environment before static type analysis is applied to the function bodies. This rule also updates the local functions component of the static context to indicate the function is declared within the given module.

The static typing rules for function bodies follow normalization and processing of the static context. The typing rules below construct a new environment in which each variable has the given expected type; the static type of the function's body is then computed under the new environment. The function body's type must be a subtype of the expected return type. If type checking fails, a static error is raised. Otherwise, static typing of the function has no other effect, as function signatures are already inside the static environment.

The bodies of external functions are not available and therefore cannot be type checked. To ensure type soundness, the evaluation environment must guarantee that the value returned by the external function matches the expected return type.

A function declaration updates the dynamic context. The function name with arity N is associated with the given function body. The number of arguments is required, because XQuery permits overloading of function names as long as each function signature has a different number of arguments.
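The role of the arity in function lookup can be sketched in Python. The sketch is illustrative only: the function component of the dynamic context is assumed to be a dictionary keyed by (name, arity), and declare_function and call_function are hypothetical helpers.

```python
def declare_function(func_defn, name, params, body):
    """Associate (name, arity) with the parameter names and the body."""
    key = (name, len(params))
    if key in func_defn:
        raise ValueError(f"{name} with arity {len(params)} is already declared")
    return {**func_defn, key: (params, body)}

def call_function(func_defn, name, args):
    """Select the definition whose arity equals the number of arguments."""
    params, body = func_defn[(name, len(args))]
    return body(dict(zip(params, args)))

# Two declarations may share a name as long as their arities differ.
defs = declare_function({}, "local:area", ["side"],
                        lambda env: env["side"] * env["side"])
defs = declare_function(defs, "local:area", ["width", "height"],
                        lambda env: env["width"] * env["height"])
```

A call with one argument selects the first definition and a call with two arguments selects the second, because the number of arguments is part of the lookup key.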

5.16 Option Declaration

6 Conformance

The XQuery Formal Semantics is intended primarily as a component that can be used by [XQuery 1.0: A Query Language for XML], or a host language of [XML Path Language (XPath) 2.0]. Therefore, the XQuery Formal Semantics relies on specifications that use it (such as [XPath 2.0], [XSLT 2.0], and [XQuery]) to specify conformance criteria in their respective environments. Specifications that set conformance criteria for their use of the formal semantics must not relax the constraints expressed in this specification.

6.1.1 Static Typing Extensions

In some cases, the static typing rules are not very precise (see, for example, the type inference rules for the ancestor axes—parent, ancestor, and ancestor-or-self—and for the function fn:root). If an implementation supports a static typing extension, it must always provide a more precise type than the one defined in this specification.

This constraint is formally expressed as follows. A static typing extension Expr:extType must be such that for every expression Expr the following holds.

It is not recommended for a static typing extension to change the static typing behavior of expressions that specify a type explicitly (treat as, cast as, typeswitch, function parameters, and type declarations in variable bindings), since the purpose of those expressions is to impose a specific type.

7 Additional Semantics of Functions

This section defines a number of auxiliary functions required to define the formal semantics of [XPath/XQuery], and gives special static typing rules for some functions in [Functions and Operators].

7.1 Formal Semantics Functions

Introduction

This section gives the definition and semantics of functions that are used in the formal semantics but are not in [Functions and Operators]. Their dynamic semantics are defined in the same informal style as in the [Functions and Operators] document. The static semantics of some formal-semantics functions require custom typing rules.

If statEnv.xpath1.0_compatibility is true, and $actual is not of numeric type, then returns the value of the expression fn:number(fn:subsequence($actual,1,1)).

Editorial note

New Issue: According to Michael Kay this converts too much (for example dates) to numbers in backwards compatibility mode. One fix would be to make the test inclusive yet sufficiently general to capture all the actual 1.0 cases, i.e., say something like "$actual : (node* | xdt:untypedAtomic* | xs:string*)". [Kris]

In XPath 1.0 backwards compatibility mode, if the static type of the operand is not a single numeric, the static type of the expression is the static type of the expression that extracts the first element of the operand and then converts that item to a double (by applying fn:number).

fs:convert-simple-operand($actual as item *, $expected as xdt:anyAtomicType) as xdt:anyAtomicType *

The formal-semantics function fs:convert-simple-operand is used to convert the value of the $actual argument such that it matches the type of the $expected argument (or matches a sequence of that type).

The dynamic semantics of this function are as follows:

If statEnv.xpath1.0_compatibility is true and the $expected argument is of type xs:string (or a type derived from xs:string) but the $actual argument is not of type xs:string (or a type derived from xs:string), then returns the value of fn:string($actual).

If statEnv.xpath1.0_compatibility is true and the $expected argument is of numeric type (or a type derived from a numeric type) but the $actual argument is not of numeric type, then returns the value of fn:number($actual).

If statEnv.xpath1.0_compatibility is false, then for each item in $actual argument that is of type xdt:untypedAtomic, that item is cast to the type of the $expected argument, and the resulting sequence is returned.
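The last case, in which XPath 1.0 compatibility mode is off, can be sketched in Python. The sketch is a simplified illustration that omits the two compatibility-mode cases: atomic values are modelled as hypothetical (type name, value) pairs, and CASTS covers only three target types chosen for the example.

```python
# Hypothetical stand-ins for casting to a few target types.
CASTS = {
    "xs:double": float,
    "xs:integer": int,
    "xs:string": str,
}

def convert_untyped_items(actual, expected_type):
    """Cast every xdt:untypedAtomic item to the expected type.

    `actual` is a list of (type_name, value) pairs; items of any
    other type are returned unchanged.
    """
    cast = CASTS[expected_type]
    return [
        (expected_type, cast(value)) if type_name == "xdt:untypedAtomic"
        else (type_name, value)
        for type_name, value in actual
    ]

converted = convert_untyped_items(
    [("xdt:untypedAtomic", "3"), ("xs:integer", 4)], "xs:double")
```

Only the untyped item changes: the typed integer keeps its type, so any further promotion is left to the operator that consumes the result.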

The fs:distinct-doc-order-or-atomic-sequence function operates on either a homogeneous sequence of nodes or a homogeneous sequence of atomic values. If the input is a sequence of nodes, it sorts those nodes by document order and removes duplicates, using the fs:distinct-doc-order function. If it is a sequence of atomic values, it returns it unchanged.
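The behavior of this function can be sketched in Python. The sketch rests on assumptions that are not part of the specification: a node is modelled by a hypothetical Node class whose order field represents both its identity and its position in document order, and any non-Node Python value stands for an atomic value.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    order: int   # hypothetical: identity and position in document order
    name: str

def distinct_doc_order(nodes):
    """Sort nodes by document order and drop duplicate nodes."""
    seen = set()
    result = []
    for node in sorted(nodes, key=lambda n: n.order):
        if node.order not in seen:
            seen.add(node.order)
            result.append(node)
    return result

def distinct_doc_order_or_atomic_sequence(items):
    if all(isinstance(item, Node) for item in items):
        return distinct_doc_order(items)
    if not any(isinstance(item, Node) for item in items):
        return list(items)   # atomic values are returned unchanged
    raise TypeError("the sequence mixes nodes and atomic values")
```

Atomic sequences keep their original order and their duplicates, whereas node sequences are reordered and deduplicated.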

In summary, each node is replaced by its string value. For each adjacent sequence of one or more atomic values returned by an enclosed expression, a string is constructed, containing the canonical lexical representation of all the atomic values, with a single blank character inserted between adjacent values.

If the ordering context is set to unordered, the static type of the input expression of the fs:apply-ordering-mode function is computed using the prime and quantifier judgments, as for the fn:unordered function.

7.2 Standard functions with specific typing rules

Introduction

This section gives static typing rules for functions in [Functions and Operators] for which the standard typing rule based on the function's signature can be made more precise. All functions that are not mentioned here have static semantics as described by the generic static rule of section [4.1.5 Function Calls]. The rules in this section always give more precise type information than the generic rule.

The typing rules for the fn:abs, fn:ceiling, fn:floor, fn:round, and fn:round-half-to-even functions promote their input type to the (least) base primitive numeric type from which the input type is derived. Parameters of type xdt:untypedAtomic are always promoted to xs:double. Instead of writing a separate judgment for each function, we write one rule with function variable F, which is one of the fn:abs, fn:ceiling, fn:floor, fn:round, or fn:round-half-to-even functions.

When applied to the union of two types, data on is applied to each of the two types. The resulting types are combined into a factored type. This rule is necessary because data on may return a sequence of atomic types.

7.2.7 The fn:min, fn:max, fn:avg, and fn:sum functions

The dynamic evaluation rules for the fn:min, fn:max, and fn:avg functions convert any item of type xdt:untypedAtomic in the input sequence to xs:double, then attempt to promote all values in the input sequence to values that are comparable. The static typing rules reflect the dynamic rules.

The function aggregate_quantifier converts the input type quantifier zero-or-more or zero-or-one to the result type quantifier zero-or-one, and converts the input type quantifier one or one-or-more, to the result type quantifier one.
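The mapping performed by aggregate_quantifier is small enough to write out as a table. The Python sketch below is an illustration in which quantifiers are represented by the hypothetical strings "1", "?", "+", and "*".

```python
# Input type quantifier -> result type quantifier.
AGGREGATE_QUANTIFIER = {
    "*": "?",   # zero-or-more -> zero-or-one
    "?": "?",   # zero-or-one  -> zero-or-one
    "1": "1",   # one          -> one
    "+": "1",   # one-or-more  -> one
}

def aggregate_quantifier(quantifier):
    return AGGREGATE_QUANTIFIER[quantifier]
```

The result is optional exactly when the input sequence may be empty, since an aggregate over an empty sequence yields the empty sequence.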

Now we can define the static typing rules for the aggregate functions. First, the input type is converted to a prime type. Second, the type function convert_untypedAtomic is applied to the prime type, yielding a new prime type, in which occurrences of xdt:untypedAtomic are converted to xs:double. Third, the judgment can be promoted to is applied to the new prime type and a target type T. The result type is T combined with the aggregate quantifier of the input type.

Instead of writing a separate judgment for each function and target type, we write one rule with function variable F and target variable T. When the function variable F is fn:min or fn:max, the target type T must be one of xs:string, xs:integer, xs:decimal, xs:float, xs:double, xdt:yearMonthDuration, or xdt:dayTimeDuration. When the function variable F is fn:avg, the target type T must be one of xs:decimal, xs:float, xs:double, xdt:yearMonthDuration, or xdt:dayTimeDuration.

The static typing rules for fn:sum are similar to those for the aggregate functions above, in which T must be one of xs:integer, xs:decimal, xs:float, xs:double, xdt:yearMonthDuration, or xdt:dayTimeDuration.

The fn:sum function has two forms. The first form takes two arguments: the first argument is the input sequence and the second argument is the value that should be returned if the input sequence is empty. In this case, the result type is the union of the target type T and the type of the second argument.

7.2.10 The fn:subsequence function

The fn:subsequence function has special typing rules when its second argument is the numeric literal value 1 or the built-in variable $fs:last. These rules provide better typing for path expressions such as Expr[1] and Expr[fn:last()].

The static semantics of op:except follows. The type of the second argument is ignored as it does not contribute to the result type. As with op:intersect, the result of op:except may be the empty sequence.

The functions fn:zero-or-one, fn:one-or-more, and fn:exactly-one check that the cardinality of a sequence is in the expected range. They are useful to override the static type inferred for a given query. For example, in the following query, the user may know that all ISBN numbers are unique and therefore that the function always returns at most one book element. However, the static typing feature cannot infer a precise enough type and will raise a static type error at compile time.

8 Auxiliary Judgments

This section defines auxiliary judgments used in defining the formal semantics. Many auxiliary judgments are used in both static and dynamic inference rules. Those auxiliary judgments that are used in only the static or dynamic semantics are labeled as such.

The above rules all require that the type names be defined in the static context, but [XPath/XQuery] permits references to "unknown" type names, i.e., type names that are not defined in the static context. An unknown type name might be encountered if a module in which the given type name occurs does not import the schema in which the given type name is defined. In this case, an implementation is allowed (but is not required) to provide an implementation-dependent mechanism for determining whether the unknown type name is the same as or derived by restriction from the expected type name. The following rule formalizes this implementation-dependent mechanism.

"The implementation is able to determine that TypeName1 is derived by restriction from TypeName2."

8.1.3 Element and attribute name lookup (Dynamic)

The name lookup judgment is used in the definition of the matches judgment, which takes a value and a type and determines whether the value matches, or is an instance of, the given type. Both name lookup and matches are used in the dynamic semantics.

The name lookup judgment takes an element(attribute) name (derived from a node value) and an element(attribute) type and if the element(attribute) name matches the corresponding name in the element(attribute) type, the judgment yields the type's corresponding type reference and for elements, its nillable property.

Note that when the element name is in a substitution group, the name lookup returns the type name corresponding to the original element name (here the type NYCAddress for the element nycaddress, instead of Address for the element address).

Semantics

This judgment is specified by the following rules.

If the element type is a reference to a global element, then name lookup yields the type reference in the element declaration for the given element name. The given element name must be in the substitution group of the global element.