Introduction

What is PsychoPath? PsychoPath is an XPath 2.0 XML Schema-aware processor. It is nearly fully compliant with the XPath 2.0 test suite. It is a library and does not require Eclipse to be used. Known adopters of PsychoPath include the Xerces-J project, which uses it for XML Schema 1.1 assertion support.

PsychoPath is the only known open-source Java XPath 2.0 processor that is fully schema aware; SAXON HE only supports the core functionality. XML Schema awareness provides tighter static checking, and can be used to help determine whether certain operations can or should occur on an XML node. For a detailed description of PsychoPath's design, please see the design document.

Getting PsychoPath

Currently there is no standalone build of PsychoPath. However, you can download the WTP WST 3.1 zip file and use the org.eclipse.wst.xml.xpath2.processor.jar file. This jar has no dependencies on Eclipse and works as a standard jar file. If you are using an OSGi container, it is treated as a standard OSGi bundle.

Additional dependencies you currently need are:

Apache Commons Lang.

IBM ICU

Xerces 2.8.0

JavaCup 0.10 or greater.

If using Eclipse, these are all available from the Orbit project. Others can find the necessary jars on their respective project pages.

How to feed PsychoPath XPath expressions

Since PsychoPath is implemented as an external library and not as a complete program, it must be accessed from inside another program. To process XPath 2.0 expressions using PsychoPath from another program, one needs to go through the following process:

Load the XML document

Optionally validate the XML document

Initialize the static and dynamic context with respect to the document root

Parse the XPath 2.0 expression

Statically verify the XPath 2.0 expression

Evaluate the XPath 2.0 expression with respect to the XML document

To give a better idea of how this process actually works, we’ll go through an example of processing and evaluating the string expression “Hello World!”. In this example the XML document that we load is called “XPexample.xml”.

All 5 main steps have been explained in detail in User Interface, so below is just a brief code summary:
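As a rough, runnable stand-in for those steps, the sketch below uses the XPath engine that ships with the JDK (javax.xml.xpath) rather than PsychoPath. Several things here are assumptions made for illustration: the JDK engine implements XPath 1.0, not 2.0; it has no separate static-verification step; it returns a plain String rather than a ResultSequence; and the class name HelloXPath and the in-memory document (standing in for XPexample.xml) are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK's XPath 1.0 engine, not PsychoPath.
class HelloXPath {

    static String evaluate(String xml, String expression) {
        try {
            // Load the XML document (from a string here, not from a file).
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Parse (compile) the XPath expression.
            XPath xpath = XPathFactory.newInstance().newXPath();
            XPathExpression compiled = xpath.compile(expression);
            // Evaluate it with the document root as the context.
            return compiled.evaluate(doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<example><greeting>hi</greeting></example>";
        System.out.println(evaluate(xml, "'Hello World!'"));
    }
}
```

A string literal does not depend on the document at all, which is why any well-formed input works for this particular expression.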

XPath 2.0 defines everything to be a sequence of items, including the arguments to expressions and the results of operations. Thus, the overall result of an XPath expression evaluation is also a sequence of items. PsychoPath uses the class ResultSequence as a Collections wrapper to store these sequences, so the result of an evaluation is also of this type. A ResultSequence consists of zero or more items; an item may be a node or a simple value. “Hello World!” is an example of a sequence of length 1 holding a single value. A general sequence could be written as (“a”, “s”, “d”, “f”).

Extracting items from a ResultSequence is described below, with details of the different operations that can be applied to it. Assuming ’rs’ is the ResultSequence:

// Returns the number of elements in the sequence; in the case of
// the ’Hello World!’ expression, size = 1.
rs.size();

// Returns the n’th element in the sequence. In the case of
// ’Hello World!’, if n = 1 it returns the XSString “Hello World!”,
// but if n = 2 it returns Empty Result.
rs.get(n);

// Returns true if the sequence is empty.
rs.empty();

// Returns the first element of the sequence; in this example it
// returns the XSString “Hello World!”.
rs.first();

However, all extracted items are of the base class AnyType and need to be cast to their actual subtype.

Certain operations always return a particular type, and using this knowledge the extracted item can be cast immediately. In our example, “Hello World!” returns a string (easily known because it is inside quotes), so it can safely be cast as such:

XSString xsstring =(XSString)(rs.first());

The actual result can now be extracted from this XSString in the following manner:

String str = xsstring.value();

Details of how to cast extracted items from AnyType into their actual subtypes, with examples, are in the next section on how to use each production in the grammar.

However, if the expected return type is unknown or multiple types are possible, the type hierarchy can be traversed in a breadth-first manner, making use of the Java instanceof operator to ascertain the actual type.
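A minimal sketch of such an instanceof check follows. The nested classes AnyType, XSString and XSInteger in it are simplified stand-ins written only for this illustration; they are not the PsychoPath classes, and their constructors and value() methods are assumptions. With the real library the same if/else chain would be applied to items taken from the ResultSequence.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the PsychoPath type hierarchy; the real
// classes live in the library and have a richer API.
class TypeDispatchDemo {

    static abstract class AnyType { }

    static final class XSString extends AnyType {
        private final String value;
        XSString(String value) { this.value = value; }
        String value() { return value; }
    }

    static final class XSInteger extends AnyType {
        private final long value;
        XSInteger(long value) { this.value = value; }
        long value() { return value; }
    }

    // Tests the candidate subtypes one by one and falls back to a
    // generic description when none of them matches.
    static String describe(AnyType item) {
        if (item instanceof XSString) {
            return "xs:string: " + ((XSString) item).value();
        } else if (item instanceof XSInteger) {
            return "xs:integer: " + ((XSInteger) item).value();
        }
        return "unknown: " + item.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        List<AnyType> items =
                Arrays.asList(new XSString("Hello World!"), new XSInteger(4));
        for (AnyType item : items) {
            System.out.println(describe(item));
        }
    }
}
```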

How to use the XPath 2.0 grammar with PsychoPath

In this section we give an overview of the XPath 2.0 grammar and of how each production in the grammar should be used with PsychoPath. For the formal specification, see the W3C XPath 2.0 specification at http://www.w3.org/TR/xpath20.

Constants

String literals are written as “Hello” or ‘Hello’. In each case the opposite kind of quotation mark can be used within the string: ‘He said “Hello” ’ or “London is a big city”. To feed PsychoPath a string, either “ ‘Hello World!’ ” or “ “Hello World!” ” can be used. Remember that the ResultSequence returns AnyType, so since a string is expected as the result, it first has to be cast in the code like this:

XSString xsstring =(XSString)(rs.first());

Numeric constants follow the Java rules for decimal literals: for example, 4 or 4.67; a negative number can be written as -3.05. A numeric literal is taken as a double-precision floating point number if it uses scientific notation (e.g. 1.0e7), as a fixed-point decimal if it includes a decimal point, or as an integer otherwise. When extracting numeric literals from the ResultSequence, the possible types returned include XSDecimal (e.g. xs:decimal: 4.67), XSInteger (e.g. xs:integer: 4) and XSDouble (e.g. xs:double: 1e0). All of these need to be cast in the same manner as before: from AnyType to their corresponding types.

There are no boolean constants as such: instead of true and false, the function calls true() and false() are used.

Constants of other data types can be written using constructors. These look like function calls but require a string literal as their argument. For example, xs:float(“10.7”) produces a single-precision floating point number.

Path expressions

A path expression is a sequence of steps separated by the / or // operator. For example, ../@desc selects the desc attribute of the parent of the context node.

In XPath 2.0, path expressions have been generalized so that any expression can be used as an operand of / (both on the left and the right), as long as its value is a sequence of nodes. For example, it is possible to use a union expression (in parentheses) or a call to the id() function.

In practice, it only makes sense to use expressions on the right of "/" if they depend on the context item. It is legal to write $x/$y provided both $x and $y are sequences of nodes, but the result is exactly the same as writing ./$y.

Note that the expressions ./$X or $X/. can be used to remove duplicates from $X and sort the results into document order.

The operator // is an abbreviation for /descendant-or-self::node()/. An expression of the form /E is shorthand for root(.)/E, and the expression / on its own is shorthand for root(.).
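The abbreviations can be tried out with the XPath engine built into the JDK. This is only a sketch under stated assumptions: the JDK engine implements XPath 1.0 (where these particular abbreviations behave the same way), not PsychoPath's XPath 2.0, and the class name PathExpressionDemo and the sample document are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; runs on the JDK's XPath 1.0 engine, not PsychoPath.
class PathExpressionDemo {

    // Made-up sample document used only for this illustration.
    static final String XML =
            "<shop>"
          + "<item desc='first'><name>a</name></item>"
          + "<item desc='second'><name>b</name></item>"
          + "</shop>";

    // Evaluates the expression against the sample document and
    // returns the result converted to a string.
    static String eval(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // '//' is shorthand for '/descendant-or-self::node()/'.
        System.out.println(eval("count(//name)"));
        System.out.println(eval("count(/descendant-or-self::node()/name)"));
        // '..' steps to the parent; '@desc' selects its desc attribute.
        System.out.println(eval("/shop/item[2]/name/../@desc"));
    }
}
```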

Axis steps

The basic primitive for accessing a source document is the axis step. Axis steps may be combined into path expressions using the path operators "/" and "//", and they may be filtered using filter expressions in the same way as the result of any other expression.

An axis step has the basic form axis::node-test, and selects nodes on a given axis that satisfy the node-test. The axes available are:

element: age

element: age

The rest of the axes act in the same manner.

Set difference, intersection and Union

The expression E1 except E2 selects all nodes that are in E1 unless they are also in E2. Both expressions must return sequences of nodes. The results are returned in document order. For example, @* except @note returns all attributes except the note attribute.

The expression E1 intersect E2 selects all nodes that are in both E1 and E2. Both expressions must return sequences of nodes. The results are returned in document order.

The expression E1 union E2 selects all nodes that are in either E1 or E2 or both. Both expressions must return sequences of nodes. The results are returned in document order.

A complete example of the above expressions would be as follows. Consider an XML document which looks like this:

Arithmetic Expressions

Unary minus and plus:

The unary minus operator changes the sign of a number. For example, -1 is minus one, and -1e0 is the double value minus one.

Multiplication and Division:

The operator * multiplies two numbers. If the operands are of different types, XPath 2.0 specifications say that one of them is promoted to the type of the other. The result is the same type as the operands after promotion.

The operator div divides two numbers. Dividing two integers produces a decimal; in other cases the result is the same type as the operands.

The operator idiv performs integer division. For example, the result of 10 idiv 3 is 3.

The mod operator returns the modulus (or remainder) after division.

The operators * and div may also be used to multiply or divide a duration by a number.
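The difference between div, idiv and mod on integers can be mimicked in plain Java. This is a sketch, not PsychoPath code: the class ArithmeticDemo is made up, BigDecimal stands in for xs:decimal (in XPath 2.0, div on two xs:integer operands yields an xs:decimal), and the DECIMAL64 precision is an arbitrary choice because the precision of decimal division is left to the implementation.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical demo class mirroring XPath integer arithmetic in plain Java.
class ArithmeticDemo {

    // 'div' on two integers: the result is a decimal, so 10 div 4 is 2.5.
    static BigDecimal div(long a, long b) {
        return new BigDecimal(a).divide(new BigDecimal(b), MathContext.DECIMAL64);
    }

    // 'idiv': integer division, truncating toward zero like Java's '/'.
    static long idiv(long a, long b) {
        return a / b;
    }

    // 'mod': remainder with the sign of the dividend, like Java's '%'.
    static long mod(long a, long b) {
        return a % b;
    }

    public static void main(String[] args) {
        System.out.println(div(10, 4));   // decimal result
        System.out.println(idiv(10, 3));  // 3
        System.out.println(mod(31, 15));  // 1
    }
}
```

All three methods throw an ArithmeticException when the divisor is zero, which corresponds to the division-by-zero failure XPath reports for integer and decimal operands.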

Addition and Subtraction:

The operators + and - perform addition and subtraction of numbers in the usual way. Once again, if the operands are of different types, the XPath 2.0 specification says one of them is promoted, but numeric type promotion is currently unsupported by PsychoPath. The result is of the same type as the operands.

Examples of the above would be:

-(5 + 7)

result:

xs:integer: -12

-xs:float(’1.23’)

result:

xs:float: -1.23

-xs:double(’1.23’)

result:

xs:double: -1.23

(+5 - +7)

result:

xs:integer: -2

(1 to 5 div 0 )

result:

FAIL (division by zero!)

5*6*10*5*96 div 20 div 3 div 1

result:

xs:decimal: 2400.0

31 mod 15

result:

xs:integer: 1

Range expressions

The expression E1 to E2 returns a sequence of integers. For example, 1 to 5 returns the sequence 1, 2, 3, 4, 5. This is useful in for expressions; for example, the first five nodes of a node sequence can be processed by writing for $i in 1 to 5 return (//x)[$i]. Another example:

(1+1 to 10)

result:

xs:integer: 2

xs:integer: 3

xs:integer: 4

xs:integer: 5

xs:integer: 6

xs:integer: 7

xs:integer: 8

xs:integer: 9

xs:integer: 10
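For readers who think in Java, the range expression behaves much like IntStream.rangeClosed: both bounds are included, and a start greater than the end gives an empty sequence. The class RangeDemo below is a made-up illustration in plain Java, not PsychoPath code.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class: a Java analogue of the XPath 'E1 to E2' expression.
class RangeDemo {

    // Returns the integers from 'from' to 'to', both inclusive;
    // the list is empty when 'from' is greater than 'to'.
    static List<Integer> range(int from, int to) {
        return IntStream.rangeClosed(from, to).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Mirrors the XPath expression (1+1 to 10).
        System.out.println(range(1 + 1, 10));
    }
}
```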

Comparisons

The simplest comparison operators are eq, ne, lt, le, gt, ge. These compare two atomic values of the same type, for example two integers, two dates, or two strings. (Collation has not been implemented in the current version of PsychoPath.) If the operands are not atomic values, an error is raised.

The operators =, !=, <=, >, <, and >= can compare arbitrary sequences. The result is true if any pair of items from the two sequences has the specified relationship, for example $A = $B is true if there is an item in $A that is equal to some item in $B.
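The "any pair" rule has a consequence that often surprises newcomers: $A = $B and $A != $B can both be true at once. The plain-Java sketch below models sequences as lists of integers to show this; the class GeneralComparisonDemo and its method names are made up for the illustration and are not part of PsychoPath.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class modelling XPath general comparisons on integer sequences.
class GeneralComparisonDemo {

    // $A = $B : true if some item of a equals some item of b.
    static boolean generalEquals(List<Integer> a, List<Integer> b) {
        return a.stream().anyMatch(x -> b.stream().anyMatch(y -> x.equals(y)));
    }

    // $A != $B : true if some item of a differs from some item of b.
    static boolean generalNotEquals(List<Integer> a, List<Integer> b) {
        return a.stream().anyMatch(x -> b.stream().anyMatch(y -> !x.equals(y)));
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2);
        List<Integer> b = Arrays.asList(2, 3);
        System.out.println(generalEquals(a, b));    // the pair (2, 2) matches
        System.out.println(generalNotEquals(a, b)); // the pair (1, 2) differs
    }
}
```

When either sequence is empty there is no pair to test, so both comparisons are false.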

The operators is and isnot test whether the operands represent the same (identical) node. For example, title[1] is *[@note][1] is true if the first title child is the first child element that has a @note attribute. If either operand is an empty sequence the result is an empty sequence (which will usually be treated as false).

The operators << and >> test whether one node precedes or follows another in document order. Consider this XML document:

Conditional Expressions

XPath 2.0 allows a conditional expression of the form if ( E1 ) then E2 else E3. For example, if (@discount) then @discount else 0 returns the value of the discount attribute if it is present, or zero otherwise.

Quantified Expressions

The expression some $x in E1 satisfies E2 returns true if there is an item in the sequence E1 for which the effective boolean value of E2 is true. Note that E2 must use the range variable $x to refer to the item being tested; it does not become the context item. For example, some $x in @* satisfies $x eq "" is true if the context item is an element that has at least one zero-length attribute value.

Similarly, the expression every $x in E1 satisfies E2 returns true if every item in the sequence given by E1 satisfies the condition.
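In Java terms, some and every correspond closely to Stream.anyMatch and Stream.allMatch, including their behaviour on an empty sequence (some is false, every is true). The class QuantifierDemo below is a made-up plain-Java analogue, not PsychoPath code; the lambda parameter plays the role of the range variable $x.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical demo class: Java analogues of XPath quantified expressions.
class QuantifierDemo {

    // some $x in seq satisfies cond($x)
    static <T> boolean some(List<T> seq, Predicate<T> cond) {
        return seq.stream().anyMatch(cond);
    }

    // every $x in seq satisfies cond($x)
    static <T> boolean every(List<T> seq, Predicate<T> cond) {
        return seq.stream().allMatch(cond);
    }

    public static void main(String[] args) {
        // Stand-in for the attribute values selected by @*.
        List<String> attributeValues = Arrays.asList("red", "");
        // some $x in @* satisfies $x eq ""
        System.out.println(some(attributeValues, x -> x.equals("")));
        // every $x in @* satisfies $x eq ""
        System.out.println(every(attributeValues, x -> x.equals("")));
    }
}
```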

And, Or Expressions

The expression E1 and E2 returns true if the effective boolean values of E1 and E2 are both true. The expression E1 or E2 returns true if the effective boolean values of either or both of E1 and E2 are true.

Example: (for a truth table)

1 and 1

result:

xs:boolean: true

1 and 0

result:

xs:boolean: false

1 or 0

result:

xs:boolean: true

0 or 1

result:

xs:boolean: true
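The truth table above can be reproduced without PsychoPath by using the XPath engine bundled with the JDK. This is a sketch under assumptions: that engine implements XPath 1.0, which happens to treat the numbers 0 and 1 as false and true in the same way, and the class BooleanOperatorDemo is made up for the illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; runs on the JDK's XPath 1.0 engine, not PsychoPath.
class BooleanOperatorDemo {

    // Evaluates a boolean expression; an empty DOM document serves as
    // the context because the expressions used here ignore it.
    static boolean eval(String expression) {
        try {
            Document context = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            return (Boolean) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, context, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        for (String expression : new String[] {"1 and 1", "1 and 0", "1 or 0", "0 or 1"}) {
            System.out.println(expression + " -> " + eval(expression));
        }
    }
}
```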

SequenceType Matching Expressions

The rules for SequenceType matching compare the actual type of a value with an expected type. These rules are a subset of the formal rules that match a value with an expected type defined in XQuery 1.0 and XPath 2.0 Formal Semantics http://www.w3.org/TR/xpath20/#XQueryFormalSemantics, because the Formal Semantics must be able to match a value with any XML Schema type, whereas the rules below only match values against those types expressible by the SequenceType syntax.

Some of the rules for SequenceType matching require determining whether a given type name is the same as or derived from an expected type name. The given type name may be "known" (defined in the in-scope schema definitions), or "unknown" (not defined in the in-scope schema definitions). An unknown type name might be encountered, for example, if a source document has been validated using a schema that was not imported into the static context. In this case, an implementation is allowed (but is not required) to provide an implementation-dependent mechanism for determining whether the unknown type name is derived from the expected type name. For example, an implementation might maintain a data dictionary containing information about type hierarchies.

Consider the following XML document:

<sorbo><is>elite</is><!-- life sux --></sorbo>

Then the following are some examples of SequenceType matching:

element({*})

result:

element: sorbo

element(elite)

result:

Empty results

sorbo/comment()

result:

comment: life sux

data(/sorbo/comment())

result:

xs:string: life sux

sorbo/node()

result:

text:

element: is

comment: life sux

text:
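The comment() and node() kind tests also exist in XPath 1.0, so they can be tried against the same document with the engine bundled in the JDK. The sketch below is not PsychoPath code and rests on a few assumptions: the class KindTestDemo is made up, the document is parsed from a single-line string (so, unlike an indented file, it contains no whitespace-only text nodes), and the JDK engine returns the comment text with its surrounding spaces intact.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; runs on the JDK's XPath 1.0 engine, not PsychoPath.
class KindTestDemo {

    static final String XML = "<sorbo><is>elite</is><!-- life sux --></sorbo>";

    // Evaluates the expression against the document above and returns
    // the result converted to a string.
    static String eval(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Text of the comment child of sorbo.
        System.out.println("[" + eval("/sorbo/comment()") + "]");
        // Number of child nodes of any kind: one element and one comment.
        System.out.println(eval("count(/sorbo/node())"));
        // Name of the only element child.
        System.out.println(eval("name(/sorbo/*)"));
    }
}
```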

How to use XPath 2.0 functions with PsychoPath

The aim of this section is to give the user an overview of the available XPath 2.0 functions that are implemented in PsychoPath. For the formal specifications, see the W3C web-site for XPath 2.0 functions and operators http://www.w3.org/TR/xpath-functions/.

Accessors

In order for PsychoPath to operate on instances of the XPath 2.0 data model, the model must expose the properties of the items it contains. It does this by defining a family of accessor functions. These functions are not available to users or applications to call directly. Instead, they are descriptions of the information that an implementation of the model must expose to applications.

Example

data(‘string’)

From within a Java application, one would use this code to extract the result from the result sequence:

String n = ((XSString) rs.first()).stringvalue();
System.out.println(n);

This gives the result ‘string’.

The Error and Trace Functions

Constructor Functions

Example

xs:dateTime("2002-02-01T10:00:00+06:00")

From within a Java application, one would use this code to extract the result from the result sequence:

String n = ((XSDateTime) rs.first()).stringvalue();
System.out.println(n);

This gives the result ‘2002-02-01T04:00:00Z’.
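The printed value is the original dateTime shifted to UTC: 10:00 at offset +06:00 is the same instant as 04:00 at offset Z. The arithmetic can be checked with java.time, as in the sketch below. The class DateTimeNormalizeDemo and the output pattern are choices made for this illustration, not PsychoPath code.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class: shifts an ISO 8601 dateTime to UTC with java.time.
class DateTimeNormalizeDemo {

    // 'XXX' prints the offset as +hh:mm, or as Z when the offset is zero.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ssXXX");

    static String toUtc(String lexical) {
        OffsetDateTime parsed = OffsetDateTime.parse(lexical);
        // Same instant on the timeline, expressed at offset zero.
        return parsed.withOffsetSameInstant(ZoneOffset.UTC).format(FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(toUtc("2002-02-01T10:00:00+06:00"));
    }
}
```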

Functions on Numeric Values

Example

ceiling(xs:float(‘10.4’))

From within a Java application, one would use this code to extract the result from the result sequence:

float n = ((XSFloat) rs.first()).floatvalue();
System.out.println(n);

This gives the result ‘11.0’.

Functions to Assemble and Disassemble Strings

Example

codepoints-to-string(0111)

From within a Java application, one would use this code to extract the result from the result sequence:

String n = ((XSString) rs.first()).stringvalue();
System.out.println(n);

This gives the result ‘o’.

Compare and Other Functions on String Values

Example

concat(‘un’, ‘grateful’)

From within a Java application, one would use this code to extract the result from the result sequence:

String n = ((XSString) rs.first()).stringvalue();
System.out.println(n);

This gives the result ‘ungrateful’.

Functions Based on Substring Matching

Example

contains("abc", "edf")

From within a Java application, one would use this code to extract the result from the result sequence:

boolean n = ((XSBoolean) rs.first()).value();
System.out.println(n);

This gives the result ‘false’.

String Functions that Use Pattern Matching

Example

matches(‘abcd’, ‘abcd’)

From within a Java application, one would use this code to extract the result from the result sequence:

boolean n = ((XSBoolean) rs.first()).value();
System.out.println(n);

This gives the result ‘true’.

Functions on Boolean Values

Example

not(true())

From within a Java application, one would use this code to extract the result from the result sequence:

boolean n = ((XSBoolean) rs.first()).value();
System.out.println(n);

This gives the result ‘false’.

Component Extraction Functions on Durations, Dates and Times

Example

timezone-from-time(xs:time("13:20:00+05:00"))

From within a Java application, one would use this code to extract the result from the result sequence:

Example

From within a Java application, one would use this code to extract the result from the result sequence:

double avg = ((XSDouble) rs.first()).doublevalue();
System.out.println(avg);

This gives the result ‘4.0’.

Context Functions

Example

(10 to 20)[position() = 2]

From within a Java application, one would use this code to extract the result from the result sequence:

int pos = ((XSInteger) rs.first()).intvalue();
System.out.println(pos);

This gives the result ‘11’.

How to use XPath 2.0 operators with PsychoPath

The aim of this section is to give the user an overview of the available XPath 2.0 operators that are implemented in PsychoPath. For the formal specifications, see the W3C web-site for XPath 2.0 functions and operators http://www.w3.org/TR/xpath-functions/.

Operators on Numeric Values

Example

xs:integer(4) + xs:integer(3)

From within a Java application, one would use this code to extract the result from the result sequence:

Integer n = ((XSInteger) rs.first()).integervalue();
System.out.println(n);

This gives the result ‘7’.

Comparison of Numeric Values

Example

xs:decimal(3.3) = xs:decimal(6.6)

From within a Java application, one would use this code to extract the result from the result sequence:

boolean n = ((XSBoolean) rs.first()).value();
System.out.println(n);

This gives the result ‘false’.

Operators on Boolean Values

Example

xs:boolean(’true’) gt xs:boolean(’false’)

From within a Java application, one would use this code to extract the result from the result sequence:

boolean n = ((XSBoolean) rs.first()).value();
System.out.println(n);

This gives the result ‘true’.

Comparisons of Duration, Date and Time Values

Example

xs:time("23:00:00+06:00") lt xs:time("12:00:00-06:00")

From within a Java application, one would use this code to extract the result from the result sequence:

boolean n = ((XSBoolean) rs.first()).value();
System.out.println(n);

This gives the result ‘true’.
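The result may look surprising until both times are brought to the same offset: 23:00:00+06:00 is 17:00 UTC, while 12:00:00-06:00 is 18:00 UTC. java.time.OffsetTime compares times in the same instant-based way, so the reasoning can be checked with the small sketch below. The class TimeComparisonDemo is made up for the illustration; it is an analogue for this example, not a full implementation of the XPath comparison rules.

```java
import java.time.OffsetTime;

// Hypothetical demo class: instant-based comparison of times with offsets.
class TimeComparisonDemo {

    // True if the first time falls before the second once both are
    // placed on a common date and adjusted for their offsets.
    static boolean lessThan(String first, String second) {
        return OffsetTime.parse(first).isBefore(OffsetTime.parse(second));
    }

    public static void main(String[] args) {
        System.out.println(lessThan("23:00:00+06:00", "12:00:00-06:00"));
    }
}
```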

Arithmetic Functions on Durations

Example

multiply-dayTimeDuration(xdt:dayTimeDuration("PT2H10M"), 2.1)

From within a Java application, one would use this code to extract the result from the result sequence:

String n = ((XDTDayTimeDuration) rs.first()).stringvalue();
System.out.println(n);

This returns an xdt:dayTimeDuration value corresponding to 4 hours and 33 minutes, ‘PT4H33M’.
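The arithmetic behind this result is 130 minutes times 2.1, which is 273 minutes, or 4 hours and 33 minutes. A plain-Java check with java.time.Duration is sketched below. The class DurationMultiplyDemo is made up, and working in milliseconds with half-up rounding is an assumption of this sketch, not a statement about how PsychoPath computes the value.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.time.Duration;

// Hypothetical demo class: multiplies a duration by a decimal factor.
class DurationMultiplyDemo {

    static Duration multiply(Duration duration, String factor) {
        // Work in whole milliseconds to avoid binary floating point error.
        BigDecimal millis = BigDecimal.valueOf(duration.toMillis())
                .multiply(new BigDecimal(factor));
        return Duration.ofMillis(millis.setScale(0, RoundingMode.HALF_UP).longValueExact());
    }

    public static void main(String[] args) {
        System.out.println(multiply(Duration.parse("PT2H10M"), "2.1"));
    }
}
```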