From PostgreSQL 8.3 on,
there is XML-related functionality based on the SQL/XML
standard in the core server. That functionality covers XML
syntax checking and XPath queries, which is what this module
does, and more, but the API is not at all compatible. It is
planned that this module will be removed in a future version of
PostgreSQL in favor of the newer standard API, so you are
encouraged to try converting your applications. If you find
that some of the functionality of this module is not available
in an adequate form with the newer API, please explain your
issue to <pgsql-hackers@postgresql.org>
so that the deficiency can be addressed.

Table F-32
shows the functions provided by this module. These functions
provide straightforward XML parsing and XPath queries. All
arguments are of type text, so for
brevity that is not shown.

Table F-32. Functions

Function

Returns

Description

xml_is_well_formed(document)

bool

This parses the document text in its parameter and
returns true if the document is well-formed XML.
(Note: before PostgreSQL 8.2, this function was
called xml_valid().
That is the wrong name since validity and
well-formedness have different meanings in XML. The
old name is still available, but is deprecated.)

xpath_string(document,
query)

text

These functions evaluate the XPath query on the
supplied document, and cast the result to the
specified type.

xpath_number(document,
query)

float4

xpath_bool(document,
query)

bool

xpath_nodeset(document,
query, toptag, itemtag)

text

This evaluates query on document and wraps the
result in XML tags. If the result is multivalued, the
output will look like:

<toptag>
<itemtag>Value 1 which could be an XML fragment</itemtag>
<itemtag>Value 2....</itemtag>
</toptag>

If either toptag or
itemtag is an empty string,
the relevant tag is omitted.

xpath_nodeset(document,
query)

text

Like xpath_nodeset(document, query, toptag,
itemtag) but result omits both tags.

xpath_nodeset(document,
query, itemtag)

text

Like xpath_nodeset(document, query, toptag,
itemtag) but result omits toptag.

xpath_list(document, query,
separator)

text

This function returns multiple values separated by
the specified separator, for example Value 1,Value 2,Value 3 if separator
is ,.

xpath_list(document,
query)

text

This is a wrapper for the above function that uses
, as the separator.

xpath_table is a table
function that evaluates a set of XPath queries on each of a set
of documents and returns the results as a table. The primary
key field from the original document table is returned as the
first column of the result so that the result set can readily
be used in joins. The parameters are described in Table F-33.

Table F-33. xpath_table Parameters

Parameter

Description

key

the name of the "key"
field — this is just a field to be used as the first
column of the output table, i.e., it identifies the
record from which each output row came (see note
below about multiple values)

document

the name of the field containing the XML
document

relation

the name of the table or view containing the
documents

xpaths

one or more XPath expressions, separated by
|

criteria

the contents of the WHERE clause. This cannot be
omitted, so use true or
1=1 if you want to process
all the rows in the relation

These parameters (except the XPath strings) are just
substituted into a plain SQL SELECT statement, so you have some
flexibility — the statement is

SELECT <key>, <document>
FROM <relation> WHERE <criteria>

so those parameters can be anything valid in those
particular locations. The result from this SELECT needs to
return exactly two columns (which it will unless you try to
list multiple fields for key or document). Beware that this
simplistic approach requires that you validate any
user-supplied values to avoid SQL injection attacks.

The function has to be used in a FROM expression, with an AS clause to specify the output columns; for
example

The AS clause defines the names and
types of the columns in the output table. The first is the
"key" field and the rest correspond
to the XPath queries. If there are more XPath queries than
result columns, the extra queries will be ignored. If there are
more result columns than XPath queries, the extra columns will
be NULL.

Notice that this example defines the page_count result column as an integer. The
function deals internally with string representations, so when
you say you want an integer in the output, it will take the
string representation of the XPath result and use PostgreSQL
input functions to transform it into an integer (or whatever
type the AS clause requests). An error
will result if it can't do this — for example if the result is
empty — so you may wish to just stick to text as the column type if you think your data has
any problems.

The calling SELECT statement
doesn't necessarily have be just SELECT
* — it can reference the output columns by name or join
them to other tables. The function produces a virtual table
with which you can perform any operation you wish (e.g.
aggregation, joining, sorting etc). So we could also have:

The xpath_table function
assumes that the results of each XPath query might be
multivalued, so the number of rows returned by the function
may not be the same as the number of input documents. The
first row returned contains the first result from each query,
the second row the second result from each query. If one of
the queries has fewer values than the others, null values
will be returned instead.

In some cases, a user will know that a given XPath query
will return only a single result (perhaps a unique document
identifier) — if used alongside an XPath query returning
multiple results, the single-valued result will appear only
on the first row of the result. The solution to this is to
use the key field as part of a join against a simpler XPath
query. As an example:

This function applies the XSL stylesheet to the document
and returns the transformed result. The paramlist is a list of parameter assignments
to be used in the transformation, specified in the form
a=1,b=2. Note that the parameter
parsing is very simple-minded: parameter values cannot
contain commas!

There is also a two-parameter version of xslt_process which does not pass any
parameters to the transformation.