The xml data type can be used to store
XML data. Its advantage over storing XML data in a text field is that it checks the input values for
well-formedness, and there are support functions to perform
type-safe operations on it; see Section 9.14. Use of this data type
requires the installation to have been built with configure --with-libxml.

The xml type can store well-formed
"documents", as defined by the XML
standard, as well as "content"
fragments, which are defined by the production XMLDecl? content in the XML standard. Roughly,
this means that content fragments can have more than one
top-level element or character node. The expression xmlvalue IS
DOCUMENT can be used to evaluate whether a particular
xml value is a full document or only a
content fragment.

While this is the only way to convert character strings into
XML values according to the SQL standard, the
PostgreSQL-specific syntaxes:

xml '<foo>bar</foo>'
'<foo>bar</foo>'::xml

can also be used.

The xml type does not validate input
values against a document type declaration (DTD), even when the
input value specifies a DTD. There is also currently no
built-in support for validating against other XML schema
languages such as XML Schema.

The inverse operation, producing a character string value
from xml, uses the function xmlserialize:

XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type )

type can be character, character varying,
or text (or an alias for one of those).
Again, according to the SQL standard, this is the only way to
convert between type xml and character
types, but PostgreSQL also allows you to simply cast the
value.

When a character string value is cast to or from type
xml without going through XMLPARSE or XMLSERIALIZE,
respectively, the choice of DOCUMENT
versus CONTENT is determined by the
"XML option" session configuration
parameter, which can be set using the standard command:

SET XML OPTION { DOCUMENT | CONTENT };

or the more PostgreSQL-like syntax

SET xmloption TO { DOCUMENT | CONTENT };

The default is CONTENT, so all
forms of XML data are allowed.

Note: With the default XML option setting, you
cannot directly cast character strings to type xml if they contain a document type
declaration, because the definition of XML content fragment
does not accept them. If you need to do that, either use
XMLPARSE or change the XML
option.

Care must be taken when dealing with multiple character
encodings on the client, server, and in the XML data passed
through them. When using the text mode to pass queries to the
server and query results to the client (which is the normal
mode), PostgreSQL converts all character data passed between
the client and the server and vice versa to the character
encoding of the respective end; see Section 23.3. This includes string
representations of XML values, such as in the above examples.
This would ordinarily mean that encoding declarations contained
in XML data can become invalid as the character data is
converted to other encodings while traveling between client and
server, because the embedded encoding declaration is not
changed. To cope with this behavior, encoding declarations
contained in character strings presented for input to the
xml type are ignored, and content is assumed
to be in the current server encoding. Consequently, for correct
processing, character strings of XML data must be sent from the
client in the current client encoding. It is the responsibility
of the client to either convert documents to the current client
encoding before sending them to the server, or to adjust the
client encoding appropriately. On output, values of type
xml will not have an encoding
declaration, and clients should assume all data is in the
current client encoding.

When using binary mode to pass query parameters to the
server and query results back to the client, no encoding
conversion is performed, so the situation is different. In this
case, an encoding declaration in the XML data will be observed,
and if it is absent, the data will be assumed to be in UTF-8
(as required by the XML standard; note that PostgreSQL does not
support UTF-16). On output, data will have an encoding
declaration specifying the client encoding, unless the client
encoding is UTF-8, in which case it will be omitted.

Needless to say, processing XML data with PostgreSQL will be
less error-prone and more efficient if the XML data encoding,
client encoding, and server encoding are the same. Since XML
data is internally processed in UTF-8, computations will be
most efficient if the server encoding is also UTF-8.

Caution

Some XML-related functions may not work at all on
non-ASCII data when the server encoding is not UTF-8.
This is known to be an issue for xpath() in particular.

The xml data type is unusual in that
it does not provide any comparison operators. This is because
there is no well-defined and universally useful comparison
algorithm for XML data. One consequence of this is that you
cannot retrieve rows by comparing an xml
column against a search value. XML values should therefore
typically be accompanied by a separate key field such as an ID.
An alternative solution for comparing XML values is to convert
them to character strings first, but note that character string
comparison has little to do with a useful XML comparison
method.

Since there are no comparison operators for the xml data type, it is not possible to create an
index directly on a column of this type. If speedy searches in
XML data are desired, possible workarounds include casting the
expression to a character string type and indexing that, or
indexing an XPath expression. Of course, the actual query would
have to be adjusted to search by the indexed expression.

The text-search functionality in PostgreSQL can also be used
to speed up full-document searches of XML data. The necessary
preprocessing support is, however, not yet available in the
PostgreSQL distribution.

Submit correction

If you see anything in the documentation that is not correct, does not match
your experience with the particular feature or requires further clarification,
please use
this form
to report a documentation issue.