On Thu, Jul 1, 2010 at 12:25 PM, Mike Fowler <mike(at)mlfowler(dot)com> wrote:
> Quoting Mike Fowler <mike(at)mlfowler(dot)com>:
>
>> Should the IS DOCUMENT predicate support this? At the moment you get
>> the following:
>>
>> template1=# SELECT '<towns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns>' IS DOCUMENT;
>> ?column?
>> ----------
>> t
>> (1 row)
>>
>> template1=# SELECT '<towns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns' IS DOCUMENT;
>> ERROR: invalid XML content
>> LINE 1: SELECT '<towns><town>Bidford-on-Avon</town><town>Cwmbran</to...
>> ^
>> DETAIL: Entity: line 1: parser error : expected '>'
>>
>> owns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns
>>
>> ^
>> Entity: line 1: parser error : chunk is not well balanced
>>
>> owns><town>Bidford-on-Avon</town><town>Cwmbran</town><town>Bristol</town></towns
>>
>> ^
>> I would've hoped the second would've returned 'f' rather than failing.
>> I've had a glance at the XML/SQL standard and I don't see anything in
>> the detail of the predicate (8.2) that would specifically prohibit us
>> from changing this behavior, unless the common rule 'Parsing a string
>> as an XML value' (10.16) must always be in force. I'm no standard
>> expert, but IMHO this would be an acceptable change to improve
>> usability. What do others think?
>
> Right, I've answered my own question whilst sitting in the open source
> coding session at CHAR(10). Yes, IS DOCUMENT should return false for a
> non-well formed document, and indeed is coded to do such. However, the
> conversion to the xml type which happens before the underlying
> xml_is_document function is even called fails and exceptions out. I'll work
> on a patch to resolve this behavior such that IS DOCUMENT will give you the
> missing 'xml_is_well_formed' function.

I think the point of "IS DOCUMENT" is to distinguish a document:

<foo>some stuff<bar/><baz/></foo>

from a document fragment:

<bar/><baz/>

A document is allowed only one top-level element.

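To make the distinction concrete, here is a small sketch. It uses XMLPARSE(CONTENT ...) so the parse does not depend on the session's xmloption setting; going by the documented behavior of the predicate, I'd expect the first query to return true and the second false, with neither raising an error, since both inputs are well formed.

```sql
-- One top-level element: a document.
SELECT XMLPARSE(CONTENT '<foo>some stuff<bar/><baz/></foo>') IS DOCUMENT;

-- Two top-level elements: legal XML content, but not a document.
SELECT XMLPARSE(CONTENT '<bar/><baz/>') IS DOCUMENT;
```
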
It'd be nice, I think, to have a function that tells you whether
something is legal XML without throwing an error if it isn't, but I
suspect that should be a separate function, rather than trying to jam
it into "IS DOCUMENT".
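As a rough illustration of what such a separate function could look like, here is an untested PL/pgSQL sketch. The name xml_parses_cleanly is hypothetical and is not an existing built-in; it parses the input as content and traps the invalid_xml_content error condition instead of letting it propagate.

```sql
-- Hypothetical helper, not a built-in: reports whether the text parses
-- as XML content, without raising an error when it does not.
CREATE FUNCTION xml_parses_cleanly(candidate text) RETURNS boolean AS $$
BEGIN
    -- Parse and discard the result; only success or failure matters.
    PERFORM XMLPARSE(CONTENT candidate);
    RETURN true;
EXCEPTION
    -- Raised when the input is not well-formed content.
    WHEN invalid_xml_content THEN
        RETURN false;
END;
$$ LANGUAGE plpgsql STRICT;  -- STRICT: NULL input yields NULL

-- Intended use: the truncated string from upthread should give false
-- rather than an error.
SELECT xml_parses_cleanly('<towns><town>Bristol</town></towns');
```

A wrapper like this pays for a subtransaction on every call because of the EXCEPTION block, so a real implementation would probably want to live in C next to the existing XML code.
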
http://developer.postgresql.org/pgdocs/postgres/functions-xml.html#AEN15187
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company