Hello,
I'm new to the list and have tried to find the answers to my questions in
several places (including the list archive). I apologize in advance if I
haven't searched thoroughly enough and this has already been answered.
Here's my problem:
We use a standoff annotation format for storing multiply annotated text
files. The text files are segmented by a:span elements, which delimit the
annotated stretches of text by their start and end positions (see the
example below).
The annotations are stored separately, as children of a:data elements.
In principle, anything is allowed inside a:data (in the underlying XSD
'a.xsd', a:data is a wrapper for elements from other namespaces);
however, there will be no text nodes, only elements that contain other
elements or are empty. As a result, I have no prior knowledge of the
hierarchy of the children of a:data.
The connection between an annotation and the annotated text is stored in
the a:span attribute, which is declared as xs:IDREF in the XSD.
<a:collection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.example.org/a a.xsd"
    xmlns="http://www.example.org/a" xmlns:a="http://www.example.org/a">
  <a:entry xml:id="c1" type="text">
    <a:spans>
      <a:span xml:id="seg1" start="0" end="20"/>
      <a:span xml:id="seg2" start="0" end="20"/>
      <a:span xml:id="to1" start="0" end="4"/>
      <a:span xml:id="to2" start="5" end="8"/>
    </a:spans>
    <a:data xmlns:b="http://www.example.org/b"
        xsi:schemaLocation="http://www.example.org/b b.xsd">
      <b:text a:span="seg1">
        <b:para a:span="seg1"/>
      </b:text>
    </a:data>
    <a:data xmlns:c="http://www.example.org/c"
        xsi:schemaLocation="http://www.example.org/c c.xsd">
      <c:sentence id="w35" a:span="seg2">
        <c:word a:span="to1" id="w36"/>
        <c:word a:span="to2" id="w37"/>
        <!-- ... -->
      </c:sentence>
    </a:data>
  </a:entry>
</a:collection>
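The linkage described above (a:span attributes pointing at xml:id values of a:span elements) can be sketched outside XQuery as a reverse index from each span id to the elements that refer to it. This is conceptually close to what fn:idref does on schema-typed input, except that fn:idref returns the referring attribute nodes while this sketch collects their parent elements. It is a Python illustration using only xml.etree.ElementTree; the function name index_spans and the trimmed copy of the example (schemaLocation attributes dropped) are assumptions for illustration, and it only reports references to unknown ids rather than performing real xs:IDREF validation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace URIs from the example; xml:id lives in the predefined XML namespace.
A = "http://www.example.org/a"
SPAN_ATTR = f"{{{A}}}span"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Trimmed copy of the example document (schemaLocation attributes omitted).
SAMPLE = """<a:collection xmlns:a="http://www.example.org/a"
    xmlns:b="http://www.example.org/b" xmlns:c="http://www.example.org/c">
  <a:entry xml:id="c1" type="text">
    <a:spans>
      <a:span xml:id="seg1" start="0" end="20"/>
      <a:span xml:id="seg2" start="0" end="20"/>
      <a:span xml:id="to1" start="0" end="4"/>
      <a:span xml:id="to2" start="5" end="8"/>
    </a:spans>
    <a:data><b:text a:span="seg1"><b:para a:span="seg1"/></b:text></a:data>
    <a:data>
      <c:sentence id="w35" a:span="seg2">
        <c:word a:span="to1" id="w36"/>
        <c:word a:span="to2" id="w37"/>
      </c:sentence>
    </a:data>
  </a:entry>
</a:collection>"""


def index_spans(root):
    """Return (offsets, referrers, dangling).

    offsets:   span xml:id -> (start, end)
    referrers: span xml:id -> annotation elements whose a:span points to it
    dangling:  annotation elements whose a:span names no known span
    """
    offsets = {s.get(XML_ID): (int(s.get("start")), int(s.get("end")))
               for s in root.iter(f"{{{A}}}span")}
    referrers = defaultdict(list)
    dangling = []
    for data in root.iter(f"{{{A}}}data"):
        for elem in data.iter():  # document order, nested elements included
            ref = elem.get(SPAN_ATTR)
            if ref is None:
                continue
            if ref in offsets:
                referrers[ref].append(elem)
            else:
                dangling.append(elem)
    return offsets, referrers, dangling


offsets, referrers, dangling = index_spans(ET.fromstring(SAMPLE))
for span_id, elems in referrers.items():
    print(span_id, offsets[span_id], [e.tag.split("}")[1] for e in elems])
```

In this index, seg1 is referenced by both b:text and its child b:para, which is the same pair a flat descendant search over a:data picks up.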
When I use the following XQuery to collect all annotations that
correspond to each a:span element, I get the output shown below.
declare namespace a="http://www.example.org/a";
declare namespace b="http://www.example.org/b";
declare namespace c="http://www.example.org/c";
element resultset
{
  let $d := doc('instance.xml')
  for $s in $d/a:collection/a:entry/a:spans/a:span
  return
    <result span="{$s/@xml:id}" start="{$s/@start}" end="{$s/@end}">
      { $d/a:collection/a:entry/a:data//*[@a:span = $s/@xml:id] }
    </result>
}
<resultset>
  <result start="0" end="20" span="seg1">
    <b:text xmlns:b="http://www.example.org/b"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.example.org/a"
        xmlns:a="http://www.example.org/a"
        a:span="seg1">
      <b:para a:span="seg1"/>
    </b:text>
    <b:para xmlns:b="http://www.example.org/b"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.example.org/a"
        xmlns:a="http://www.example.org/a"
        a:span="seg1"/>
  </result>
  <result start="0" end="20" span="seg2">
    <c:sentence xmlns:c="http://www.example.org/c"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.example.org/a"
        xmlns:a="http://www.example.org/a"
        id="w35"
        a:span="seg2">
      <c:word a:span="to1" id="w36"/>
      <c:word a:span="to2" id="w37"/>
      <!-- ... -->
    </c:sentence>
  </result>
  <result start="0" end="4" span="to1">
    <c:word xmlns:c="http://www.example.org/c"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.example.org/a"
        xmlns:a="http://www.example.org/a" a:span="to1"
        id="w36"/>
  </result>
  <result start="5" end="8" span="to2">
    <c:word xmlns:c="http://www.example.org/c"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.example.org/a"
        xmlns:a="http://www.example.org/a" a:span="to2"
        id="w37"/>
  </result>
</resultset>
Several things are not perfect here:
- Is there any way to suppress the namespace declarations on each copied
element? More specifically: what do I have to change so that all
namespaces are declared once (and only once), on the resultset element?
- The biggest issue is that the b:para element is output twice: once as a
child of the b:text element (which is fine) and once on its own. A
related problem affects the c:word elements: they should not be included
as children of the c:sentence element, because they refer to different
spans; they should appear only inside their own result elements.
- My third question concerns the fn:idref function in XQuery. My first
versions of the query used idref() to select all nodes underneath a:data
that refer to a given span, but I didn't get any output, although all XSD
files are available (I use Saxon-SA 9). What do I have to change in the
XQuery to use the idref function?
Again, I apologize for asking three questions in my first post to the list.
Kind regards,
Maik Stührenberg