Nội dung Text: Database and XML Technologies- P2

40 C. Liu et al.
port. For this purpose, we will diﬀerentiate the single attribute primary/foreign
keys from multi-attribute primary/foreign keys while transforming the relational
database schema to XML schema. We also classify a relation into the following
four categories based on diﬀerent types of primary keys:
– regular : the primary key of a regular relation contains no foreign keys.
– component: the primary key of a component relation contains one foreign
key which references its parent relation. The other part of the primary key
serves as a local identiﬁer under the parent relation. A component relation
is used to represent a component or a multi-valued attribute of its parent
relation.
– supplementary: the primary key of a supplementary relation as a whole is
also a foreign key which references another relation. This relation is used
to supplement another relation or to represent a subclass for translating a
generalization hierarchy from a conceptual schema.
– association: the primary key of an association relation contains more than
one foreign keys, each of which references a participant relation.
Based on above discussion, we give the set of mapping rules.
3.1 Basic Mapping Rules
Given a relational database schema Sch with primary/foreign key deﬁnitions, we
may use the following basic mapping rules to convert Sch into a corresponding
XML schema Sch XML.
Rule 1 For a relational database schema Sch, a root element named Sch XML
is created in the corresponding XML schema as follows.
Rule 2 For each regular or association relation R, the following element with
the same name as the relation schema is created and then put under the root
element.

A Virtual XML Database Engine for Relational Databases 41
Rule 3 For each component relation R1 , let its parent relation be R2 , then an
element with the same name as the component relation is created and then placed
as a child element of R2 . The created element has the same structure as the
element created in Rule 2.
Rule 4 For each supplementary relation R1 , let the relation which R1 references
be R2 , then the following element with the same name as the supplementary
relation schema is created and then placed as a child element of R2 . The created
element has the same structure as the element created in Rule 2 except that the
maxOccurs is 1.
Rule 5 For each single attribute primary key with the name P KA of regular
relation R, an attribute of the element for R is created with ID data type as
follows.
Rule 6 For each multiple attribute primary key P K of a regular, a component
or an association relation R, suppose the key attributes are P KA1 , · · · , P KAn ,
an attribute of the element for R is created for each P KAi (1 ≤ i ≤ n) with
the corresponding data type. If R is a component relation and P KAi is a single
attribute foreign key contained in the primary key, then the data type of the
created attribute is IDREF. After that a key element is deﬁned with a selector
to select the element for R and several ﬁelds to identify P KA1 , · · · , P KAn . The
key element can be deﬁned inside or outside the element for R. The name of the
element P K should be unique within the namespace.
... ...
... ...
Rule 7 Ignore the mapping for primary key of each supplementary relation.
Rule 8 For each single attribute foreign key F KA of a relation R except one
which is contained in the primary key of a component or supplementary relation,
an attribute of the element for R is created with IDREF data type.

42 C. Liu et al.
Rule 9 For each multiple attribute foreign key F K of a relation R except one
which is contained in the primary key of a component or supplementary rela-
tion, suppose F K references P K of the referenced relation, and the foreign key
attributes are F KA1 , · · · , F KAn , an attribute of the element for R is created for
each F KAi (1 ≤ i ≤ n) with corresponding data type. Then a keyref element is
deﬁned with a selector to select the element for R and several ﬁelds to identify
F KA1 , · · · , F KAn . The keyref element can be deﬁned either inside or outside
the element. The name of the element F K should be unique within the names-
pace and refer of the element is the name of the key element of the primary key
which it references.
... ...
... ...
Rule 10 For each non-key attribute of a relation R, an element is created as a
child element of R. The name of the element is the same as the attribute name.
Rule 1 to Rule 10 are relatively straitforward for mapping a relational
database schema to a corresponding XML schema. One property of these rules is
redundancy free preservation, i.e., Rule 1 to Rule 10 do not introduce any data
redundancy provided the relational schema is redundancy free.
Theorem 3.1. If the relational database schema Sch is redundancy free, the
XML schema Sch XML generated by applying Rule 1 to Rule 10 is also redun-
dancy free.
This theorem is easy to prove. For a regular or an association relation R,
an element with the same name R is created under the root element, so the
relation R in Sch is isomorphically transformed to an element in Sch XML. For
a component relation R, a sub-element with the same name R is created under
its parent Rp . Because of the foreign key constraint, we have the functional
dependency P KR → P KRp , i.e., there is a many to one relationship from R to
Rp , therefore it is impossible that a tuple of R is placed more than one time under
diﬀerent element of Rp . Similar to a component relation, there is no redundancy
introduced for a supplementary relation.

A Virtual XML Database Engine for Relational Databases 43
3.2 An Example
Let us have a look of a relational database schema Company for a company.
Primary keys are underlined while foreign keys are in italic font.
Employee(eno, name, city, salary, dno)
Dept(dno, dname, mgrEno)
DeptLoc(dno, city)
Project(pno, pname, city, dno)
WorksOn(eno, pno, hours)
Given this schema as an input, the following XML schema will be generated:

44 C. Liu et al.
The root element Company XML is created for the relational database sche-
ma Company. Under the root element, four set elements Employee, Dept, Project
and WorksOn are created for relation schema Employee, Dept, Project and
WorksOn, respectively. For component relation schema DeptLoc, element Dept-
Loc is created under element Dept for its parent relation. PK/FK constraints in
the relational database schema Company have been mapped to the XML schema
Company XML by using ID/IDREF and KEY/FEYREF.
3.3 Exploring Nested Structures
As we can see, the basic mapping rules fail to explore all possible nested struc-
tures. For example, the Project element can be moved to under the Dept element
if every project belongs to a department. Nesting is important in XML schema
because it allows navigation of path expressions to be processed eﬃciently. If we
use IDREF instead, we may use system supported dereference function to get
the referenced elements. In XML, the dereference function is expensive because
ID and IDREF types are value based. If we use KEYREF, we have to put an ex-
plicit join condition in an XML query to get the referenced elements. Therefore,
we need to explore all possible nested structure by investigating the referential
integrity constraints in the relational schema. For this purpose, we introduce a
reference graph as follows:
Deﬁnition 3.1. Given a relational database schema Sch = {R1 , · · · , Rn }, a
reference graph of the schema Sch is deﬁned as a labeled directed graph RG =
(V, E, L) where V is a ﬁnite set of vertices representing relation schema R1 ,
· · · , Rn in Sch; E is a ﬁnite set of arcs, if there is a foreign key deﬁned in Ri

A Virtual XML Database Engine for Relational Databases 45
which references Rj , an arc e =< Ri , Rj >∈ E; L is a set of labels for edges
by applying a labeling function from E to the set of attribute names for foreign
keys.
Fig. 2. A Reference Graph
The reference graph of the relational schema Company is shown as in Fig-
ure 2. In the graph, the element of node DeptLoc has been put under the element
of node Dept by Rule 3. From the graph, we may have the following improve-
ment if certain conditions are satisﬁed.
(1) The element of node Project could be put under the element of node Dept if
the foreign key dno is deﬁned as NOT-NULL. This is because that node Project
only references node Dept and a many to one relationship from Project to Dept
can be derived from the foreign key constraint. In addition, the NOT-NULL
foreign key means every project has to belong one department. As a result, one
project can be put under one department and cannot be put twice under diﬀer-
ent departments in the XML document.
(2) A loop exists between Employee and Dept. What we can get from this is
a many to many relationship between Employee and Dept. In fact, the foreign
key mgrEno of Dept reﬂects a one to one relationship from Dept to Employee.
Fortunately, this semantics can be captured by checking the unique constraint
deﬁned for the foreign key mgrno. If there is such a unique constraint deﬁned,
the foreign key mgrEno of Dept really suggests a one to one relationship from
Dept to Employee. For the purpose of nesting, we delete the arc from Dept to
Employee labelled mgrno from the reference graph. The real relationship from
Employee to Dept is many to one. As such, the element of the node Employee
can also be put under the element of the node Dept if the foreign key dno is
deﬁned to NOT-NULL.
(3) The node WorksOn references two nodes Employee and Project. The element
of WorksOn can be put under either Employee and Project if the corresponding
foreign key is NOT-NULL. However, which node to choose to put under all de-

46 C. Liu et al.
pends on which path will be used often in queries. We may leave this decision
to be chosen by a designer.
Based on the above discussion, we can improve the basic mapping rules by
the following theorems.
Theorem 3.2. In a reference graph RG, if a node n1 for relation R1 has only
one outcoming arc to another node n2 for relation R2 and foreign key denoted
by the label of the arc is deﬁned as NOT-NULL and there is no loop between
n1 and n2 , then we can move the element for R1 to under the element for R2
without introducing data redundancy.
The proof of this theorem has already explained by the relationships between
Project and Dept, and between Dept and Employee in Figure 2. The only arc
from n1 to n2 and there is no loop between the two nodes represents a many
to one relationship from R1 to R2 , while the NOT-NULL foreign key gives a
many to exact one relationship from R1 to R2 . Therefore, for each instance of
R1 , it is put only once under exactly one instance of R2 , no redundancy will be
introduced.
Similarly, we can have the following.
Theorem 3.3. In a reference graph RG, if a node n0 for relation R0 has out-
coming arcs to other nodes n1 , · · · , nk for relations R1 , · · · , Rk , respectively, and
the foreign key denoted by the label of at least one such outcoming arcs is deﬁned
as NOT-NULL and there is no loop between n0 and any of n1 , · · · , nk , then we
can move the element for R0 to under the element for Ri (1 ≤ i ≤ k) without
introducing data redundancy provided the foreign key deﬁned on the label of the
arc from n0 to ni is NOT-NULL.
Rule 11 If there is only one many to one relationship from relation R1 to an-
other relation R2 and the foreign key of R1 to R2 is deﬁned as NOT-NULL, then
we can move the element for R1 to under the element for R2 as a child element.
Rule 12 If there are more than one many to one relationship from relation R0
to other relations R1 , · · · , Rk , then we can move the element for R0 to under the
element for Ri (1 ≤ i ≤ k) as a child element provided the foreign key of R0 to
Rk is deﬁned as NOT-NULL.
By many to one relationship from relation R1 to R2 , we mean that there is
one arc which cannot be deleted from node n1 for R1 to node n2 for R2 , and
there is no loop between n1 and n2 in the reference graph. If we apply Rule 11
to the transformed XML schema Company XML, the elements for Project and
Employee will be moved to under Dept as follows, the attribute dno with IDREF
type can be removed from both Project and Employee elements.

A Virtual XML Database Engine for Relational Databases 47
XML Schema oﬀers great ﬂexibility in modeling documents. Therefore, there
exist many ways to map a relational database schema into a schema in XML
Schema. For examples, XViews [2] constructs graph based on PK/FK relation-
ship and generate candidate views by choosing node with either maximum in-
degree or zero in-degree as root element. The candidate XML views generated
achieve high level of nesting but suﬀer considerable level of data redundancy.
NeT [8] derives nested structures from ﬂat relations by repeatedly applying nest
operator on tuples of each relation. The resulting nested structures may be use-
less because the derivation is not at the type level. Compared with XViews and
NeT, our mapping rules can achieve high level of nesting for the translated XML
schema while introducing no data redundancy provided the underlying relational
schema is redundancy free.

48 C. Liu et al.
4 Query Translation
In this section, we discuss how XQuery queries are translated to corresponding
SQL queries. SQL is used to express queries on ﬂat relations, where a join op-
eration may be used frequently to join relations together; while XQuery is used
to express queries on elements which could be highly nested by sub-elements or
linked by IDREF, where navigation via path expression is the main means to
link elements of a document together. As XQuery is more powerful and ﬂexible
than SQL, it is hard to translate an arbitrary XQuery query to correspond-
ing SQL query. Fortunately, in VXE-R, the XML schema is generated from the
underlying relational database schema, therefore, the structure of the mapped
XML elements is normalized. Given the mapping rules introduced in Section 3,
we know the reverse mapping which is crucial for translating queries in XQuery
to the corresponding queries in SQL.
As XQuery is still in its draft version, in this paper, we only consider the
translation of basic XQuery queries which do not include aggregate functions.
The main structure of an XQuery query can be formulated by an FLWOR expres-
sion with the help of XPath expressions. An FLWOR expression is constructed
from FOR, LET, WHERE, ORDER BY, and RETURN clauses. FOR and LET
clauses serve to bind values to one or more variables using (path) expressions.
The FOR clause is used for iteration, with each variable in FOR iterates over the
nodes returned by its respective expression; while the optional LET clause binds
a variable to an expression without iteration, resulting in a single binding for each
variable. As the LET clause is usually used to process grouping and aggregate
functions, the processing of the LET clause is not discussed here. The optional
WHERE clause speciﬁes one or more conditions to restrict the binding-tuples
generated by FOR and LET clauses. The RETURN clause is used to specify an
element structure and to construct the result elements in the speciﬁed structure.
The optional ORDER BY clause determines the order of the result elements.
A basic XQuery query can be formulated with a simpliﬁed FLWOR expres-
sion:
FOR x1 in p1, ..., xn in pn
WHERE c
RETURN s
In the FOR clause, iteration variables x1 , · · · , xn are deﬁned over the path
expressions p1 , · · · , pn . In the WHERE clause, the expression c speciﬁes con-
ditions for qualiﬁed binding-tuples generated by the iteration variables. Some
conditions may be included in pi to select tuples iterated by the variable xi .
In the RETURN clause, the return structure is speciﬁed by the expression s.
A nested FLWOR expression can be included in s to specify a subquery over
sub-elements.
4.1 The Algorithm
Input. A basic XQuery query Qxquery against an XML schema Sch XML which
is generated from the underlying relational schema Sch.

A Virtual XML Database Engine for Relational Databases 49
Output. A corresponding SQL query Qsql against the relational schema Sch.
Step 1: make Qxquery canonical - Let pi deﬁned in the FOR clause be the form
of /stepi1 / · · · /stepik . We check whether there is a test condition, say cij in
stepij of pi from left to right. If there is such a step, let stepij be the form of
lij [cij ], then we add an extra iteration variable yij in the FOR clause which is
deﬁned over the path expression /li1 / · · · /lij , and move the condition cij to the
WHERE clause, each element or attribute in cij is preﬁxed with $yij /.
Step 2: identify all relations - After Step 1, each pi in the FOR clause is now
in the form of /li1 / · · · /lik , where lij (1 ≤ j ≤ k) is an element in Sch XML.
Usually pi corresponds to a relation in Sch (lik matches the name of a relation
schema in Sch). The matched relation name lik is put in the FROM clause of
Qsql followed by the iteration variable xi served as a tuple variable for relation
lik . If there is an iteration variable, say xj , appears in pi , replace the occurrence
of xj with pj . Once both relations, say Ri and Rj , represented by pi and pj
respectively are identiﬁed, a link from Ri to Rj is added in a temporary list
LINK. If there are nested FLWOR expressions deﬁned in RETURN clause, the
relation identiﬁcation process is applied recursively to the FOR clause of the
nested FLWOR expressions.
Step 3: identify all target attributes for each identiﬁed relation - All target at-
tributes of Qsql appear in the RETURN clause. For each leaf element (in the
form of $xi /t) or attribute (in the form of $xi /@t) deﬁned in s of the RETURN
clause, replace it with a relation attribute in the form of xi .t. Each identiﬁed
target attribute is put in the SELECT clause of Qsql . If there are nested FLWOR
expressions deﬁned in RETURN clause, the target attribute identiﬁcation pro-
cess is applied recursively to the RETURN clause of the nested FLWOR expres-
sions.
Step 4: identify conditions - Replace each element (in the form of $xi /t) or
attribute (in the form of $xi /@t) in the WHERE clause of Qxquery , then move
all conditions to the WHERE clause of Qsql with a relation attribute in the form
of xi .t. If there are nested FLWOR expressions deﬁned in RETURN clause, the
condition identiﬁcation process is applied recursively to the WHERE clause of
the nested FLWOR expressions.
Step 5: set the links between iteration variables - If there is any link put in the
temporary list LINK, then for each link from Ri to Rj , create a join condition
between the foreign key attributes of Ri and the corresponding primary key
attributes of Rj and ANDed to the other conditions of the WHERE clause of
Qsql .
4.2 An Example
Suppose we want to ﬁnd all departments which have oﬃce in Adelaide and we
want to list the name of those departments as well as the name and salary of
all employees who live in Adelaide and work in those departments. The XQuery
query for this request can be formulated as follows:
FOR $d in /Dept, $e in $d/Employee, $l in $d/DeptLoc
WHERE $l/city = "Adelaide" and

50 C. Liu et al.
$e/city = "Adelaide" and
$e/@dno = $d/@dno
RETURN
$d/dname
$e/name
$e/salary
Given this query as an input, the following SQL query will be generated:
SELECT d.dname, e.name, e.salary
FROM Dept d, Employee e, DeptLoc l
WHERE l.city = "Adelaide" and
e.city = "Adelaide" and
e.dno = d.dno and
l.dno = d.dno
5 XML Documents Generation
As seen from the query translation algorithm and example introduced in the
previous section, the translated SQL query takes all leaf elements or attributes
deﬁned in an XQuery query RETURN clause and output them in a ﬂat relation.
However, users may require a nested result structure such as the RETURN
structure deﬁned in the example XQuery query. Therefore, when we generate
the XML result documents from the translated SQL query result relations, we
need to restructure the ﬂat result relation by a grouping operator [9] or a nest
operator for N F 2 relations, then convert it into XML documents.
Similar to SQL GROUP BY clause, the grouping operator divides a set or
list of tuples into groups according to key attributes. For instance, suppose the
translated SQL query generated from the example XQuery query returns the
following result relation as shown in Table 1. After we apply grouping on the
relation using dname as the key, we have the nested relation as shown in Table
2 which can be easily converted to the result XML document as speciﬁed in the
example XQuery query.
Table 1. Flat Relation Example Table 2. Nested Relation Example
dname name salary dname name salary
development Smith, John 70,000 development Smith, John 70,000
marketing Mason, Lisa 60,000 Leung, Mary 50,000
development Leung, Mary 50,000 Chen, Helen 70,000
marketing Lee, Robert 80,000 marketing Mason, Lisa 60,000
development Chen, Helen 70,000 Lee, Robert 80,000

Cursor Management for XML Data 53
papers. The user may want to browse the table of content or abstract before retrieving
the rest of the paper. Always materializing a whole paper is not needed and is a waste
of resource. Therefore, a mechanism that provides materialization on-demand is
highly desirable. The ROLEX system [3] provides such a solution by applying a vir-
tual DOM interface. ROLEX works fine when applications are running in the same
memory space as the ROLEX system. In a multi-tier application environment where
application programs and the data set reside in different computer systems, it will not
function as well. In addition, result navigation in ROLEX is specified by a declarative
view query, which is not as flexible or powerful as a full-fledged cursor interface.
In this paper, we propose a mechanism for efficient and flexible cursor operations
on XML data sets stored in an XML repository based on a relational database system.
In particular, we make the following three contributions:
1 We propose a cursor definition for XML data or query result that includes a set
of cursor operations for traversing and fetching XML data. The cursor interface
allows an application to step through the XML query result in different units of
granularities such as a node, a sub-tree or a whole document. Our cursor defini-
tion also supports a positioned update function. Due to space limitation, the up-
date function will not be covered in this paper.
2 We design and study several alternatives for supporting XML cursors in an
XML repository based on a relational database system. Our design exploits
technologies from [10][12][13][15] whenever applicable and focuses on meeting
the challenges of providing an efficient mechanism for cursor and result naviga-
tion.
3 We then provide a performance analysis of three different implementation ap-
proaches: multi-cursor, outer union, and hybrid.
The rest of this paper is organized as follows. Section 2 discusses related work.
Section 3 presents the details of the cursor definition. Section 4 describes the various
techniques for processing XML cursor statements. It focuses on algorithms of query
processing and result navigation. Performance comparisons of the various approaches
are presented in Section 5. Finally, Section 6 concludes the paper and describes direc-
tions for future work.
2 Related Work
There are two aspects of our XML cursor management work, namely a flexible cursor
interface definition for navigating XML data, and a robust implementation of such an
interface.
In the area of cursor interface definitions, cursors are defined in SQL [4] and JDBC
[6] for relational data. Such cursor interfaces are inadequate for traversing or fetching
multi-dimensional XML query results. DOM [17] defines a general interface to ma-
nipulate XML documents. However, it does not address the aspect of persistent stor-
age. In addition, our SQL/JDBC-like XML cursor interface is preferable since the
relational database systems are the dominant data management systems currently and,
in most cases, XML-based applications will need to interoperate with existing SQL
centric applications. Oracle’s JXQI [9], the Java XQuery API, also defines a
SQL/JDBC-like cursor interface but with functionalities limited to one dimension
cursor movement.

54 N. Li et al.
In order to provide cursor management for XML data on top of a relational data-
base system, the system first requires the basic XML data management capabilities
such as storage and query. Several approaches have been proposed and published
[2][5][7][10][13] in this area. The work described in [11][12], on the other hand,
studied how to query XML views of relational data and to publish relational data as
XML documents. In addition, [16] studied storing and querying ordered XML data in
a relational database system while [15] proposed a method for performing searched
update of XML data. None of these works addresses XML cursor support and, to the
best of our knowledge, no previous work has addressed navigation and update of
persistent XML data through SQL-like cursors.
3 XML Cursor Interface
We propose a cursor interface definition that has similar syntax as the cursor defini-
tions of embedded SQL but with much richer functionalities. Our proposal takes into
consideration the multi-dimensional nature of the XML data model, which is much
richer than the relational data model. In the following, we first present the data model
and then give a detailed description of the proposed cursor definition.
3.1 Data Model
Our XML cursor definition is targeted for traversing the result set of an XQuery query
expression, thus the data model that we adopt is similar to the XQuery data model but
simpler. The XQuery and XPath Data Model [22] views XML documents as trees of
nodes. An XQuery query result is an ordered sequence of zero or more nodes or
atomic values. Because each node in the sequence represents the root of a tree in the
node hierarchy, the result could be treated as a result tree sequence. For simplicity, we
assume that the root nodes are all element nodes. We further simplify the model by
attaching attributes and text to their element nodes, which means there will be no
separate nodes for attributes or text. Consequently, attributes and text of the current
element node can be retrieved without moving the cursor to a different node.
3.2 Cursor Definition
In the following, we first define the XML cursor interface, as an extension of XQuery,
which includes cursor declaration, open and close of a cursor, cursor navigation, and
result retrieval via a cursor. We then discuss the cursor position for various cursor
movement operations.
Declare a Cursor
DECLARE CURSOR [] [SCROLL]
FOR FOR []
: INSENSITIVE | SENSITIVE
: READ ONLY | UPDATE
: OPTIMIZE FOR MAX DEPTH

Cursor Management for XML Data 55
The above statement defines a cursor and its properties. specifies
the XQuery query expression whose result the cursor is bound to. has
similar semantics as that of a SQL cursor. When SCROLL is not specified, the cursor
movement operation is limited to only NextNode. All supported cursor movement
operations are described in later in this section. UPDATE can be specified only if
each node of the XML query result has a one-to-one mapping to the backend persis-
tent storage. lets users provide hints of the usage pattern for better
performance. A useful hint is the maximum depth of the result trees that the applica-
tion will access.
Open and close the Cursor
OPEN CURSOR
CLOSE CURSOR
The open statement executes the XQuery query specified in the cursor’s
DECLARE statement. The result set will be identified, but may or may not be materi-
alized. The close statement releases any resource associated with the cursor.
Fetch the Next Unit of Data
FETCH FROM CURSOR
[ ][ ]
INTO (STRING | DOM )
: NextTree | PreviousTree
| NextNode [EXCLUDING Descendant]
| PreviousNode [EXCLUDING Ancestor]
| ChildNode | ParentNode | NextSibling
| PreviousSibling
: INCLUDING SUBTREE [OF MAX DEPTH ]
: WITH (TEXT | ATTRIBUTES) ONLY
This statement allows a cursor to move to a desired position from its current posi-
tion and to specify the content to be retrieved into a host application.
specifies the types of cursor movement and there are two of them. The first type is
sequential, which follows the document order. NextNode and PreviousNode fall into
this category. The other type is structure-aware, which includes the rest of the opera-
tions. The destination position is determined by the tree structures of the XML result.
The structure-awareness of the cursor movement matches the multi-dimensional na-
ture of XML data and is very important for many applications to step through the
XQuery query result. With the supported , NextTree/PreviousTree
moves the cursor to the root node of the next/previous tree in the result tree sequence
while NextNode/PreviousNode moves to the next/previous node in the current tree in
the document order. If the cursor is not scrollable, then only NextNode is enabled and
no node in the result sequence can be skipped.
specifies the unit of the data to be fetched. It could be the current
node, the sub-tree which rooted at the current node, or such a sub-tree but only re-
trieving nodes up to depth . Because attributes and text are attached to their ele-
ment node in our simplified data model, allows user to specify
whether to retrieve the text only, the attributes only, or the element node with both
text and attributes. Also, the data can be fetched either as a string or in DOM repre-
sentation.

56 N. Li et al.
Position the Cursor
SAVE CURRENT POSITION OF CURSOR IN
SET CURRENT POSITION OF CURSOR TO
These statements are used to bookmark (save) a given cursor position or to set the
cursor to a previously saved position. This enable an application to remember some or
all visited nodes and later jump back to any of the saved node, if needed.
3.3 Cursor Position
When a cursor is first opened, it is placed right before the root node of the first tree in
the result sequence. A NextNode or a NextTree operation will bring the cursor to the
root node of the first tree. After that, a cursor position is always on a node. In case of
a DELETE operation, the cursor will be positioned on the prior node in the document
order that is not deleted.
4 Query Processing & Navigation Techniques
In this section, we describe three approaches to support the cursor interface for XML
data stored in relational databases. The major technical challenges in supporting cur-
sor operations in XQuery are: (1) how to translate an XQuery into one or more SQL
queries that will facilitate result navigation and retrieval, and (2) how to implement
navigation capabilities given the SQL queries generated.
Before diving into the details of the approaches, we first describe the mapping of
XML data into relational tables in our system. For simplicity, we use a direct map-
ping, which maps each element to a relational table with the following columns:
 one column for each attribute of the element
 a column to store the element content if content is allowed
 two columns, id and parentid, which capture the parent-child relationship of
the element hierarchy; the parentid column is the foreign key of the id column
of the element’s parent table
 a column, docid, which stores the id of the document the element belongs to
A similar method is described in [13]. As pointed out there, a direct mapping may
lead to excessive fragmentation. Various inlining and outlining techniques are de-
scribed in [13] to reduce fragmentation and the number of SQL joins needed to evalu-
ate path expressions.
The mapping scheme used will determine the top level SQL query that is generated
in all of our cursor implementation approaches. Also, the number of JDBC cursor
movements may be different for different mapping schemes, e.g. if the inlining tech-
nique is used, no extra SQL cursor movement is required to move to the next child
element. On the other hand, if the outlining technique is used, it would require another
table join to get all the parts of the same element. Since the comparison of different
mapping schemes is not a goal of our work, we have chosen a single mapping scheme
(i.e. the direct mapping scheme) to carry out the performance comparison of our cur-
sor implementation approaches. The cursor implementation approaches themselves,
are general and the relative performance among them will not change if other map-
ping schemes are used.

Cursor Management for XML Data 57
Several methods to support the ordered XML data model in the unordered rela-
tional model are proposed in [16]. Among the three schemes suggested, we choose the
global order encoding scheme rather than the Dewey order encoding scheme mainly
because the global order encoding method provides the best query performance. In
addition, we accommodate its weakness for update by leaving gaps in the order num-
bers [8] to reduce the frequency of renumbering. When renumbering occurs, in most
cases, only a small number of nodes are required to be touched. On the other hand,
Dewey order encoding scheme always requires the whole subtree to be renumbered.
Figure 1shows an example of such case. An extra column, “iorder” is used to record
the global element order within each XML document.
Fig. 1. A renumbering example Fig. 2. Graph Representation of an XML Schema
In the following we present the query processing and the result navigation algo-
rithms for three approaches in detail by means of concrete examples.
4.1 The Multi-cursor Approach
A straightforward approach is to open a series of database cursors on the relations that
map to the result elements. We term this the multi-cursor (MC) approach.
4.1.1 Query Translation
This approach translates an XQuery query into a set of top-level SQL queries and
constructs a list of parameterized SQL queries, one for each relation an element type
in the result mapped to. An XQuery can be translated into more than one top-level
SQL queries because an XQuery can construct results in different types. A query like
“/Book/*” is such an example. So there will be multiple top-level queries, one for
each child of the Book element. Each top-level query, when executed, produces a set
of relational tuples as the root nodes of the result trees. Each of the parameterized
queries, when executed with an id of a node in the result, produces the node’s child
nodes of one element type in the relational form. As an example, consider the graph
representation of an XML schema shown in Figure 2. The XQuery query in Figure 3
will be translated into the top-level query and the list of parameterized queries shown
in the same figure. In this case, all the translated queries are sorted according to the
document order. Because the XML data model does not preserve order across docu-
ments, the sort is not required for the top-level queries if the query results are in the
document node level.

58 N. Li et al.
For a given XML query, the query processor generates queries that select all the
attributes of the elements as well as the id’s needed for result navigation.
User Query: /Journal[./Editor/@name="James Bond"]/Article
Translated Queries:
1. SELECT DISTINCT Article.docid, Article.id,
Article.title, Article.page
FROM Journal, Editor, Article
WHERE (Journal.id = Editor.parentid)
AND (Editor.name = "James Bond")
AND (Journal.id = Article.parentid)
ORDER BY Article.docid, Article.iorder
2. SELECT id, name, affiliation
FROM Author
WHERE parentid = ? and docid = ?
ORDER BY iorder
3. SELECT id, number, title
FROM Section
WHERE parentid = ? and docid = ?
ORDER BY iorder
4. SELECT id, number, note
FROM Paragraph
WHERE parentid = ? and docid = ?
ORDER BY iorder
Fig. 3. XQuery Query Example
4.1.2 Result Navigation
Given all the queries generated in the query translation phase, multiple cursors are
opened by executing these queries to implement the navigation functionalities. Ini-
tially, the top-level queries are executed and their SQL result sets are pushed into a
stack; and the rest of the parameterized queries are prepared. As the XML cursor
moves to different nodes in the result, prepared queries corresponding to the current
navigation level are executed and SQL result sets are pushed into the stack if the
navigation is going down the result tree or popped out of the stack if going up the
result tree. To make sure the proper order of the node is returned, firstly each query
result has to be sorted according to the document order. Then among the current
tuples of the current SQL result sets, we return the one with the minimum order value
if the operation is a forward traversal. This also applies to the top-level queries to
determine which tuple to be returned if there are multiple top-level queries and the top
result nodes are not document root nodes. At any time, the tuples of the result sets on
top of the stack corresponds to the current cursor level, and all its ancestors are on the
stack, in sequence. The Appendix contains an example explaining the mechanism in
detail.
4.2 The Outer Union Approach
In the multi-cursor (MC) approach, multiple SQL cursors are opened for an XML
cursor. When an application moves an XML cursor to an element (table) of a deeper
level in the result set, the corresponding set of queries is executed and the same num-

Cursor Management for XML Data 59
ber of SQL cursors is opened. This consumes a lot of database resources and adds
extra query processing time during navigation. The outer union (OU) approach re-
duces the number of SQL cursors opened by opening a single SQL cursor for each
XML cursor. It adopts a modified version of the “outer union” method, first proposed
in [12], in the query processing phase to return an entire XML result set.
4.2.1 Query Translation
Among the many varieties proposed in [12], the Sorted Outer Union method is chosen
because it structures relational tuples in the same order needed to appear in the XML
result. Particularly, a tuple represents one XML node in the result if the Node Sorted
Outer Union is used. However, the method was presented in the context of publishing
relational data in XML form, which cannot be applied directly to support our cursor
interface. It needs to be extended to incorporate the processing of XQuery in order to
return the result of the query. Moreover, instead of sorted by the ID columns, the
translated query is sorted by the document id plus two order columns. The first order
column is the global order column which captures the order of the result set sequence.
All the descendant elements share the same global order of their root nodes. The sec-
ond one is the local order column which records the order of all the descendants of an
XML node in the result sequence. An example of a translated outer union query for
the same XQuery in Section 4.1.1 is given in the Appendix.
As expected, the result of this approach is the best for sequential cursor movements
such as NextNode and PreviousNode. However, for operations like, NextSibling, a
number of tuples need to be examined before the destination is reached. This may
result in significant overhead in cursor movements. If the database system has a way,
using the distance information, to jump directly to the destination node, it will avoid
the need of fetching continuously until the destination node is reached. Thus a
mechanism to compute the distance as part of the query translation would be very
helpful. We use the DB2 OLAP partition function [14] to calculate the distance in-
formation in the query. An example using the partition function is also provided in the
Appendix.
4.2.2 Result Navigation
With the XML result as relational tuples generated by executing the outer union query
described above, supporting the proposed navigation functionalities is quite straight-
forward. A sequential movement to the next or previous XML node corresponds to
one next or previous operation of the database cursor; on the other hand, multiple
database cursor operations are needed for a non-sequential movement of an XML
cursor unless the distance information is available.
Figure 4 shows the pseudo code for result navigation with the OU approach. Ver-
sion 1 of the NextSibling does not use the distance information. The second version
uses the distance information and “relativeNode()”, which is a helper function that
utilizes the “relative” method of SQL cursor Also, fCurrentNode is the current XML
node constructed from the current tuple immediately before it is used while fSaved-
Node is the XML node constructed from the then current tuple at the very beginning
of the axis operation Functions like “getDescendantCount()” are used to retrieve the
distance information when available.