The amount of semistructured data is growing rapidly as the World Wide Web has
developed into a central means for sharing and disseminating information. The
structure of tree-like semistructured data is not rigid. The most common instance
of this type of data is XML. Applications that access components of
semistructured data naturally tend to navigate the tree recursively.
However, conventional wisdom holds that a set-oriented mechanism
is necessary for database management systems to perform well on
large amounts of data.
Our main objective in this thesis is to develop a set-oriented query execution scheme
for XML data. We propose a system, called "Equate" (Execution of Queries Using
an Automata Theoretic Engine), which uses an automata rewriting scheme to
transform a query into an internal query plan of relational-like
operators scheduled in a single process for set-oriented execution.
Our approach consists of two phases. The first phase, set-oriented execution, evaluates
the query against edges and binds any variables required. The second phase, reachability
analysis, refines the result by filtering out false matches and collects the sets of variable
bindings into a final result structure.
A novel aspect of our approach is that our set-oriented execution, even for complex
queries, requires only variants of the relational select, project, and union operators,
but no joins.
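
To make the two phases concrete, the following is a minimal sketch on a toy edge
table. It is an illustration only and not the Equate implementation: the edge
representation (parent, label, child), the simple label-path query, and the
function names select_edges, set_oriented_phase, and reachability_phase are all
assumptions introduced here. The first phase uses only select, project, and
union over edges; the second phase walks from the root and discards false
matches.

```python
# Hypothetical edge table: (parent_id, label, child_id). Node 0 is the root.
EDGES = {
    (0, "library", 1),
    (1, "book", 2),
    (1, "book", 3),
    (2, "title", 4),
    (3, "title", 5),
    (1, "journal", 6),
    (6, "title", 7),
}

# Hypothetical query: the label path library/book/title.
PATH = ["library", "book", "title"]


def select_edges(edges, label):
    """Relational-style select: keep the edges that carry the given label."""
    return {e for e in edges if e[1] == label}


def set_oriented_phase(edges, path):
    """Phase 1 (assumed form): one select per query step, projected to
    (step, parent, child) and unioned into a single candidate set.
    No join between steps is performed here."""
    candidates = set()
    for step, label in enumerate(path):
        projected = {(step, p, c) for (p, _, c) in select_edges(edges, label)}
        candidates |= projected  # union
    return candidates


def reachability_phase(candidates, path, root=0):
    """Phase 2 (assumed form): follow candidate edges from the root one
    step at a time, so candidates not reachable along the path drop out."""
    frontier = {root}
    for step in range(len(path)):
        frontier = {c for (s, p, c) in candidates if s == step and p in frontier}
    return frontier


candidates = set_oriented_phase(EDGES, PATH)
result = reachability_phase(candidates, PATH)
print(sorted(result))
```

In this toy data the edge from node 6 to node 7 is selected in the first phase
because it carries the label "title", but it lies under "journal" rather than
"book", so the reachability pass removes it as a false match.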