2
Motivation (1)
XPath query: /s/r/*/it[mb/m/to='x']//k
Three navigation alternatives (among others):
- Straightforward navigation: retrieve all it elements matched by /s/r/*/it; keep those having at least one to descendant reachable via mb/m/to whose text value is 'x'. For the remaining it elements, return their k descendants.
- Starting from k: return all k elements with at least one it ancestor which, in turn:
  - has a to descendant reachable via mb/m/to whose text value is 'x', and
  - has an s document element ancestor reachable via the relative path parent::*/parent::r/parent::s.
- Starting from to: return all to elements matched by /s/r/*/it/mb/m/to, keep only those whose text value is 'x', then go backward via parent::m/parent::mb/parent::it and, for the remaining it elements, return their k descendants.
Athens University of Economics and Business
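The three alternatives can be compared on a toy document. The sketch below is a minimal illustration, assuming Python's standard xml.etree.ElementTree and an in-memory tree; the sample document and the helper names (kids, its, tos, forward, from_k, from_to) are hypothetical and do not model any storage system. All three functions should return the same k elements in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document shaped after /s/r/*/it[mb/m/to='x']//k.
DOC = """<s><r>
  <a><it><mb><m><to>x</to></m></mb><k id="k1"/><b><k id="k2"/></b></it></a>
  <c><it><mb><m><to>y</to></m></mb><k id="k3"/></it></c>
  <d><it><mb><m><to>z</to><to>x</to></m></mb><k id="k4"/></it></d>
</r></s>"""

root = ET.fromstring(DOC)
parent = {child: p for p in root.iter() for child in p}  # child -> parent map


def kids(e, tag=None):
    """Child elements of e, optionally restricted to one tag name."""
    return [c for c in e if tag is None or c.tag == tag]


def its():
    """Elements matched by /s/r/*/it, in document order."""
    if root.tag != "s":
        return []
    return [it for r in kids(root, "r") for star in kids(r) for it in kids(star, "it")]


def tos(it):
    """to elements reachable from it via mb/m/to."""
    return [to for mb in kids(it, "mb") for m in kids(mb, "m") for to in kids(m, "to")]


def forward():
    # Alternative 1: navigate /s/r/*/it, apply [mb/m/to='x'], then //k.
    return [k for it in its() if any(t.text == "x" for t in tos(it)) for k in it.iter("k")]


def from_k():
    # Alternative 2: start from every k and look for a qualifying it ancestor.
    def qualifies(it):
        star = parent.get(it)
        r = parent.get(star) if star is not None else None
        s = parent.get(r) if r is not None else None
        # parent::*/parent::r/parent::s, where s must be the document element
        on_path = r is not None and r.tag == "r" and s is root and s.tag == "s"
        return on_path and any(t.text == "x" for t in tos(it))

    result = []
    for k in root.iter("k"):
        a = parent.get(k)
        while a is not None and not (a.tag == "it" and qualifies(a)):
            a = parent.get(a)
        if a is not None:
            result.append(k)
    return result


def from_to():
    # Alternative 3: start from to elements, filter on text, then go backward.
    seen, result = set(), []
    for to in (t for it in its() for t in tos(it)):
        if to.text != "x":
            continue
        it = parent[parent[parent[to]]]  # parent::m/parent::mb/parent::it
        if it not in seen:  # an it with several matching to children is visited once
            seen.add(it)
            result.extend(it.iter("k"))
    return result


for alt in (forward, from_k, from_to):
    print(alt.__name__, [k.get("id") for k in alt()])
```

Because the it elements matched by /s/r/*/it all sit at the same depth, none is nested in another, so the forward and backward alternatives produce no duplicate k elements here; in general a backward plan has to eliminate duplicates explicitly.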

5
Contribution I
- GeCOEX: the first generic XPath cost-based execution and optimization framework
  - Agnostic to the underlying XML storage system and the access methods it supports
  - Independent of the techniques and algorithms available for XPath processing, which are encapsulated in operator implementations and rewriting rules
  - Cost-based optimization

10
XPalgebra – Sequence Operators
- Both the input and the output of a Sequence operator are sequences of nodes
- The input sequence is called the context sequence
- $\mathit{BoolExpr} ::= \mathit{const} \mid b_1 \wedge b_2 \wedge \dots \wedge b_n$, where each $b_i$ is a Boolean operator
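A sequence operator with a BoolExpr parameter can be sketched as a generator that consumes the context sequence and yields the nodes for which the expression holds. This is a minimal sketch under stated assumptions: nodes are represented by string labels, a Boolean operator $b_i$ is modelled as a Python predicate, and the operator name select is hypothetical rather than part of XPalgebra.

```python
from typing import Callable, Iterable, Iterator, List, Union

Node = str                            # hypothetical: a node is represented by its label
BoolOp = Callable[[Node], bool]       # one Boolean operator b_i
BoolExpr = Union[bool, List[BoolOp]]  # const, or b_1 AND b_2 AND ... AND b_n


def holds(expr: BoolExpr, node: Node) -> bool:
    """Evaluate a BoolExpr for one node of the context sequence."""
    if isinstance(expr, bool):         # const
        return expr
    return all(op(node) for op in expr)  # conjunction of the b_i


def select(context: Iterable[Node], expr: BoolExpr) -> Iterator[Node]:
    """A sequence operator: consumes the context sequence, yields a sequence of nodes."""
    for node in context:
        if holds(expr, node):
            yield node


ctx = ["it1", "it2", "k1"]
print(list(select(ctx, True)))
print(list(select(ctx, [lambda n: n.startswith("it"), lambda n: n.endswith("2")])))
```

Writing the operator as a generator keeps it pipelined: each output node is produced as soon as its context node has been examined.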

14
Physical Operators
- Implement the Sequence interface of the XPA API
- Access the XML data only through the AccessMethods interface of the XPA API
- [Figure: example of a physical operator implementation]
- That is how physical operators are agnostic to the physical data model
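The separation between operators and storage can be illustrated with two small interfaces. In this sketch, AccessMethods and Sequence are hypothetical Python stand-ins for the XPA interfaces of the same name (the real signatures are not shown on the slide), descs() loosely mirrors Descs(), and DictStore is a made-up storage back end. The operator DescendantStep reads data only through AccessMethods, so swapping the store does not change the operator.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List, Tuple


class AccessMethods(ABC):
    """Hypothetical Python stand-in for the XPA AccessMethods interface."""

    @abstractmethod
    def descs(self, elem: str, tag: str) -> List[str]:
        """Descendants of elem with the given tag, in document order (cf. Descs())."""


class Sequence(ABC):
    """Hypothetical Python stand-in for the XPA Sequence interface."""

    @abstractmethod
    def __iter__(self) -> Iterator[str]: ...


class ListSequence(Sequence):
    """A materialized sequence, e.g. a context sequence produced earlier in a plan."""

    def __init__(self, items: Iterable[str]):
        self.items = list(items)

    def __iter__(self) -> Iterator[str]:
        return iter(self.items)


class DescendantStep(Sequence):
    """Physical operator: yields the tag-named descendants of each context element.

    Data is read only through AccessMethods. Duplicate elimination is omitted:
    the context elements are assumed not to be nested in one another.
    """

    def __init__(self, context: Sequence, tag: str, am: AccessMethods):
        self.context, self.tag, self.am = context, tag, am

    def __iter__(self) -> Iterator[str]:
        for elem in self.context:
            yield from self.am.descs(elem, self.tag)


class DictStore(AccessMethods):
    """Toy storage: precomputed descendant lists keyed by (element, tag)."""

    def __init__(self, table: Dict[Tuple[str, str], List[str]]):
        self.table = table

    def descs(self, elem: str, tag: str) -> List[str]:
        return self.table.get((elem, tag), [])


store = DictStore({("it1", "k"): ["k1", "k2"], ("it3", "k"): ["k4"]})
print(list(DescendantStep(ListSequence(["it1", "it3"]), "k", store)))
```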

17
Physical Plan
/s/r/*/it[mb/m/to='x']//k
- Use SM to find the it elements matched by /s/r/*/it
- Filter the it elements:
  - for each it, use LU to check whether it has to elements under mb/m/to
  - for each such to, check whether its text node equals 'x'
- For the remaining it elements, find their k descendants using the Staircase Join
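The last step of the plan uses the Staircase Join. The sketch below shows only its core idea for the descendant axis over a pre/post-order encoding: prune context nodes that lie inside another context node's subtree, then scan each remaining subtree once. It is a simplified illustration (the skipping optimization is omitted), and the tree literal and the helper names encode and staircase_desc are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    pre: int   # preorder rank
    post: int  # postorder rank
    tag: str


def encode(tree: Tuple[str, list]) -> List[Node]:
    """Assign pre/post ranks to a (tag, children) tree; the result is indexed by pre."""
    doc: list = []
    next_post = [0]

    def visit(node):
        tag, children = node
        pre = len(doc)
        doc.append(None)  # placeholder until the post rank is known
        for child in children:
            visit(child)
        doc[pre] = Node(pre, next_post[0], tag)
        next_post[0] += 1

    visit(tree)
    return doc


def staircase_desc(doc: List[Node], context: List[Node], tag: str) -> List[Node]:
    """descendant::tag for a whole context sequence, in document order, duplicate-free."""
    # Pruning: drop context nodes that lie inside the subtree of an earlier kept node.
    pruned: List[Node] = []
    for c in sorted(context, key=lambda n: n.pre):
        if not pruned or c.post > pruned[-1].post:
            pruned.append(c)
    # Scan: the descendants of c are the nodes right after c in preorder,
    # up to the first node whose post rank exceeds post(c).
    out: List[Node] = []
    for c in pruned:
        for i in range(c.pre + 1, len(doc)):
            if doc[i].post > c.post:
                break
            if doc[i].tag == tag:
                out.append(doc[i])
    return out


# Hypothetical document: s/r/{a/it[k, b/k], c/it[k]}
tree = ("s", [("r", [("a", [("it", [("k", []), ("b", [("k", [])])])]),
                     ("c", [("it", [("k", [])])])])])
doc = encode(tree)
its = [n for n in doc if n.tag == "it"]
print([n.pre for n in staircase_desc(doc, its, "k")])
```

Because the pruned context nodes have disjoint subtrees visited in preorder, the output needs neither sorting nor duplicate elimination.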

18
Costing Physical Operators
The cost estimate of a physical operator is defined by its Descriptor and is based on:
- the cardinality of the input sequence/logical operator
- certain statistics (DBStatistics interface)
- the cost of the primitive access methods it invokes (AccMCostModels interface)
That is how physical operator cost models are agnostic to the physical data model

19
Costing an Operator
The cost of a physical operator is based on the cardinality of the context sequence, certain statistics, and the cost of the primitive access methods it invokes.
Example: $\mathrm{Cost}(\mathrm{LU}^{d}_{k}) = \mathrm{Card}(f) \cdot \bigl(c_1 + \mathrm{Occ}(\texttt{/s/r/*/it}, \texttt{//k}) \cdot c_2\bigr)$, where
- $c_1$: the result of CostForDescLookup()
- $c_2$: the result of CostForNextDesc()
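The example formula can be evaluated directly once its inputs are known. In this sketch, Card(f) is read as the cardinality of the context sequence and Occ(/s/r/*/it, //k) as the average number of k descendants per it element; both readings are assumptions, as are the class name AccMCosts and all the numbers used below.

```python
from dataclasses import dataclass


@dataclass
class AccMCosts:
    """Hypothetical per-call costs reported by an AccMCostModels implementation."""
    desc_lookup: float  # c1: value returned by CostForDescLookup()
    next_desc: float    # c2: value returned by CostForNextDesc()


def cost_lu_desc(card_context: float, occ: float, am: AccMCosts) -> float:
    """Cost = Card(f) * (c1 + Occ(/s/r/*/it, //k) * c2)."""
    return card_context * (am.desc_lookup + occ * am.next_desc)


# Made-up inputs: 100 context elements, on average 3 k descendants per it,
# c1 = 5 and c2 = 1 cost units.
print(cost_lu_desc(100, 3, AccMCosts(desc_lookup=5, next_desc=1)))
```

Each context element pays one descendant lookup ($c_1$) plus one next-descendant call ($c_2$) per result, which is why the formula is linear in both the context cardinality and Occ.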

24
Four more XPA drivers
- The PE-Path XML storage system and driver
  - Similar to PE-basic
  - Distinct root-to-node paths are stored in a separate index (RTN-paths index)
  - getRTNPath() of the Element interface is very cheap due to the RTN-paths index
  - Parent() and Ancs() of the AccessMethods interface are very cheap due to the combination of Dewey encoding and the RTN-paths index
- The RE-basic XML storage system and driver
  - Similar to PE-basic but uses region encoding
- The RE-Path XML storage system and driver
  - Similar to PE-Path but uses region encoding
  - Parent() and Ancs() of the AccessMethods interface are not cheap
- The Edge-based RE-Path XML storage system and driver
  - Similar to RE-Path but stores all elements in a single B-tree structure
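Why Dewey labels plus root-to-node paths make Parent() and Ancs() cheap can be seen in a few lines: every ancestor's label is a prefix of the element's own label, and the matching step of the RTN path gives its tag, so no further data access is needed. By contrast, a region label (start, end) identifies ancestors only by searching other elements' intervals. This is a minimal sketch assuming one label component per level, starting at the document element; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Dewey = Tuple[int, ...]


def parent(label: Dewey) -> Optional[Dewey]:
    """Parent label: drop the last component (None for the document element)."""
    return label[:-1] or None


def ancs(label: Dewey, rtn_path: str) -> List[Tuple[Dewey, str]]:
    """All ancestors, outermost first, paired with their tag names.

    The i-th step of the root-to-node path names the element whose Dewey label
    is the i-component prefix, so the answer is computed without touching data.
    """
    tags = rtn_path.strip("/").split("/")
    if len(tags) != len(label):
        raise ValueError("path and label must have the same depth")
    return [(label[:i], tags[i - 1]) for i in range(1, len(label))]


print(parent((1, 1, 2, 1)))
print(ancs((1, 1, 2, 1), "/s/r/a/it"))
```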

26
Lookup Operators
- Novel efficient algorithms for holistically evaluating forward and backward multi-step paths, based on root-to-node filtering
- Buffered-leaping: a new technique for pipelined duplicate elimination and document order preservation
  - Search a minimum window of elements for each element in the context sequence
  - Window: the result of calling the AccessMethods method of the XPA API (e.g., Descs(), Ancs()) that corresponds to the XPath axis (e.g., descendant, ancestor) for a given context element
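The goal that buffered-leaping serves, pipelined duplicate elimination with document order preserved, can be illustrated for a single descendant step. This sketch is not the buffered-leaping algorithm, whose details are not on this slide; it only shows that, with Dewey-style labels and a context sequence in document order, skipping context elements whose window is already covered is enough. The descs callable stands in for Descs(), and the window table is made up.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Label = Tuple[int, ...]  # Dewey-style label; tuple order = document order


def is_ancestor(a: Label, b: Label) -> bool:
    return len(a) < len(b) and b[: len(a)] == a


def lookup_descs(context: Iterable[Label],
                 descs: Callable[[Label], List[Label]]) -> Iterator[Label]:
    """Pipelined descendant lookup over a context sequence in document order.

    descs(c) plays the role of the window returned by Descs() for c. A context
    element equal to, or nested inside, the last element whose window was
    searched is skipped, because its window is already covered. The output is
    therefore duplicate-free and in document order.
    """
    last: Optional[Label] = None
    for c in context:
        if last is not None and (c == last or is_ancestor(last, c)):
            continue
        last = c
        yield from descs(c)


windows = {
    (1, 1): [(1, 1, 1), (1, 1, 2), (1, 1, 2, 1)],
    (1, 1, 2): [(1, 1, 2, 1)],
    (1, 2): [(1, 2, 1)],
}
print(list(lookup_descs([(1, 1), (1, 1, 2), (1, 2)], lambda c: windows.get(c, []))))
```

Only the last searched label is kept, so the memory needed for duplicate elimination here does not grow with the input.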

29
SM Operators
- Inspired by sort-merge join algorithms
- Traverse two sequences of elements, left and right:
  - left: the context sequence (the input sequence)
  - right: always consists of all the elements with the requested tag name
- Keeping track of the current element on the left and on the right, try to find matching pairs according to the appropriate navigation axis and condition
- Novel techniques for holistic SM-based forward-path and backward-path operators with guaranteed low memory requirements
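A single merge-based descendant step shows the left/right traversal described above. This sketch assumes region labels (start, end) whose intervals either nest or are disjoint, with both inputs sorted by start; it handles one step only and is not the holistic multi-step SM operators. The function name and sample labels are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Region = Tuple[int, int]  # (start, end) region label; intervals nest or are disjoint


def sm_descendants(left: List[Region], right: Iterable[Region]) -> Iterator[Region]:
    """Merge-based descendant step.

    left:  the context sequence, sorted by start
    right: all elements with the requested tag name, sorted by start
    Yields each right element that has an ancestor in left, in document order.
    """
    i, max_end = 0, -1
    for r in right:
        # Advance the left cursor past every context element that starts before r.
        while i < len(left) and left[i][0] < r[0]:
            max_end = max(max_end, left[i][1])
            i += 1
        # Since regions nest or are disjoint, some context element starting
        # before r contains r exactly when one of them ends after r starts.
        if max_end > r[0]:
            yield r


its = [(2, 9), (10, 13)]
ks = [(3, 4), (6, 7), (11, 12), (14, 15), (16, 17)]
print(list(sm_descendants(its, ks)))
```

Each input is read once, and apart from the two cursors the state is a single integer, which is the kind of low memory footprint the slide attributes to SM-based operators.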

33
Conclusions I
- Novel techniques for evaluating forward and backward multi-step paths, with pipelined duplicate elimination and document order preservation
  - Lookup fp, Lookup bp, Lookup cs, SM fp, SM bp, SM cs
- Fast backward navigation that fully exploits the capabilities of the underlying storage system
- Algorithms perform well across a variety of physical storage models
- First steps towards building cost models for XPath

34
Conclusions II
- Operator-based XPath processing provides significant optimization opportunities
- Different implementations of logical operators can provide benefits in different circumstances, e.g., context selectivity
- Query plans can be much more efficient than existing monolithic (twig) techniques in most circumstances