Abstract: Web services give middleware access to a relational database and require data to be represented in XML format. The XML views obtained from relational databases can be accessed by using XPath queries. This article proposes an optimization model for XML data processing based on a heuristic algorithm that extracts data from XPath views. To this end, the author uses various XPath query classes temporarily stored in cache as XPath views. For each view selected from cache, a compensation query can be found and composed with the view in order to answer an XML data query. Experimental results reveal the effectiveness of the heuristic method used to solve queries on XML documents.

Distributed systems use business objects or XML web services to share data between applications, ensuring platform and programming-language independence. The Web services architecture comprises several layers, protocols and related technologies such as XML, SOAP (Simple Object Access Protocol) and WSDL (Web Services Description Language).

Various approaches to storing XML data in relational databases recommend the use of a generic relational structure, including a mapping of XML documents. Kossman [1] has represented XML documents using graphs.

The idea of storing information related to each node of the tree in an XML document has been developed by Yoshikawa, Amagasa and others [2]. The algorithm for XML data translation proposed by Yoshikawa and Amagasa is only appropriate for nonrecursive data and fails to obtain accurate results if the XML data has ancestors with the same label in the tree representation.

The relational model allows translation algorithms to be devised that convert XQuery queries into SQL queries, where XQuery is a standard XML query language. For example, Oracle allows the creation of XML views over relational data, and these XML views can be queried using the XPath language.
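As a small illustration of querying an XML view of relational data with XPath, the sketch below uses Python's standard sqlite3 and xml.etree.ElementTree modules rather than Oracle. The employee table, its columns and the build_xml_view helper are hypothetical names chosen for this example, and ElementTree supports only a limited subset of XPath, so this is a minimal sketch of the idea and not the mechanism Oracle provides.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_xml_view(conn):
    """Map rows of the (hypothetical) employee table to an XML tree."""
    root = ET.Element("employees")
    rows = conn.execute("SELECT id, name, dept FROM employee ORDER BY id")
    for emp_id, name, dept in rows:
        emp = ET.SubElement(root, "employee", id=str(emp_id))
        ET.SubElement(emp, "name").text = name
        ET.SubElement(emp, "dept").text = dept
    return root


# In-memory relational data, assumed purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ana", "IT"), (2, "Dan", "HR"), (3, "Ion", "IT")],
)

# Build the XML view once, then query it with an XPath expression.
view = build_xml_view(conn)
it_names = [e.text for e in view.findall("./employee[dept='IT']/name")]
print(it_names)
```

Here the XPath expression selects the name of every employee whose dept child equals 'IT'; the relational table is read only once, when the view is built.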

Tudor proposes a cache pattern with multiqueries and describes multi-query optimization with scheduling, caching and pipelining. A set of cache patterns is derived from a set of classes of multiqueries that are loaded into the cache [5].

A semantic cache that stores XML views can be used to optimize business objects. To avoid repeated connections to a backend database, the views stored in cache are queried instead. This type of middle-tier cache has become very popular for Web applications with relational databases. A semantic cache uses the views’ semantics to determine whether a query can be answered entirely or partially from the cached information [6], [7].
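The lookup in such a cache can be sketched as follows. This is a deliberately simplified illustration and not the heuristic algorithm of this article: it assumes XPath queries made only of child steps without predicates, treats a cached view as usable when its path is a prefix of the query path, and takes the remaining steps as the compensation query. The names steps, find_view and answer are hypothetical.

```python
import xml.etree.ElementTree as ET


def steps(xpath):
    """Split a simple child-axis path such as /lib/book into its steps."""
    return [s for s in xpath.split("/") if s]


def find_view(cache, query):
    """Return (view_path, compensation) for the longest cached prefix, or None."""
    q = steps(query)
    best = None
    for view_path in cache:
        v = steps(view_path)
        if q[:len(v)] == v and (best is None or len(v) > len(steps(best))):
            best = view_path
    if best is None:
        return None
    rest = q[len(steps(best)):]
    return best, ("/".join(rest) if rest else ".")


def answer(cache, query):
    """Answer a query from the cache by applying the compensation to a view."""
    hit = find_view(cache, query)
    if hit is None:
        return None  # cache miss: the backend would have to be queried
    view_path, compensation = hit
    return [n for elem in cache[view_path] for n in elem.findall(compensation)]


doc = ET.fromstring(
    "<lib><book><title>A</title><year>2001</year></book>"
    "<book><title>B</title><year>2005</year></book></lib>"
)
# The cache maps a view's path to its materialized result nodes.
cache = {"/lib/book": doc.findall("book")}

titles = [t.text for t in answer(cache, "/lib/book/title")]
print(find_view(cache, "/lib/book/title"), titles)
print(answer(cache, "/lib/journal/title"))
```

The first query is answered entirely from the cached view /lib/book by applying the compensation query title to each cached node; the second finds no usable view and would fall through to the database.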

The contributions of this article can be summarized as follows:

a method to optimize access to XML data has been identified; it is based on the extraction of XPath views from a semantic cache.

a new solving technique has been proposed for XPath queries. Thus, this paper offers:

a definition of the XPath query classes for which the heuristic XPath view extraction algorithm is evaluated;

an effective method to select an XPath view from cache, based on verifying constraint satisfaction for the XPath expressions;

the heuristic solving techniques for the queries described above have been implemented for relational Oracle databases. The complexity analysis supports the performance of the proposed algorithm; the time complexity is evaluated as a function of the size of the input space, represented by the set of XPath views. The experimental results demonstrate the feasibility and effectiveness of the newly proposed heuristic algorithm.

The article is organized as follows. Section 2 describes how XML queries are processed and how XQuery queries are converted for relational databases. Section 3 describes the issue of XML data rewriting using XPath views in Oracle databases, and presents the composition of XPath queries and the creation of a cache of XPath views. Section 4 describes the HSelectXP heuristic algorithm for special XPath query classes. Section 5 presents a complexity evaluation of the heuristic algorithm. Section 6 presents the experimental results, comparisons with other known algorithms and the way queries are processed using the heuristic algorithm. The experimental study emphasizes the performance of the heuristic algorithm in processing queries on XML data. Section 7 draws conclusions regarding the effectiveness of a semantic cache and the heuristic algorithm in optimizing access to XML data.