Evaluating the Dynamic Behavior of Database Applications

Zhen He (1) and Jérôme Darmont (2)

(1) Department of Computer Science, University of Vermont, Burlington, VT, USA
(2) ERIC, Université Lumière Lyon 2, 5 avenue Pierre Mendès-France, Bron Cedex, France

Submission Type: Research Paper

Abstract

This paper explores the effect that changing access patterns has on the performance of database management systems. Changes in access patterns play an important role in determining the efficiency of key performance optimization techniques, such as dynamic clustering, prefetching, and buffer replacement. However, all existing benchmarks or evaluation frameworks produce static access patterns, in which objects are always accessed in the same order repeatedly. Hence, we have proposed the Dynamic Evaluation Framework (DEF), which simulates access pattern changes using configurable styles of change. DEF has been designed to be open and fully extensible (e.g., new access pattern change models can be added easily). In this paper, we instantiate DEF into the Dynamic object Evaluation Framework (DoEF), which is designed for object databases, i.e., object-oriented or object-relational databases such as multimedia databases or most XML databases. The capabilities of DoEF have been evaluated by simulating the execution of four different dynamic clustering algorithms. The results confirm our analysis that flexible conservative re-clustering is the key to a clustering algorithm's ability to adapt to changes in access pattern. These results show that DoEF is effective at determining, in a simulation environment, how well each dynamic clustering algorithm adapts to changes in access pattern. In a second set of experiments, we have used DoEF to compare the performance of two real-life object stores: Platypus and SHORE. DoEF has helped to reveal the poor swapping performance of Platypus.
Keywords: Performance evaluation, Dynamic access patterns, Benchmarking, Object-oriented and object-relational databases, Clustering.

Introduction

Performance evaluation is critical both for designers of Database Management Systems (DBMSs), who must make architectural or optimisation choices, and for users, who need to compare efficiency or tune their systems. Traditionally, this is achieved with benchmarks, i.e., synthetic workload models (databases and operations) together with sets of performance metrics. Although in real life almost no application always accesses the same data in the same order, none of the existing database benchmarks incorporates the possibility of change in access patterns.

The ability to adapt to changes in access patterns is critical to database performance. Moreover, tuning a database to perform well for one particular access pattern only can lead to poor performance when different access patterns are used. Thus, a database tuned to a particular trace (a particular instance of real application usage) is likely to perform poorly when a different trace is used. In addition, the performance of a database on a particular trace provides little insight into the reasons behind that performance, and is thus of limited use to database researchers and engineers, who are interested in identifying and improving the performance of particular components of the system. The aim of our work is therefore to provide them with a means of exploring the performance of databases under different styles of access pattern change. In contrast, benchmarks of the TPC family aim to provide standardised means of comparing systems for vendors and customers. In this paper, we take a first look at how dynamic application behavior can be modeled and propose the Dynamic Evaluation Framework (DEF).
DEF is the first attempt at evaluating the performance of DBMSs in general, and of optimization techniques such as dynamic clustering algorithms in particular, with respect to changing query profiles. DEF contains a set of protocols, which in turn define a set of styles of access pattern change. DEF by no means exhausts all possible styles of access pattern change. However, we have designed DEF to be fully extensible, so that new styles of change can be easily incorporated. Finally, DEF is a generic platform that can be specialized to suit the particular needs of a given family of DBMSs (e.g., relational, object, or object-relational). In particular, it is designed to be implemented on top of an existing benchmark, so that previous benchmarking research and standards can be reused.

In this paper, we show the utility of DEF by creating an instance of DEF called the Dynamic object Evaluation Framework (DoEF) (He and Darmont, 2003), which is designed for object databases. Note that, in the remainder of this paper, we use the term Object Database Management Systems (ODBMSs) for both object-oriented and object-relational systems, without distinction. ODBMSs include, for example, most multimedia and XML DBMSs. DoEF is built on top of the Object Clustering Benchmark (OCB) (Darmont, Petit, and Schneider, 1998; Darmont and Schneider, 2000), a generic object-oriented benchmark that is able to simulate the behavior of the other main object-oriented benchmarks. DoEF uses both the database built from the rich schema of OCB and the operations offered by OCB. Since OCB's generic model can be implemented within an object-relational system and most of its operations are relevant for such a system, DoEF can also be used in the object-relational context.

To test the effectiveness of DoEF, we have conducted two sets of experiments.
First, we have benchmarked four state-of-the-art dynamic clustering algorithms (Bullat and Schneider, 1996; Darmont, Fromantin, Regnier, Gruenwald, and Schneider, 2000; He, Marquez, and Blackburn, 2000). There are three reasons for choosing dynamic clustering algorithms to test the effectiveness of DoEF: (1) ever since the early days of object database management systems, clustering has proven to be one of the most effective performance enhancement techniques (Gerlhof, Kemper, and Moerkotte, 1996); (2) the performance of dynamic clustering algorithms is very sensitive to changing access patterns; and (3) despite this sensitivity, no previous attempt has been made to benchmark these algorithms in this way. Then, we tested the utility of DoEF by benchmarking two transactional object stores: Platypus (He, Blackburn, Kirby, and Zigman, 2000) and SHORE (Carey, DeWitt, Franklin, Hall, McAuliffe, Naughton, Schuh, Solomon, Tan, Tsatalos, White, and Zwilling, 1994).

Our first paper about DoEF (He and Darmont, 2003) made two key contributions: (1) it proposed the first evaluation framework that allowed ODBMSs and associated optimisation techniques to be evaluated in a dynamic environment; (2) it presented the first performance evaluation experiments of dynamic clustering algorithms in a dynamic environment (by simulation). This paper expands on this material by presenting a more generic view of our evaluation framework, by providing a more thorough description of the configurable styles of change, and by reporting the results of new experiments that validate the effectiveness of DoEF at contrasting the dynamic performance of two real-life ODBMSs.

The remainder of this paper is organised as follows. We first present a brief description of existing DBMS benchmarks. Second, we present an overview of the OCB benchmark. The next two sections describe in detail the DEF framework and its object-oriented instance, DoEF, respectively.
Next, we briefly describe the state-of-the-art clustering algorithms and object stores we have used in this paper. We then present and discuss the experimental results achieved with DoEF, and finally conclude the paper and provide future research directions.

Existing Benchmarks

Apart from OCB, which is detailed in the next section, we briefly describe here the prevalent benchmarks that have been proposed in the literature for evaluating the performance of DBMSs. Note that none of these benchmarks incorporates any dynamic application behavior.

In the world of relational databases, the Transaction Processing Performance Council (TPC), a non-profit institute founded in 1988, defines standard benchmarks, verifies their correct application, and publishes the results. The TPC benchmarks include TPC-C (TPC, 2002a) for OLTP, TPC-H (TPC, 2003a) and TPC-R (TPC, 2003b) for decision support, and TPC-W (TPC, 2002b) for web commerce. All these benchmarks feature an elaborate database and set of operations. Both are fixed, the only parameter being the database size (scale factor).

In contrast, there is no standard object-oriented database benchmark. However, the OO1 benchmark (Cattell, 1991), the HyperModel benchmark (Anderson, Berre, Mallison, Porter, and Scheider, 1990), and the OO7 benchmark (Carey, DeWitt, and Naughton, 1993) may be considered de facto standards. They are all designed to mimic engineering applications such as CAD, CAM, or CASE applications. They range from OO1, which has a very simple schema (two classes) and only three simple operations, to OO7, which is more generic and provides both a much richer and more customisable schema (ten classes) and a wider range of operations (fifteen complex operations). However, even OO7's schema is static and still not generic enough to model other types of applications, such as financial, telecommunications, and multimedia applications (Tiwary, Narasayya, and Levy, 1995).
Furthermore, each step in adding complexity makes these benchmarks harder to implement.

Object-relational benchmarks, such as the BUCKY benchmark (Carey, DeWitt, Naughton, Asgarian, Brown, Gehrke, and Shah, 1997) and the Benchmark for Object-Relational Databases (BORD) (Lee, Kim, and Kim, 2000), are query-oriented benchmarks specifically aimed at evaluating the performance of object-relational database systems. For instance, BUCKY only features operations that are specific to object-relational systems, since typical object navigation has already been tested by other benchmarks (see above). Hence, these benchmarks focus on queries involving object identifiers, inheritance, joins, class references, inter-object references, set-valued attributes, flattening queries, object methods, and various abstract data types. The database schema is also static in these benchmarks.

Finally, Carey and Franklin have designed a set of workloads for measuring the performance of their client-server Object-Oriented Database Management Systems (OODBMSs) (Carey, Franklin, Livny, and Shekita, 1991; Franklin, Carey, and Livny, 1993). These workloads operate at the page grain instead of the object grain, i.e., synthetic transactions read or write pages instead of objects. The workloads contain the notion of hot and cold regions (some areas of the database are accessed more frequently than others), in an attempt to approximate real application behaviour. However, the hot region never moves, meaning that no attempt is made to model dynamic application behaviour.

The Object Clustering Benchmark (OCB)

OCB is a generic, tunable benchmark aimed at evaluating the performance of OODBMSs. It was first oriented toward testing clustering strategies (Darmont et al., 1998) and was later extended to become fully generic (Darmont and Schneider, 2000). The flexibility and scalability of OCB are achieved through an extensive set of parameters.
OCB is able to simulate the behavior of the de facto standards in object-oriented benchmarking, namely OO1 (Cattell, 1991), HyperModel (Anderson et al., 1990), and OO7 (Carey et al., 1993). Furthermore, OCB's generic model can easily be implemented within an object-relational system, and most of its operations are relevant for such a system. We only provide an overview of OCB here; its complete specification is available in (Darmont and Schneider, 2000). The two main components of OCB are its database and its workload.

Database

The OCB database is made up of NC classes derived from the same metaclass (Figure 1). Classes are defined by two parameters: MAXNREF, the maximum number of references in the instances, and BASESIZE, an increment size used to compute the InstanceSize. Each CRef (class reference) has a type: TRef. There are NREFT different types of references (e.g., inheritance, aggregation...). Finally, an Iterator is maintained within each class to save references toward all its instances.

[Fig. 1. OCB database schema: the CLASS metaclass (Class_ID, MAXNREF, BASESIZE, InstanceSize, TRef) and OBJECT (OID, Filler, Attribute), linked by the CRef, Iterator, ClassPtr, ORef, and BackRef references.]

Each object possesses ATTRANGE integer attributes that may be read and updated by transactions. A Filler string of size InstanceSize is used to simulate the actual size of the object. After the schema is instantiated, an object O of class C points through its ORef references to at most C.MAXNREF objects. There is also a backward reference (BackRef) from each referenced object toward the referring object O.

The database generation proceeds through three steps:
1. Instantiation of the CLASS metaclass into NC classes and selection of class-level references. Class references are selected to belong to the [Class_ID - CLOCREF, Class_ID + CLOCREF] interval. This models locality of reference at the class level.
2. Database consistency check-up: suppression of all cycles and discrepancies within the graphs that do not allow them, e.g., inheritance graphs or composition hierarchies.
3. Instantiation of the NC classes into NO objects and random selection of the object references. Object references are selected to belong to the [OID - OLOCREF, OID + OLOCREF] interval. This models locality of reference at the instance level.

The main database parameters are summarized in Table 1.

Table 1. OCB database main parameters
Parameter name   Description                                  Default value
NC               Number of classes in the database            50
MAXNREF(i)       Maximum number of references, per class      10
BASESIZE(i)      Instances base size, per class               50 bytes
NO               Total number of objects                      20,000
NREFT            Number of reference types                    4
ATTRANGE         Number of integer attributes in an object    1
CLOCREF          Class locality of reference                  NC
OLOCREF          Object locality of reference                 NO

Workload

The operations of OCB are broken up into four categories.
Random Access: Access to NRND randomly selected objects.
Sequential Scan: Randomly select a class and then access all its instances. A Range Lookup additionally performs a test on the value of NTEST attributes for each accessed instance.
Traversal: There are two types of traversals in OCB. Set-oriented accesses (or associative accesses) perform a breadth-first search. Navigational accesses are further divided into Simple Traversals (depth-first searches), Hierarchy Traversals that always follow the same reference type, and Stochastic Traversals that select the next link to cross at random. Each traversal proceeds from a randomly chosen root object, up to a predefined depth. All the traversals can be reversed by following the backward links.
Update: Update operations are also subdivided into different types. Schema Evolutions are random insertions and deletions of Class objects (one at a time). Database Evolutions are random insertions and deletions of objects.
Attribute Updates randomly select NUPDT objects to update, or randomly select a class and update all of its objects (Sequential Update).

The Dynamic Evaluation Framework (DEF)

The primary goal of DEF is to evaluate the dynamic performance of DBMSs. To make DEF as general as possible, we have made two key decisions: to define DEF as an extensible framework, and to reuse existing, standard benchmarks when available.

Dynamic Framework

We start by giving an example scenario that the framework can mimic. Suppose we are modeling an on-line book store in which certain groups of books are popular at certain times. For example, travel guides to Australia may have been very popular during the 2000 Olympics. However, once the Olympics were over, these books may have suddenly or gradually become less popular. Once the desired book has been selected, information relating to the book may be required, e.g., customer reviews of the book, excerpts from the book, a picture of the cover, etc. If the data are stored in an ODBMS, retrieving the related information translates into an object graph navigation with the selected book as the traversal root. After looking at the related information for the selected book, the user may choose to look at another book by the same author. When information relating to the newly selected book is requested, the newly selected book becomes the root of a new object graph traversal.

Next, we give an overview of the five main steps of the dynamic framework and, in the process, show how the above example scenario fits in.
1. H-region parameters specification. The dynamic framework divides the database into regions of homogeneous access probability (H-regions). In our example, each H-region represents a different group of books, each group having its own probability of access. In this step, we specify the characteristics of each H-region, e.g., its size, initial access probability, etc.
2. Workload specification. H-regions are responsible for assigning access probabilities to pieces of data (tuples or objects). However, H-regions do not dictate what is done once a piece of data has been selected. We term the selected tuple or object the workload root; in the remainder of this paper, we use the term root to mean workload root. In this step, we select the type of workload to execute after selecting the root.
3. Regional protocol specification. Regional protocols use H-regions to accomplish access pattern change. Different styles of access pattern change can be accomplished by changing the H-region parameter values over time. For example, a regional protocol may initially define one H-region with a high access probability, while the remaining H-regions are assigned low access probabilities. After a certain time interval, a different H-region may become the high access probability region. In the book store example, this is similar to Australian travel books becoming less popular after the 2000 Olympics end.
4. Dependency protocol specification. Dependency protocols allow us to specify a relationship between the currently selected root and the next root. In our example, this is reflected in the customer deciding to select a book by the same author as the previously selected book.
5. Regional and dependency protocol integration specification. In this step, regional and dependency protocols are integrated to model changes in the dependency between successive roots. An example is a customer of our on-line book store who selects a book of interest and is then confronted with a list of currently popular books by the same author. The customer then selects one of the listed books (modeled by the dependency protocol). The set of currently popular books by the same author may change over time (modeled by the regional protocol).

The first three steps we have described are generic, i.e., they can be applied to any selected benchmark and system type (relational, object-oriented, or object-relational).
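To make the interplay of H-regions, regional protocols, and dependency protocols in the five steps above more concrete, the following Python sketch mimics the book store scenario. It is a minimal illustration under simplifying assumptions of our own, not DEF's specification: objects are integer OIDs, H-regions are contiguous OID ranges, the only regional protocol is a sudden shift of the hot H-region to the next region, and the "same author" relationship is a hypothetical `related` mapping from an OID to candidate next roots. The workload executed from each root (step 2, e.g., an OCB traversal) is left out. All names (`HRegion`, `make_hregions`, `shift_hot_region`, `pick_root`, `pick_dependent_root`) are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class HRegion:
    """Hypothetical H-region: a set of OIDs sharing one access probability."""
    oids: range
    prob: float


def make_hregions(n_objects, n_regions, hot_prob):
    """Step 1 (sketch): partition OIDs 0..n_objects-1 into contiguous,
    non-overlapping H-regions. Region 0 starts hot; the other regions share
    the remaining probability equally (assumes n_regions >= 2)."""
    size = n_objects // n_regions
    cold_prob = (1.0 - hot_prob) / (n_regions - 1)
    regions = []
    for r in range(n_regions):
        end = n_objects if r == n_regions - 1 else (r + 1) * size
        regions.append(HRegion(range(r * size, end), hot_prob if r == 0 else cold_prob))
    return regions


def shift_hot_region(regions):
    """Step 3 (one assumed regional protocol): rotate the probabilities so that
    each H-region takes over its predecessor's probability, i.e., the hot role
    moves from region i to region i+1 (cyclically)."""
    probs = [h.prob for h in regions]
    for h, p in zip(regions, probs[-1:] + probs[:-1]):
        h.prob = p


def region_of(regions, oid):
    """Return the H-region containing oid (range membership is O(1))."""
    return next(h for h in regions if oid in h.oids)


def pick_root(regions, rng):
    """Select a workload root: draw an H-region according to its access
    probability, then an object uniformly within that region."""
    region = rng.choices(regions, weights=[h.prob for h in regions], k=1)[0]
    return rng.choice(region.oids)


def pick_dependent_root(regions, related, prev_root, rng):
    """Steps 4-5 (sketch): the next root is one of the objects related to the
    previous root (e.g., books by the same author), weighted by the current
    probability of each candidate's H-region, so the dependency itself drifts
    whenever the regional protocol shifts the hot region."""
    candidates = related[prev_root]
    weights = [region_of(regions, c).prob for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(1)
    regions = make_hregions(n_objects=1000, n_regions=10, hot_prob=0.9)
    for phase in range(3):
        roots = [pick_root(regions, rng) for _ in range(1000)]
        hot = max(regions, key=lambda h: h.prob)
        share = sum(r in hot.oids for r in roots) / len(roots)
        print(f"phase {phase}: hot OIDs {hot.oids.start}-{hot.oids.stop - 1}, share {share:.2f}")
        shift_hot_region(regions)  # e.g., the Olympics are over
```

Rotating the whole probability vector, rather than recomputing it, keeps the probabilities summing to 1 after every shift. A gradual style of change, as when books "gradually become less popular" in the example, could instead move probability mass between regions in small increments.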
The last two steps are similar across system types, but are nonetheless different because access paths and methods are substantially different in, for instance, a relational system (with tables, tuples, and joins) and an object-oriented system (with objects and references). Next, we further detail the concept of H-region and the generic regional protocol specification.

H-regions

H-regions are created by partitioning the objects of the database into non-overlapping sets. All objects in the same H-region
