Abstract

Horizontal data partitioning is the process of splitting access objects into set of disjoint rows. It was first introduced in the end of 70’s and beginning of the 80’s (Ceri et al., 1982) for logically designing databases in order to improve the query performance by eliminating unnecessary accesses to non-relevant data. It knew a large success (in the beginning of the 80’s) in designing homogeneous distributed databases (Ceri et al., 1982; Ceri et al., 1984; Özsu et al., 1999) and parallel databases (DeWitt et al., 1992; Valduriez, 1993). In distributed environment, horizontal partitioning decomposes global tables into horizontal fragments, where each partition may be spread over multiple nodes. End users at the node can perform local queries/transactions on the partition transparently (the fragmentation of data across multiple sites/processors is not visible to the users.). This increases performance for sites that have regular transactions involving certain views of data, whilst maintaining availability and security. In parallel database context (Rao et al., 2002), horizontal partitioning has been used in order to speed up query performance in a sharednothing parallel database system (DeWitt et al., 1992). This will be done by both intra-query and intra-query parallelisms (Valduriez, 1993). It also facilitates the exploitation of the inputs/outputs bandwidth of the disks by reading and writing data in parallel. In this paper, we use fragmentation and partitioning words interchangeably.

Introduction

Horizontal data partitioning is the process of splitting access objects into set of disjoint rows. It was first introduced in the end of 70’s and beginning of the 80’s (Ceri et al., 1982) for logically designing databases in order to improve the query performance by eliminating unnecessary accesses to non-relevant data. It knew a large success (in the beginning of the 80’s) in designing homogeneous distributed databases (Ceri et al., 1982; Ceri et al., 1984; Özsu et al., 1999) and parallel databases (DeWitt et al., 1992; Valduriez, 1993). In distributed environment, horizontal partitioning decomposes global tables into horizontal fragments, where each partition may be spread over multiple nodes. End users at the node can perform local queries/transactions on the partition transparently (the fragmentation of data across multiple sites/processors is not visible to the users.). This increases performance for sites that have regular transactions involving certain views of data, whilst maintaining availability and security. In parallel database context (Rao et al., 2002), horizontal partitioning has been used in order to speed up query performance in a shared-nothing parallel database system (DeWitt et al., 1992). This will be done by both intra-query and intra-query parallelisms (Valduriez, 1993). It also facilitates the exploitation of the inputs/outputs bandwidth of the disks by reading and writing data in parallel. In this paper, we use fragmentation and partitioning words interchangeably.

There are two versions of horizontal partitioning (Özsu et al., 1999): primary and derived. Primary horizontal partitioning of a relation is performed using selection predicates that are defined on that relation. Note that a simple predicate (selection predicate) has the following form: A θ value, where A is an attribute of the table to be fragmented, θ is one of six comparison operators {=, <, >, ≤, ≥} and value is the predicate constant belonging to the domain of the attribute A. Derived horizontal partitioning, on the other hand, is the fragmentation of a relation that results from predicates being defined on another relation. In other word, the derived horizontal partitioning of a table is based on the fragmentation schema of another table (the fragmentation schema is the result of the partitioning process of a given table). The derived partitioning of a table R according a fragmentation schema of S is feasible if and only if there is a join link between R and S. The relation at the link is called the owner of the link (the case of the table S) and the relation at the head is called member (Ceri et al., 1982) (the case of the relation R).

Key Terms in this Chapter

Distributed Database: Data in distributed database system is stored across several sites, and each is site is typically managed by a DBMS that can run independently of the other sites (autonomy of sites).

Dimension: A business perspective useful for analyzing data. A dimension usually contains one or more hierarchies that can be used to drill up or down to different levels of detail

Data Warehouse: An integrated decision support database whose content is derived from the various operational databases. (1) A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes.

Parallel Database: A database management system that is implemented on a tightly coupled multiprocessor.

Fact Table: The central table in a star schema, containing the basic facts or measures of interest. Dimension fields are also included (as foreign keys) to link to each dimension table.

Dimension Table: A table containing the data for one dimension within a star schema. The primary key is used to link to the fact table, and each level in the dimension has a corresponding field in the dimension table.