Abstract

Running large data warehouses (DW) efficiently on low-cost platforms places special requirements on the design of the system architecture. The idea is to host the DW on a set of low-cost nodes in a non-dedicated local-area network (LAN). Nodes can run any relational database engine, and the system relies on a partitioning strategy and a query-processing middle layer. These characteristics contrast with typical parallel database systems, which rely on fast dedicated interconnects and hardware, as well as a specialized parallel query optimizer for a specific database engine. This chapter describes the architecture of the Node-Partitioned Data Warehouse (NPDW), which is designed to run in this low-cost environment, focusing on the design for partitioning, efficient parallel joins, and query transformations. Given the low reliability of the target environment, we also show how replicas are incorporated into the design of a robust NPDW strategy with availability guarantees, and how the replicas are used for always-on, always-efficient behavior in the presence of periodic load and maintenance tasks.
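
The abstract does not give the chapter's actual partitioning or transformation rules, so the following is only a minimal sketch of the general idea, under stated assumptions: a fact table is hash-partitioned across nodes, a small dimension table is copied to every node so each join stays local, and a middle layer rewrites an average into per-node partial sums and counts that it merges afterwards. All names here (`hash_partition`, `local_aggregate`, `merge_partials`, `run_query`, and the `sale_id`, `store_id`, `amount` columns) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def hash_partition(rows, key, n_nodes):
    """Assign each row to a node by key modulo n_nodes (integer keys assumed)."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[row[key] % n_nodes].append(row)
    return nodes


def local_aggregate(fact_rows, dim):
    """Per-node step: join the local fact rows with the node's copy of the
    dimension table and return partial [sum, count] per group."""
    partial = defaultdict(lambda: [0, 0])
    for row in fact_rows:
        region = dim[row["store_id"]]  # local join, no network traffic
        partial[region][0] += row["amount"]
        partial[region][1] += 1
    return dict(partial)


def merge_partials(partials):
    """Middle-layer step: add up partial sums and counts, then derive the
    average. Averaging the per-node averages would be wrong in general."""
    total = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (s, c) in partial.items():
            total[group][0] += s
            total[group][1] += c
    return {g: {"sum": s, "count": c, "avg": s / c} for g, (s, c) in total.items()}


def run_query(fact_rows, dim, n_nodes):
    """Simulate the rewritten query: partition, aggregate per node, merge."""
    nodes = hash_partition(fact_rows, "sale_id", n_nodes)
    return merge_partials([local_aggregate(node, dim) for node in nodes])


# Hypothetical sample data: six sales across two stores.
DIM = {10: "north", 20: "south"}
FACTS = [
    {"sale_id": 1, "store_id": 10, "amount": 100},
    {"sale_id": 2, "store_id": 20, "amount": 50},
    {"sale_id": 3, "store_id": 10, "amount": 200},
    {"sale_id": 4, "store_id": 20, "amount": 150},
    {"sale_id": 5, "store_id": 10, "amount": 300},
    {"sale_id": 6, "store_id": 20, "amount": 100},
]
```

In this sketch, `run_query(FACTS, DIM, 3)` gives the same per-region totals as running the aggregation over all rows on one node, which is the property a query transformation of this kind has to preserve. A real deployment would send the rewritten SQL to each node's database engine instead of looping in one process.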