Years of Citing Articles

Bookmark

OpenURL

Abstract

Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total sales. The values of many of these cells are dependent on the values of other cells in the data cube..A common and powerful query optimization technique is to materialize some or all of these cells rather than compute them from raw data each time. Commercial systems differ mainly in their approach to materializing the data cube. In this paper, we investigate the issue of which cells (views) to materialize when it is too expensive to materialize all views. A lattice framework is used to express dependencies among views. We present greedy algorithms that work off this lattice and determine a good set of views to materialize. The greedy algorithm performs within a small constant factor of optimal under a variety of models. We then consider the most common case of the hypercube lattice and examine the choice of materialized views for hypercubes in detail, giving some good tradeoffs between the space used and the average time to answer a query. 1

Citations

...a given dimension may have many attributes as we shall see in Section 2. Users are also interested in consolidated sales: for example, what is the total sales of a given part p to a given customer c? =-=[GBLP95] suggest adding an a-=-dditional value &quot;ALL&quot; to the domain of each dimension to achieve this. In the question above we want the total sales of a given part p to a given customer c for &quot;ALL&quot; suppliers. Th...

...nd if we wish to find the total sales for each part from the ancestor view (part, supplier), we need to do an aggregation over this view. We can use either hashing or sorting (with early aggregation) =-=[G93]-=- to do this aggregation. The cost of doing the aggregation is a function of the amount of memory available and the ratio of the number of rows in the input to that in the output. In the best case, a s...

...he number of distinct values of part and supplier in the raw data. There are many well-known sampling techniques that we can use to determine the number of distinct values of attributes in a relation =-=[HNSS95]-=-. 4 Optimization of Data-Cube Lattices Our most important objective is to develop techniques for optimizing the space-time tradeoff when implementing a lattice of views. The problem can be approached ...

.... Query optimizers and query evaluation techniques can be enhanced to handle aggregations better [CS94], [GHQ95], [YL95], to use different indexing strategies like bit-mapped indexes and join indexes =-=[OG95]-=-, and so on. Work was supported by NSF grant IRI--92--23405, by ARO grant DAAH04--95--1--0192, and by Air Force Contract F33615--93--1--1339 Authors' address: Department of Computer Science, Stanford ...

...few seconds or a few minutes at the most. There are many ways to achieve such performance goals. Query optimizers and query evaluation techniques can be enhanced to handle aggregations better [CS94], =-=[GHQ95]-=-, [YL95], to use different indexing strategies like bit-mapped indexes and join indexes [OG95], and so on. Work was supported by NSF grant IRI--92--23405, by ARO grant DAAH04--95--1--0192, and by Air ...

...es of a few seconds or a few minutes at the most. There are many ways to achieve such performance goals. Query optimizers and query evaluation techniques can be enhanced to handle aggregations better =-=[CS94]-=-, [GHQ95], [YL95], to use different indexing strategies like bit-mapped indexes and join indexes [OG95], and so on. Work was supported by NSF grant IRI--92--23405, by ARO grant DAAH04--95--1--0192, an...

...ds or a few minutes at the most. There are many ways to achieve such performance goals. Query optimizers and query evaluation techniques can be enhanced to handle aggregations better [CS94], [GHQ95], =-=[YL95]-=-, to use different indexing strategies like bit-mapped indexes and join indexes [OG95], and so on. Work was supported by NSF grant IRI--92--23405, by ARO grant DAAH04--95--1--0192, and by Air Force Co...

...esoperator. For example: (part) 6 (customer) and (customer) 6 (part). Thesoperator imposes a partial ordering on the queries. We shall talk about the views of a data-cube problem as forming a lattice =-=[TM75]-=-. In order to be a lattice, any two elements (views or queries) must have a least upper bound and a greatest lower bound according to thesordering. However, in practice, we only need the assumptions t...

... we can go with the materialize-everything strategy. The Essbase system [ESS], for example, materializes the whole data cube, while BusinessObjects [X94] materializes nothing, and the MetaCube system =-=[STG]-=- materializes part of the cube. There is also the issue of where the materialized data cube is stored: in a relational system or a proprietary MDDB (multi-dimensional database) system. In this paper, ...

...u. 1 A commonly used technique is to materialize (precompute) frequently-asked queries. The data warehouse at the Mervyn's department-store chain, for instance, has a total of 2400 precomputed tables =-=[Rad95]-=- to improve query performance. Picking the right set of queries to materialize is a nontrivial task, since by materializing a query we may be able to answer other queries quickly. For example, we may ...

...w is the number of distinct values of the attributes it groups by, There are many well-known sampling techniques that we can use to determine the number of distinct values of attributes in a relation =-=[HNSS95]-=-. 4 Optimizing Data-Cube Lattices Our most important objective is to develop techniques for optimizing the space-time tradeoff when implementing a lattice of views. The problem can be approached from ...

... value of a cell given its address. To allocate only space for those cells present in the raw data and not every posstble cell of the data cube, a cell-address hashing scheme is used. Arbor’s Essbase =-=[Arb]-=- and many other MDDBs are implemented this way. Note, this approach still materializes all the cells of the data cube present in raw data, which can be very large. The other approach is to use relatio...