Optimizing MPF Queries: Decision Support and Probabilistic Inference

Hector Corrada Bravo, Raghu Ramakrishnan
2006

We identify a broad class of aggregate queries, called MPF queries, inspired by the literature on marginalizing product functions. MPF queries operate on “functional relations,” where a measure attribute is functionally determined by the other relation attributes. An MPF query is an aggregate query over a stylized join of several functional relations. In the motivating literature on probabilistic inference, this join corresponds to taking the product of several probability distributions, and the grouping step corresponds to marginalization. Thus, MPF queries represent probabilistic inference in a relational setting. MPF queries play a central role in probabilistic inference, and our work complements recent work that provides a framework for probabilistic inference in a database setting; nevertheless, we present MPF queries in a general form that handles arbitrary functions, not only probability distributions. We demonstrate the value of MPF queries for decision support applications through a number of illustrative examples. We exploit the relationship to probabilistic inference in query evaluation by combining database optimization techniques for aggregate queries with traditional algorithms from the probabilistic inference literature, such as Variable Elimination and Belief Propagation. We consider how to optimize individual queries, combining features from Variable Elimination and Chaudhuri and Shim’s algorithm for optimizing Group By queries. We also present an algorithm to find a cache of materialized views in order to efficiently evaluate a workload of MPF queries, combining Belief Propagation, Junction Trees, and database-style Group By optimizations. These results are especially interesting and timely because of the growing interest in managing data with uncertainty using probabilistic frameworks.
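
As an illustration of the query shape described above, the following is a minimal sketch, not the paper's implementation. It assumes a functional relation can be modeled as a pair of attribute names and a dictionary from attribute-value tuples to a numeric measure. The helper names `product_join` and `marginalize`, the relations `r1` and `r2`, and their data are all hypothetical. The sketch evaluates the same MPF query twice: once by aggregating after the full join, and once by aggregating one input before the join, in the spirit of Variable Elimination and Group By pushdown.

```python
# Assumed representation (hypothetical): a functional relation is
# (attribute_names, {attribute_value_tuple: measure}).

def product_join(r, s):
    """Natural join of two functional relations; measures are multiplied."""
    (r_attrs, r_table), (s_attrs, s_table) = r, s
    extra = [a for a in s_attrs if a not in r_attrs]
    out_attrs = tuple(r_attrs) + tuple(extra)
    out = {}
    for r_key, r_val in r_table.items():
        r_row = dict(zip(r_attrs, r_key))
        for s_key, s_val in s_table.items():
            s_row = dict(zip(s_attrs, s_key))
            # Rows join only if they agree on every shared attribute.
            if all(r_row[a] == s_row[a] for a in s_attrs if a in r_row):
                merged = {**r_row, **s_row}
                out[tuple(merged[a] for a in out_attrs)] = r_val * s_val
    return out_attrs, out

def marginalize(r, keep):
    """Group by the attributes in `keep` and sum the measure."""
    attrs, table = r
    kept = tuple(a for a in attrs if a in keep)
    out = {}
    for key, val in table.items():
        row = dict(zip(attrs, key))
        group = tuple(row[a] for a in kept)
        out[group] = out.get(group, 0) + val
    return kept, out

# Two illustrative functional relations sharing attribute "b".
r1 = (("a", "b"), {(0, 0): 2, (0, 1): 3, (1, 0): 5, (1, 1): 7})
r2 = (("b", "c"), {(0, 0): 1, (0, 1): 4, (1, 0): 6, (1, 1): 2})

# Naive plan: join everything, then group by "a".
naive = marginalize(product_join(r1, r2), ("a",))

# Pushed-down plan: sum "c" out of r2 first, then join and group by "a".
pushed = marginalize(product_join(r1, marginalize(r2, ("b",))), ("a",))

print(naive)   # (('a',), {(0,): 34, (1,): 81})
print(pushed)  # same result, with a smaller intermediate join
```

Both plans agree because multiplication distributes over addition, so attribute `c` can be summed out of `r2` before the join. In this sketch the pushed-down plan builds a four-row intermediate join instead of an eight-row one.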