People

Projects, Publications, and Software

Relational Data Market in the Cloud

Cloud computing is transforming many aspects of data management. Most recently, the cloud is seeing the emergence of digital markets for data and associated services. We observe that our community has a lot to offer in building successful cloud-based data markets. In this project, we investigate some of the key challenges that such markets face, and we build tools to support them.

Current mechanisms for pricing data are very simple: buyers can choose only from a set of explicit views, each with a specific price.
In the following work, we propose a framework for pricing data on the Internet that, given the price of a few views, allows the price of any query
to be derived automatically. We call this capability query-based pricing.
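
To make the idea concrete, here is a minimal sketch of deriving a query price from a few priced views. It rests on strong simplifying assumptions that are not taken from the project itself: each view is modeled as the set of data partitions it reveals, a query is modeled as the set of partitions it needs, and the derived price is the cheapest combination of views that covers the query. The names `derive_price`, `view_prices`, and `view_covers`, and the sample prices, are hypothetical.

```python
from itertools import combinations

def derive_price(query_needs, view_prices, view_covers):
    """Cheapest total price of a set of views whose coverage includes
    everything the query needs; None if no combination covers it.

    Illustrative toy model only (brute force over all view subsets),
    not the pricing framework proposed by the project.
    """
    names = list(view_prices)
    best = None
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            covered = set()
            for view in combo:
                covered |= view_covers[view]
            if query_needs <= covered:
                cost = sum(view_prices[view] for view in combo)
                if best is None or cost < best:
                    best = cost
    return best

# Hypothetical seller catalog: two single-state views and one bundle.
view_prices = {"WA": 3, "OR": 4, "WEST": 6}
view_covers = {
    "WA": {"WA"},
    "OR": {"OR"},
    "WEST": {"WA", "OR", "CA"},
}

# A query over WA and OR is priced at the cheaper of WA+OR (7) and WEST (6).
print(derive_price({"WA", "OR"}, view_prices, view_covers))
```

In this toy model, a buyer never pays more for a query than for the cheapest explicit views that answer it, which is the intuition behind deriving prices automatically rather than listing one price per query.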

Pricing Private Data

Personal data has huge value, both to its owner and to institutions that would like to analyze it.
As awareness of the value of personal data increases, there is a drive in industry to compensate the end user for her private information.
This paper proposes a theory of how to price private data.

Data Use Management

When valuable data is exchanged or bought, it is frequently encumbered by restrictions on how it may be used. For example, clinical data must not be used in such a way as to expose the patients’ identities. To date, these restrictions are enforced only contractually, and compliance is checked only manually, if at all. To meet the needs of this growing set of applications, we explore the design of a Data Use Manager: a database system component that enables the declarative specification of sophisticated data use policies and supports both their online enforcement and offline audit. We also research efficient algorithms for its implementation.

Collaborative Data Management in the Cloud

Data-management-as-a-service systems are increasingly used in collaborative settings, where multiple users access common datasets. Cloud providers can choose to implement various optimizations, such as indexes or materialized views, to accelerate queries over these datasets. Each optimization carries a cost and may benefit multiple users. This creates a major challenge: how to select which optimizations to perform and how to share their cost among users. The problem is especially challenging when users are selfish and will report their true values for different optimizations only if doing so maximizes their utility. We study mechanism-design-based techniques for addressing this challenge.
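
As an illustration of the kind of mechanism this setting calls for, the sketch below implements an equal-share cost-sharing rule for a single optimization, a standard construction from the cost-sharing literature. The text above does not say which mechanism the project uses, so this is an assumed example rather than the project's method. The function name `equal_share_mechanism` and the sample users and bids are hypothetical.

```python
def equal_share_mechanism(cost, bids):
    """Decide who is served by one optimization and what each pays.

    cost: total cost of building the optimization.
    bids: dict mapping user -> reported value for the optimization.

    Repeatedly splits the cost equally among the remaining users and
    drops anyone whose bid is below the current share. Returns
    (set of served users, price per served user); if nobody remains,
    the optimization is not built and the result is (set(), 0.0).
    """
    active = dict(bids)
    while active:
        share = cost / len(active)
        dropouts = [user for user, bid in active.items() if bid < share]
        if not dropouts:
            return set(active), share
        for user in dropouts:
            del active[user]
    return set(), 0.0

# Hypothetical example: an index costing 90 and three interested users.
served, price = equal_share_mechanism(90, {"ann": 50, "bob": 45, "cy": 20})
print(served, price)
```

With these bids, the first round's share of 30 exceeds cy's bid, so cy drops out; the remaining two users can each afford the new share of 45, so the index is built and its cost is fully recovered. Because each user's share can only rise as others drop out, overbidding risks paying more than the optimization is worth to the user, and underbidding risks losing access to it, which is the incentive property that makes rules of this kind attractive when users are selfish.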

Acknowledgments

The Data Eco$y$tem project is partially supported by the National Science Foundation and Microsoft through NSF CiC grant CCF 1047815 and NSF grant IIS-0915054 and additional gifts from Microsoft Research. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.