Data Markets in the Cloud: An Opportunity for the Database Community. Magdalena Balazinska, Bill Howe, and Dan Suciu, University of Washington.

1
Data Markets in the Cloud: An Opportunity for the Database Community
Magdalena Balazinska, Bill Howe, and Dan Suciu
University of Washington
Project supported in part by NSF and Microsoft

6
Technical Challenges (1)
Challenges for economists
Study the behavior of agents in a data market
Study how data should be priced
  – E.g., Pointless to price data based on production costs
  – E.g., Useful to create versions for different market segments
Inform public policy regulating the data market
Magdalena Balazinska - University of Washington

7
Technical Challenges (2)
Challenges for database community
Develop and study pricing models for data
  – How should sellers specify pricing parameters?
  – How should the system compute prices based on seller input?
  – What are the properties of various pricing models?
Develop supporting tools and services
  – Tools for expressing and computing prices
  – Tools for processing priced data

10
Example Scenario
Seller has a database of business contact information
Economist: Supply and demand dictate that
  – businesses in entire country: $600
  – businesses in one province or state: $300
  – one type of business: $50
Buyer:
  – Q1: Businesses with more than 200 employees (selection)
  – Q2: Businesses in same city as Home Depot (self-join)
  – Q3: Businesses in cities with high yearly precipitation (join)
How to satisfy buyer?
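To make the three query shapes concrete, here is a minimal sketch of Q1, Q2, and Q3 as SQL run through Python's built-in sqlite3 module. The table names, column names, sample rows, city names, and the precipitation threshold are all invented for illustration; the slides do not give a schema.

```python
import sqlite3

# Hypothetical schema and toy rows; none of this comes from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE business (name TEXT, city TEXT, employees INTEGER);
CREATE TABLE climate (city TEXT, yearly_precip_mm INTEGER);
INSERT INTO business VALUES
  ('Home Depot', 'Rainville', 300),
  ('Cafe Uno',   'Rainville', 12),
  ('Steel Co',   'Drytown',   450);
INSERT INTO climate VALUES ('Rainville', 1200), ('Drytown', 150);
""")

# Q1: selection on a single table.
Q1 = "SELECT name FROM business WHERE employees > 200 ORDER BY name"

# Q2: self-join of business with itself on city.
Q2 = """SELECT DISTINCT b.name
        FROM business b JOIN business h ON b.city = h.city
        WHERE h.name = 'Home Depot'
        ORDER BY b.name"""

# Q3: join with a second dataset; "high" is assumed to mean > 1000 mm.
Q3 = """SELECT b.name
        FROM business b JOIN climate c ON b.city = c.city
        WHERE c.yearly_precip_mm > 1000
        ORDER BY b.name"""

def run(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

for label, sql in (("Q1", Q1), ("Q2", Q2), ("Q3", Q3)):
    print(label, run(sql))
```

Q3 is the case that needs a second dataset, possibly from a different owner, which is why it is the hardest to serve with per-dataset pricing.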

11
Current Pricing: Fixed Prices
Fixed price for entire dataset (CustomLists, Infochimps)
  – Must create and price views specific to queries Q1, Q2, Q3
  – OR user must buy entire dataset if view not available
  – AND user must perform joins by herself
  – Certainly the case if datasets have different owners

12
Current Pricing: Subscriptions
Subscriptions (Azure DataMarket, Infochimps API)
  – Fixed number of transactions per month
  – Must create and price appropriate parameterized queries
  – Currently these queries are dataset specific (i.e., no joins!)
  – Can satisfy Q1: Businesses with more than 200 employees
  – Harder Q2: Businesses in same city as Home Depot
  – Cannot Q3: Businesses in cities with high yearly precipitation

15
Potential Approach: View-Based Pricing
System computes other query prices
  – Q2: Businesses in same city as Home Depot, etc.
  – Price computation is automated
  – Solved as a constrained optimization problem
System guarantees price properties
  – For example, ensures that no arbitrage is possible
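The no-arbitrage property can be illustrated with a small check. The sketch below is a deliberate simplification, not the method from the talk: it assumes each priced view is just a set of tuple IDs, and it treats a view as obtainable from other views whenever their union contains it. The view names, tuple IDs, and the brute-force search are all hypothetical; the prices reuse the $600 / $300 / $50 figures from the example scenario.

```python
from itertools import combinations

def cheapest_cover(target, views, prices, exclude):
    """Cheapest set of other views whose union contains `target`.

    Returns (cost, view_names) or None if no combination covers it.
    Brute force over all subsets, so only suitable for toy inputs.
    """
    names = [n for n in views if n != exclude]
    best = None
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            covered = set().union(*(views[n] for n in combo))
            if target <= covered:
                cost = sum(prices[n] for n in combo)
                if best is None or cost < best[0]:
                    best = (cost, combo)
    return best

def find_arbitrage(views, prices):
    """Map each view that can be bought cheaper via other views to that cover."""
    found = {}
    for name, tuples in views.items():
        best = cheapest_cover(tuples, views, prices, exclude=name)
        if best is not None and best[0] < prices[name]:
            found[name] = best
    return found

# Toy instance: six businesses, two states, three business types.
views = {
    "country":     {1, 2, 3, 4, 5, 6},
    "state_a":     {1, 2, 3},
    "state_b":     {4, 5, 6},
    "type_retail": {1, 4},
    "type_food":   {2, 5},
    "type_tech":   {3, 6},
}
prices = {
    "country": 600, "state_a": 300, "state_b": 300,
    "type_retail": 50, "type_food": 50, "type_tech": 50,
}

for view, (cost, combo) in find_arbitrage(views, prices).items():
    print(f"{view}: listed {prices[view]}, obtainable for {cost} via {combo}")
```

In this toy instance the three business types together cover the whole country for $150, so the $600 and $300 views are undercut. A pricing system that guarantees no arbitrage would have to rule out price lists like this one.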

16
Data Pricing Challenges
Understand properties of pricing schemes
  – When can we guarantee that no arbitrage is possible?
How to handle data updates?
  – Will updates require changes to prices?
How to handle price updates?
  – Will one price change affect all others?
How to price the value added by data transformations?
  – Should a self-join query be more expensive than a selection?
  – Should queries with empty results be free?
How to price data properties (e.g., cleanliness)?

25
Strawman 3: View-Based Pricing
This is a constrained optimization problem
  – Each query price is a constraint
  – Can add other constraints: e.g., total price of DB
Two methods to derive prices of new queries
  – Reverse-engineer the prices of base tuples subject to the constraints
      Assume a function that converts base tuple prices into query prices
      Compute base tuple prices in a way that maximizes entropy, user utility, or another function, subject to the constraints
  – Compute new query prices directly
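The first method can be sketched for one easy special case. The code below assumes, purely for illustration, that (a) the seller-priced views partition the table, and (b) the function converting base tuple prices into a query price is the sum of the prices of the tuples in the result. Under those two assumptions, the maximum-entropy choice spreads each view's price evenly over its tuples, so no solver is needed. Overlapping views or other objectives would need a real constrained optimizer. All names and numbers are hypothetical.

```python
def base_tuple_prices(view_prices, view_members):
    """Split each view's price evenly across its tuples.

    Assumes the views are pairwise disjoint; with sum-of-tuples pricing,
    the even split is the maximum-entropy solution to the constraints
    "tuple prices in a view add up to the view's price".
    """
    prices = {}
    for view, members in view_members.items():
        share = view_prices[view] / len(members)
        for tuple_id in members:
            if tuple_id in prices:
                raise ValueError(f"views overlap on tuple {tuple_id!r}")
            prices[tuple_id] = share
    return prices

def query_price(result_tuples, tuple_prices):
    """Assumed conversion function: a query costs the sum of its tuples."""
    return sum(tuple_prices[t] for t in result_tuples)

# Two business-type views, each priced at $50 by the seller.
view_members = {
    "type_retail": ["b1", "b2"],
    "type_food":   ["b3", "b4", "b5", "b6"],
}
view_prices = {"type_retail": 50, "type_food": 50}

tuple_prices = base_tuple_prices(view_prices, view_members)

# A new query whose result is one retail and two food businesses.
print(query_price({"b1", "b3", "b4"}, tuple_prices))
```

Each seller-specified view price acts as one constraint, and the derived tuple prices reproduce those view prices exactly when summed back up.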

30
Data Pricing Issues (continued)
Lump sum or subscription pricing is also inflexible
  – For lump sum, can only buy pre-defined views
  – For subscription, can only ask pre-defined queries
Would like arbitrary queries over multiple datasets