Pragmatic Game-Theoretic Approaches to Privacy in the Era of Big Data

Our lives increasingly revolve around data, often without our awareness. There is power in that, as bigger and better data yield bigger and better insights. Not everything about data is a clear win, however. One particular concern is that individuals are losing, or have already lost, a great deal of privacy as a result of the drive to collect and analyze high-fidelity datasets. Such concerns have led to numerous regulations around the world, such as HIPAA in the US, that attempt to limit the privacy risks associated with data sharing and analysis. The fundamental tension between these two opposing drivers, data utility and privacy, remains unresolved, although considerable progress has been made in understanding and analyzing it. In recent research, we have been exploring game-theoretic approaches that enable pragmatic data-sharing policies. First, I describe our work on privacy-preserving sharing of structured data, where we use the policy space typically considered in k-anonymity but apply game theory to quantify the privacy risk posed by an economically motivated attacker. Second, I discuss how this general approach extends to genomic data sharing, where common practice involves sharing summary statistics or allowing a sequence of basic "existence" queries. Finally, I present our approach to automatically sanitizing sensitive text data, such as clinical notes.
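To make the idea of an economically motivated attacker more concrete, here is a minimal toy sketch of a leader-follower (Stackelberg-style) game over a k-anonymity-style generalization policy space. It is an illustration under stated assumptions, not the model used in this work. The assumptions are as follows. The publisher chooses one of three hypothetical generalization levels for (age, ZIP) quasi-identifiers. An attacker who links a target to an equivalence class of size s succeeds with probability 1/s. The attacker attacks only when the expected gain exceeds the per-attack cost. All names, records, and parameter values (`generalize`, `Game`, `gain`, `cost`, `loss`, `utility`) are made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical toy data: (age, ZIP code) quasi-identifiers for six people.
RECORDS = [
    (34, "37212"), (36, "37215"), (35, "37212"),
    (52, "37203"), (58, "37204"), (51, "37203"),
]


def generalize(record, level):
    """Map a record to its generalized quasi-identifier (hypothetical policy space)."""
    age, zipcode = record
    if level == 0:
        return (age, zipcode)                  # release as-is
    if level == 1:
        return (age // 10 * 10, zipcode[:4])   # decade of age, 4-digit ZIP prefix
    return ("*", zipcode[:3])                  # suppress age, 3-digit ZIP prefix


@dataclass(frozen=True)
class Game:
    gain: float     # attacker's benefit per successful re-identification (assumed)
    cost: float     # attacker's cost per attack attempt (assumed)
    loss: float     # publisher's loss per successful re-identification (assumed)
    utility: tuple  # publisher's data utility for each generalization level (assumed)


def expected_loss(records, level, game):
    """Publisher's expected loss when the attacker best-responds to `level`."""
    class_sizes = Counter(generalize(r, level) for r in records).values()
    total = 0.0
    for size in class_sizes:
        p_success = 1.0 / size  # attacker guesses uniformly within the class
        if game.gain * p_success - game.cost > 0:  # attack only if profitable
            total += size * p_success * game.loss  # expected re-identifications x loss
    return total


def best_policy(records, game):
    """Publisher (leader) picks the level maximizing utility minus expected loss."""
    payoffs = [
        game.utility[level] - expected_loss(records, level, game)
        for level in range(len(game.utility))
    ]
    best = max(range(len(payoffs)), key=payoffs.__getitem__)
    return best, payoffs


if __name__ == "__main__":
    game = Game(gain=100.0, cost=40.0, loss=50.0, utility=(500.0, 400.0, 100.0))
    level, payoffs = best_policy(RECORDS, game)
    print(level, payoffs)  # 1 [200.0, 400.0, 100.0]
```

With these made-up numbers, the publisher picks level 1. Classes of size three already make every attack unprofitable (100/3 < 40), so the coarser level 2 would give up utility without reducing risk. The point of the game-theoretic framing is that the chosen generalization follows from the attacker's costs and benefits rather than from a fixed k threshold.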