Could the people voting to close perhaps try and explain what their issue is?
– Vidit Nanda, Jun 29 '13 at 19:25

A number of good ideas went into the concept of electronic banking/money. One of the main concerns was anonymity. There is a conflict, though, with the existing financial and tax system. Perhaps others know more about the algorithms involved, related to public encryption keys. A crucial idea is to replace certainty with near-certainty, like $1-10^{-12}$ instead of a perfect $1$.
– Włodzimierz Holsztyński, Jun 29 '13 at 21:53


You should check out Andrew Blumberg's website. He's done work in both algebraic topology and in privacy (separately), but his thinking on privacy has a distinctly mathematical flavor.
– David White, Jul 16 '13 at 22:33

5 Answers

Here is an example of the type you appear to be seeking. The story -- like many good stories -- involves a million-dollar prize offered by a billion-dollar corporation, sought by armies of computer wizards. There's even a courtroom scene at the climax.

In 2006, Netflix released a dataset consisting of roughly a hundred million movie ratings (from 1 through 5 stars) given by about 500 thousand users. The basic idea was to award prize money to anyone who managed to improve upon the existing Netflix algorithm for predicting (based on your past ratings) how much you will like a new movie. A team from AT&T research won the million-dollar prize by producing an algorithm which improved the state of the art by 10% or so. Many references for those interested in the full story can be found on Wikipedia.

Of course, in order to preserve their users' privacy, Netflix "anonymized" their dataset. That is, they removed all mention of user names, IP addresses, and so on before making the data available to the public. Much to their surprise (and their users' annoyance), a team of computer scientists from the University of Texas used publicly available non-anonymous data from IMDB and deduced the identities of many Netflix users. I'm not qualified to judge the sophistication of their framework, but from a glance at their paper there certainly seems to be some non-trivial mathematics involved.
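To give a flavor of how such linkage attacks work, here is a toy sketch (the actual Narayanan–Shmatikov scoring function is more sophisticated; the log-popularity weighting and the 1.5x dominance threshold below are illustrative assumptions, not their paper's parameters). The key ideas are that agreement on rarely-rated movies identifies a user far more strongly than agreement on blockbusters, and that a match is only claimed when the best candidate clearly dominates the runner-up.

```python
# Toy linkage-style de-anonymization: match an "anonymized" rating
# record against public (e.g. IMDB) records. Illustrative only.
from math import log

def similarity(anon_record, public_record, popularity):
    """Score how well a public record matches an anonymized one.
    Matches on rarely-rated movies are weighted more heavily."""
    score = 0.0
    for movie, rating in anon_record.items():
        if movie in public_record:
            weight = 1.0 / log(1 + popularity[movie])  # rare => big weight
            if abs(rating - public_record[movie]) <= 1:  # fuzzy rating match
                score += weight
    return score

def best_match(anon_record, candidates, popularity):
    """Return the candidate name whose score clearly dominates,
    or None if no candidate stands out (an 'eccentricity' check)."""
    scored = sorted(((similarity(anon_record, rec, popularity), name)
                     for name, rec in candidates.items()), reverse=True)
    top, runner_up = scored[0], scored[1]
    if top[0] > 1.5 * runner_up[0]:   # assumed dominance threshold
        return top[1]
    return None
```

The surprise in the Netflix case was how few data points are needed: because the rating matrix is so sparse, a handful of obscure movies already singles out one user among hundreds of thousands.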

Here's an open question:

What statistical properties of your anonymized data might guarantee (with high probability) that it cannot be de-anonymized?

I suspect that such considerations are likely to guide a lot of research in statistics, at least in the foreseeable future.

The mathematics of privacy is a HUGE area of research. It started in the statistics community long ago, with issues of disclosure along the lines of what Vidit Nanda mentioned. In the 90s, things exploded in the CS community when a researcher at CMU first showed that, using only publicly disclosed information, she could retrieve private information about the governor of Massachusetts.

Perhaps the most mathematically mature take on privacy is the idea of differential privacy: as in crypto, it frames the problem in terms of what an adversary can and cannot learn. Specifically, the classic problem of differential privacy is: design a scheme for releasing data so that an adversary cannot tell, except with a small, controlled probability, whether the release came from the true database or from one that differs in a single record.
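As a concrete illustration, the standard Laplace mechanism achieves this guarantee for counting queries: a count changes by at most 1 when one record changes (sensitivity 1), so adding Laplace noise of scale $1/\varepsilon$ gives $\varepsilon$-differential privacy. A minimal sketch:

```python
# Laplace mechanism for an epsilon-differentially-private count.
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise: the difference of two
    i.i.d. exponential variables is Laplace-distributed."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def private_count(records, predicate, epsilon):
    """Answer 'how many records satisfy predicate?' with
    epsilon-differential privacy. Counting queries have
    sensitivity 1, so noise of scale 1/epsilon suffices."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)
```

Small epsilon means more noise and stronger privacy; the analyst trades accuracy for a provable bound on what any single individual's presence in the database can reveal.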

Another interesting topic is performing computations on encrypted data. For example, you can have your data encrypted and yet use cloud services on it. There are mathematicians and computer scientists working on this; however, the current constructions are not efficient enough for practical purposes yet (AFAIK).
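To see what "computing on encrypted data" means in the simplest case: even textbook RSA is multiplicatively homomorphic, since $(a^e)(b^e) = (ab)^e \pmod n$. The sketch below uses tiny, unpadded, utterly insecure parameters purely to illustrate the algebra (fully homomorphic schemes support both addition and multiplication, which is much harder):

```python
# Textbook RSA is multiplicatively homomorphic:
# enc(a) * enc(b) mod n decrypts to a * b mod n.
# Tiny insecure parameters -- illustration only.
p, q = 61, 53
n = p * q                  # modulus 3233
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (Python 3.8+ modular inverse)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

a, b = 7, 6
product_cipher = (enc(a) * enc(b)) % n   # multiply ciphertexts only
assert dec(product_cipher) == a * b      # yet this decrypts to 42
```

A cloud server holding only `enc(a)` and `enc(b)` can compute `product_cipher` without ever learning `a` or `b`; only the key holder can decrypt the result.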

I don't know what you can expect. You might be interested in the work of Cynthia Dwork on the concept of differential privacy. The idea concerns what information can be revealed by queries to a database in spite of attempts to anonymize or average the data or limit the scope of the queries. It seems some valid statistical analyses can be done even if the answerer throws in a random but intentional error to help protect privacy.

The article at jeremykun.com mentioned in Samuel Monnier's comment also mentions Dwork. The other answers should give you enough encouragement to check out the research.

Proving and verifying statements about private data without releasing that data is a nice candidate question/problem with some known solutions. Relevant tools include interactive proof/argument systems, zero-knowledge proofs, commitments, and oblivious transfer, as well as implementations like U-Prove, DAA, and Idemix.
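The simplest of these primitives to sketch is a commitment scheme: you publish a digest that binds you to a value now, while revealing nothing about it until you choose to open it. A minimal hash-based sketch (hiding comes from the random nonce, binding from the collision resistance of SHA-256):

```python
# Minimal hash-based commitment scheme. Illustrative sketch;
# real protocols (e.g. Pedersen commitments) have stronger,
# information-theoretic hiding properties.
import hashlib
import os

def commit(message: bytes):
    """Commit to a message: publish the digest, keep the nonce secret."""
    nonce = os.urandom(32)
    digest = hashlib.sha256(nonce + message).hexdigest()
    return digest, nonce

def verify(digest, nonce, message: bytes) -> bool:
    """Check that (nonce, message) opens the published digest."""
    return hashlib.sha256(nonce + message).hexdigest() == digest
```

Used as a building block, this is how a prover can "lock in" an answer in a zero-knowledge protocol before the verifier issues a challenge, without the commitment itself leaking the answer.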