To motivate my question, I will describe a related problem and then give a solution to it. My question will then be a variant of this problem.

N individuals sit around a table and want to compute the average of their salaries. They wish to do this in a manner such that no private information is leaked. That is, no one obtains any information (regarding the others' salaries) that he couldn't deduce from the public information.

More formally we assume: (1) all of the salaries are non-negative integers bounded by B, (2) everyone behaves honestly and doesn't attempt to halt the process, (3) no subset of individuals will collude, (4) there are secure private lines of communication between all participants, (5) all of this information is well known, and (6) there is no outside trusted party.

Question 1: Is it possible for the N individuals to collectively compute the average without leaking any information? We say information is leaked if any individual has any information at the end of the process regarding anyone else's salary that he couldn't have deduced from knowing his own salary and the average.

The answer is yes. It suffices to compute the sum of the salaries. Set S = 10*N*B. The first individual (Alice) chooses a uniformly random number between 0 and S-1 and adds it to her salary mod S. She then passes the result to her neighbor, Bob, who adds his salary, again mod S. This continues around the table until Zoey (the last participant) passes the number back to Alice. Alice subtracts her random number mod S and announces the sum to the group.
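Here is a minimal simulation of the pass-around protocol described above, as a sketch rather than a definitive implementation. The function name `ring_sum` and the returned list of intermediate messages are my own illustrative choices, and it assumes `bound >= 1` so that the modulus is positive.

```python
import secrets

def ring_sum(salaries, bound):
    """Simulate the masked pass-around sum; returns (total, messages passed on)."""
    n = len(salaries)
    modulus = 10 * n * bound            # S = 10*N*B, as in the question
    mask = secrets.randbelow(modulus)   # Alice's uniform random number in [0, S-1]
    running = (mask + salaries[0]) % modulus
    seen = [running]                    # seen[i] is the number participant i passes on
    for salary in salaries[1:]:
        running = (running + salary) % modulus
        seen.append(running)
    # Alice removes her mask; the true sum is at most N*B < S, so nothing wraps.
    total = (running - mask) % modulus
    return total, seen

total, seen = ring_sum([52000, 61000, 47000, 75000], bound=100000)
print(total)  # the sum of the four salaries
```

Each entry of `seen` is uniformly distributed mod S from the point of view of the participant who receives it, which is why a single non-colluding participant learns nothing from it.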

Here are two related questions:

Question 2: Is it possible for the group to compute the maximum salary (subject to the constraints above) without leaking any information?

Question 3: Can we remove from the algorithm given above the assumption that a bound on the size of the salaries is known in advance?

Additional Note: In Question 2 we want to compute only the maximum without providing any other information. Note that, say, the entire distribution of salaries could be computed and communicated to the group by computing moments of the sequence via the method above. This would give the maximum (but a lot of other information as well).

Can't any of the participants deduce information by reversing the process until they arrive at their own salary, thus recovering the random number? Also, can't the one who chose it in the first place do this right away?
– user1959, Nov 20 '09 at 4:30

I don't understand what you mean by "reversing the process". Can you elaborate?
– Mark Lewko, Nov 20 '09 at 4:37

Incidentally, "group" has a very strong mathematical meaning. I don't have a good replacement word --- "Computing the maximum salary" is my best suggestion for title.
– Theo Johnson-Freyd, Nov 20 '09 at 6:15

This wouldn't work for 2 people would it?
– user1447, Nov 20 '09 at 21:26

In the situation of two people one can figure out the other person's salary just from knowing the average. Thus there is no information leaked (beyond what can be deduced from the public information).
– Mark Lewko, Nov 20 '09 at 21:33

3 Answers

These questions (and many others) are studied in the literature under the heading of secret-sharing or common-knowledge protocols. A nice but short review appears in chapter 4 of David Gale's "Tracking the Automatic Ant".

The "sum protocol" you presented can be modified to determine how many people have salary x (without revealing their identity). Just have each participant contribute 0 or 1 according to whether or not he has salary x. By scanning all x's in the (presumably known) range of salaries you can learn the distribution, as well as the maximum (by scanning down from some upper bound). However, this protocol reveals not only the max salary, but also how many people earn that max.
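The scan-down idea can be sketched as follows. This is only an illustration: the helper names `private_count` and `max_by_scanning` are hypothetical, and the counting step uses a masked sum modulo n+1, which I chose because a count of n people never exceeds n.

```python
import secrets

def private_count(bits):
    """Masked pass-around sum of 0/1 contributions; modulus n+1 holds any count."""
    modulus = len(bits) + 1
    mask = secrets.randbelow(modulus)   # first participant's random mask
    running = mask
    for bit in bits:
        running = (running + bit) % modulus
    return (running - mask) % modulus

def max_by_scanning(salaries, bound):
    """Scan x = bound, bound-1, ..., 0 and stop at the first x somebody earns."""
    for x in range(bound, -1, -1):
        count = private_count([1 if s == x else 0 for s in salaries])
        if count > 0:
            return x, count             # the count of top earners leaks too
    raise ValueError("no salary lies within [0, bound]")

print(max_by_scanning([3, 7, 7, 2], bound=10))  # (7, 2)
```

The returned pair makes the leak visible: alongside the maximum, everyone also learns how many people earn it.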

Such protocols are called $t$-private if they do not reveal any additional information unless $t$ people 'collude' and discuss their knowledge with each other. The protocol you mentioned is, in fact, $n$-private - unless everyone cooperates, they are all in the dark (EDIT: this is false, of course, as pointed out in the comments. The correct $n$-private protocol is described below). The sum is, essentially, the only function that can be computed $n$-privately. The maximum (without the extra knowledge of how many people earn it), the product etc. can all be computed $t$-privately for $t < n/2$, but not for $t \geq n/2$. The existence is proved by Ben-Or, Goldwasser and Wigderson in STOC 1988; the non-existence by Chor and Kushilevitz (STOC 1989) for Boolean functions and by Beaver for general integer-valued functions. This is all extracted from Gale's book.

How to compute the sum $n$-privately: Each person breaks up their salary into a sum of $n$ numbers, chosen at random except for the constraint that their sum is the salary. Each person now communicates the $j$th part to the $j$th person (including "communicating" one of the parts to herself). They then all announce the sums of all the pieces that were communicated to them, and adding up the announced values gives the total. It's a fun exercise to show that no $k$ people can figure anything out other than whatever can be derived from their own salaries, for any $k < n$.
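A small simulation of this share-splitting protocol, in which each salary is split into $n$ random pieces that add up to the salary, might look like the sketch below. The answer does not say which distribution the random pieces are drawn from; here I assume all arithmetic is done modulo $M = nB + 1$ (with $B$ a known salary bound) so that the pieces can be uniform. The name `share_sum` is illustrative.

```python
import secrets

def share_sum(salaries, bound):
    """Additive sharing: returns (total, announced partial sums)."""
    n = len(salaries)
    modulus = n * bound + 1   # assumption: M exceeds the largest possible sum
    shares = []               # shares[i][j] is the piece person i sends to person j
    for salary in salaries:
        pieces = [secrets.randbelow(modulus) for _ in range(n - 1)]
        # The last piece is forced so that the pieces add up to the salary mod M.
        pieces.append((salary - sum(pieces)) % modulus)
        shares.append(pieces)
    # Person j announces the sum of all pieces received (including her own).
    announced = [sum(shares[i][j] for i in range(n)) % modulus for j in range(n)]
    total = sum(announced) % modulus
    return total, announced

total, announced = share_sum([52000, 61000, 47000], bound=100000)
print(total)  # the sum of the three salaries
```

Working modulo $M$ is a design choice on my part: over the integers there is no uniform distribution to draw the pieces from, whereas mod $M$ any $n-1$ of a person's pieces are uniformly random.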

This answers a lot of my questions. Thanks! The situation when a bound on the salaries is not known in advance still appears unanswered, however.
– Mark Lewko, Nov 20 '09 at 5:02

I don't see why the protocol above is n-private (or maybe I just don't understand the definition). If the two people sitting on the sides of Joe collude they can figure out Joe's salary.
– Mark Lewko, Nov 20 '09 at 5:14

Sorry, my mistake - that protocol is indeed just 1-private, but there's a nice simple variant which is n-private. Let me edit my response to fix that. However, I'm not sure how to handle the situation of no known bound on the salaries.
– Alon Amit, Nov 20 '09 at 5:33

For question 3, it seems to me that the algorithm described in the question can be modified so as not to require knowing a bound on salaries to start with: instead of working modulo 10*N*B, just do everything over Z.

Now Alice needs to choose a random number from Z. Of course, she cannot use the uniform distribution, but all that is necessary is that she not publicize any information about the distribution she uses. If that information is not public, I don't see how anyone could deduce anything from the one number which they see in the course of implementing the algorithm.