A Research Blog

From Discrepancy to Privacy, and back

Different areas of mathematics, and theoretical computer science, freely borrow tools and ideas from each other and these interactions, like barters, can make both parties richer. In fact it’s even better than barter: the objects being ideas, you don’t have to actually give them up in an exchange.

And so it is no surprise that differential privacy has used tools from several other fields, including complexity, cryptography, learning theory and high-dimensional geometry. Today I want to talk about a little giving back. A small interest payment, if you will. Below I will describe how differential privacy tools helped us resolve a question of Alon and Kalai.

Discrepancy

This particular instance deals with discrepancy theory, more specifically, with the Erdös Discrepancy problem (EDP)(polymath 5 being discussed here).

In 1932, Erdös conjectured:

Conjecture[Problem 9 here] For any constant , there is an such that the following holds. For any function , there exists an and a such that

For any , the sets is a arithmetic progression containing ; we call such a set a Homogenous Arithmetic Progression (HAP). The conjecture above says that for any red blue coloring of the [n], there is some HAP which has a lot more of red than blue (or vice versa).

In modern language, this is a question about discrepancy of HAPs. So let me define discrepancy first. Let be a universe and let denote a family of subsets of . A coloring of is an assignment . The discrepancy of a set under the coloring is simply ; the imbalance of the set. The discrepancy of a set system under a coloring is the maximum discrepancy of any of the sets in the family. The minimum of this quantity over all colorings is then defined to be the discrepancy of the set system. We want a coloring in which all sets in are as close to balanced as possible. In other words, if denotes the set-element incidence matrix of the set system, then .

Thus the conjecture says that the discrepancy of HAPs over grows with .

This post deals with the related concept of Hereditary Discrepancy. The discrepancy can often be small by accident: even though the set system is complex enough to contain a high discrepancy set system in it, it can have discrepancy zero. The hereditary discrepancy measures the maximum discrepancy of any set system “contained” in .

Formally, given a set system , and a subset , the restriction of to is another set system , where . If we think of the set system as a hypergraph on , then the restriction is the induced hypergraph on . The hereditary discrepancy is the maximum discrepancy of any of its restrictions. In matrix language, is simply where the maximum is taken over all submatrices of .

Some examples. Let denote . A totally unimodular matrix gives us a set system with hereditary discrepancy at most . An arbitrary collection of sets has discrepancy at most . Note that a random coloring will give discrepancy about . In a famous paper, Spencershowed the upper bound, and Bansal and recently Lovett and Meka gave constructive versions of this result.

Privacy

Given a vector , and – matrix , consider the problem of outputting a vector such that is small, and yet the distribution of isdifferentially private, i.e. for and that are close the distributions of the corresponding ‘s are close. If you are a regular reader of this blog, you must be no stranger to differential privacy. For the purposes of this post, a mechanism satisfies differential privacy if for any such that , and any (measurable) :

Thus if are close in , the distributions are close in .

Researchers have studied the question of designing good mechanisms for specific matrices . Here by good, we mean that the expected value of the error say , or is as small as possible. It is natural to also prove lower bounds on the error needed to answer specific queries of interest.

A particular set of queries of interest is the following. The coordinates of the vector are associated with the hypercube . Think of binary attributes people may have, and for , denotes the number of people with as their attribute vector. For each subcube, defined by fixing of the bits, we look at the counting query corresponding to that subcube: i.e. . This corresponds to dot product with a vector with being one on the subcube, and zero elsewhere. Thus we may want to ask how many people have a in the first and second attribute, and a in the fourth. Consider the matrix defined by all the possible subcube queries.

Subcubes defined by fixing bits are -juntas to some people, contingency table queries to others. These queries being important from a statistical point of view, Kasiviswanathan, Rudelson, Smith and Ullman showed lower bounds on the amount of error any differentrially private mechanism must add, for any constant . When is , their work suggests that one should get a lower bound that is .

The connection

So what has discrepancy got to do with privacy? Muthukrishnan and Nikolov showed that if has large hereditary discrepancy, then any differentially private mechanism must incur large expected squared error. In fact, one can go back and check that nearly all known lower bounds for differentially private mechanism are really hereditary discrepancy lower bounds in disguise. Thus there is a deep connection between and the minimum achievable error for .

For the EDP, it is natural to ask how large the hereditary discrepancy is. Alon and Kalai show that it is and at most . They also showed that for constant , it is possible to delete an fraction of the integers in , so that the remaining set system has hereditary discrepancy at most polylogarithmic in . Gil guessed that the truth is closer to the lower bound.

Alex Nikolov and I managed to show that this is not the case. Since hereditary discrepancy is, well, hereditary, a lower bound on the hereditary discrepancy of a submatrix is also a lower bound on the hereditary discrepancy of whole matrix. In the EDP matrix on , we will first find our subcubes-juntas-contingency-tables matrix above as a submatrix; one for which is . Having done that, it would remain to prove a lower bound for itself.

The first step is done as follows: associate each dimension with th and the th prime. A point in the hypercube is naturally associated with the integer . A subcube query can be specified by a vector : is set to the appropriate – value for the coordinates that we fix, and for the unconstrained coordinates. We can associate a subcube with the integer . It is easy to see that is in the subcube corresponding to if and only if divides . Thus if we restrict ourselves to the integers and HAPs corresponding to , we have found a submatrix of the EDP matrix that looks exactly like our contingency tables matrix . Thus the hereditary discrepancy of the EDP matrix is at least as large as that of this submatrix that we found.

Lower bounds for private mechanisms for can be derived in many ways. For constant , the results of Kasiviswanathan et al. referred to above suffice and it is likely that they can be pushed to get the lower bounds we are shooting for. A very different approach of extracting from weak random sources also implies such a lower bound. It is likely that from these, one could get a lower bound on of the kind we need.

However, given what we know about , we can in fact remove the privacy scaffolding and get a simpler direct proof of the lower bound of on , and can write a proof of the lower bound without any mention of privacy. This implies that the hereditary discrepancy for the EDP is at least , which matches the upper bound up to a constant in the exponent. A brief note with the proof is here.

Of course the EDP itself is wide open. Head on here to help settle the conjecture.

Many thanks to Alex Nikolov for his contribution to the content and the writing of this post.

Related

Great post. Reminds me of a talk Avi once gave titled something like “why it’s worthwhile sitting in talks in different fields.” I should add that sometimes fields don’t only borrow ideas but also borrow the researchers from other fields. So beware guys … 🙂

Very nice post! I have always thought about constructing DP lower bounds as equivalent to showing the existence of large packings around the optimal value. For example, our sample complexity bounds for learning in [1] and statistics [2] are all based on this concept of packings. Do you know if there’s a formal relationship between the two concepts?

Thanks for the pointers.
For (eps,0)-differential privacy, pretty much all lower bounds I know of are based on packing arguments. For linear queries, Moritz and I showed[1] that this is tight up to a polylog factor. We can go a little further than packing based lower bounds[2] by”adding up” packing-based lower bounds over orthogonal subspaces. It is a good question if packing based LBs are nearly tight in some more general setting.
For (eps,delta)-differential privacy, the picture is quite different. Packing based lower bounds only apply when delta is really small, and do not apply for say polynomially small delta. In fact, you can sometimes get (eps,1/poly(n))-DP mechanisms with accuracy significantly better than the packing based lower bounds for (eps,0) [3].
So in most settings, for proving non-trivial lower bounds of (eps,delta), one must go beyond packings. Many of the lower bounds go via lower bounds for some variant of blatant non-privacy. As this post mentions, these are often lower bounds on discrepancy, or on the smallest eigenvalue of an associated matrix.

The references [2] and [3] were flipped. Just fixed that.
Also, just to clarify, I did not mean to suggest that [1] introduced packing arguments. There are earlier papers with a packing argument, and certainly people in the community knew of it independently well before it appeared in print. What I find surprising is that in many settings, it is tight or nearly tight for (eps,0). Our failure to come up with more clever lower bounding techniques is, in many cases, not our fault. 🙂

1. Fix $n$. Let $A_k$ be the $C_k^n \times 2^n$ matrix encoding subcube queries with $k$ of $n$ input variables fixed. What is the (plain, nonhereditary) discrepancy of $A$? In the 2 norm? In the infinity norm?

2. What about including all $k$ from 1 to $n$? Hence what is the discrepancy of all subcube queries?