Friday, October 23, 2009

Various fields have various notions of "nice proofs," be they combinatorial, or elementary, or bijective. In TCS, perhaps the correct standard for lower bound proofs should be "encoding proofs." In these proofs, one starts with the assumption that some algorithm exists, and derives from that some impossible encoding algorithm, e.g. one that can always compress n bits into n-1 bits.
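To make the "impossible encoding" concrete, here is a minimal Python sketch of the pigeonhole fact that every such proof ends on. It is only an illustration: the names find_collision and drop_last_bit are made up here, and the toy encoder stands in for whatever encoder a hypothetical algorithm would give you.

```python
from itertools import product

def find_collision(encode, n):
    """Search all n-bit strings for two distinct inputs with the same codeword.

    Returns a pair (x, y) with x != y and encode(x) == encode(y),
    or None if encode is injective on n-bit strings.
    """
    seen = {}  # codeword -> first input that produced it
    for bits in product("01", repeat=n):
        x = "".join(bits)
        code = encode(x)
        if code in seen:
            return seen[code], x
        seen[code] = x
    return None

def drop_last_bit(x):
    # Hypothetical "compressor": n bits in, n-1 bits out.
    return x[:-1]

# 2^4 inputs but only 2^3 possible codewords, so a collision must exist.
collision = find_collision(drop_last_bit, 4)
```

Any encoder that always outputs n-1 bits has only 2^(n-1) codewords for 2^n inputs, so the search must find a collision, and a decoder cannot tell the two inputs apart.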

A normal lower bound will have a lot of big-bad-ugly statements -- "there are at least A bad sets (cf Definition 12), each containing at least B elements, of which at most a fraction of C are ugly (cf Definition 6)". To deal with such things, one invokes concentrations left and right, and keeps throwing away rows, columns, elements, and any hope that the reader will not get lost in the details.

There are 3 huge problems with this:

Most lower bounds cannot be taught in a regular class. But we can't go on saying how problems like P-vs-NP are so awesome, and keep training all our graduate students to round LPs better and squeeze randomness from stone.

The reader will often not understand and appreciate the simple and beautiful idea, as it is too hard to pull apart from its technical realization. Many people in TCS seem to think lower bounds are some form of dark magic, which involves years of experience and technical development. There is certainly lots of dark magic in the step where you find small-but-cool tricks that are the cornerstone of the lower bound; the rest can be done by anybody.

You start having lower-bounds researchers who are so passionate about the technical details that they actually think that's what was important! I often say "these two ideas are identical" only to get a blank stare. A lower bound idea never talks about entropy or rectangle width; such things are synonymous in the world of ideas.

Proofs that are merely an algorithm to compress n bits have elegant linearity properties (entropy is an expectation, therefore linear), and you never need any ugly concentration. Anybody, starting with a mathematically-mature high school student, could follow them with some effort, and teaching them is feasible. Among researchers, such proofs are games of wits and creativity, not games involving heavy guns that one may or may not have in their toolkit.

***

My paper on lower bounds for succinct rank/select data structures was submitted to SODA in extraordinary circumstances. I had been travelling constantly for a month, and the week before the deadline I was packing to move out of California and down with a flu. In the end, the only time I found for the paper was on an airplane ride to Romania, but of course I had no laptop since I had just quit IBM. So I ended up handwriting the paper on my notepad, and quickly typing it in on my sister's laptop just in time for the deadline.

[ You would be right to wonder why anybody would go through such an ordeal. I hate submitting half-baked papers, and anyway I wanted to send the paper to STOC. But unfortunately I was literally forced to do this due to some seriously misguided (I apologize for the hypocritical choice of epithets) behavior by my now-coauthor on that paper. ]

If you have 8 hours for a paper, you use all the guns you have, and make it work. But after the paper got in, I was haunted by a feeling that a simple encoding proof should exist. I learnt long ago not to resist my obsessions, so I ended up spending 3 weeks on the paper -- several dozen times more than before submission :). I am happy to report that I found a nice encoding proof, just "in time" for the SODA camera-ready deadline. (As you know, in time for a camera-ready deadline means 2 weeks and 1 final warning later.)

20 comments:

I am curious to learn how many commenters are going to look at the techniques in the paper(s), and how many people are going to zoom in on the tiny politically incorrect statements in Mihai's blog post. :)

I am curious to learn how many commenters are going to look at the techniques in the paper(s), and how many people are going to zoom in on the tiny politically incorrect statements in Mihai's blog post.

To each his own :)

So the camera-ready version bears little resemblance to the accepted version? Can it even be called a "peer-reviewed" publication in that case?

Well, the common opinion is that PCs rarely read the technical sections of a paper. So it's as peer-reviewed as any other paper in the conference :) -- my introduction didn't change, of course.

You quit IBM?

Sure (with ample notice). I was going to the Central European Olympiad in Romania, and then starting at ATT.

I'm not a computer scientist, and to me this sounds like a recipe for lots and lots of incorrect results. Add to that the common practice in CS of not submitting to journals...

Spotting errors is very hard. The idea that journal reviewers do it is wishful thinking. But the good news is that 90% of what we publish is crap, so who cares if it's correct or not? If somebody actually cares about your result, it will get checked.

You mean you were getting scooped or something on this result and so you were forced to do it to ensure at least a merge?

I was getting "scooped" by a weaker result which was not an independent discovery. My choice was between submitting to SODA and accepting a merger of the author lists, or starting a public fight. Since I chose the former, it doesn't make sense to go into details now.

Interestingly, as far as I know, the encoding technique for proving the lower bounds was first observed by Gennaro-Trevisan. They observed that if there is a small circuit for inverting a random permutation with non-trivial probability, then you can compact the description of the random permutation.

Although it is quite basic, I totally agree and love this technique. More recently, several papers co-authored by Iftach Haitner (and others, of course) applied this technique to much more powerful situations, where a direct proof seems hard. One nice thing about the encoding technique is that the encoder/decoder are allowed to be inefficient, if one is proving a lower bound on the efficiency of some algorithm or cryptographic assumption.

Recently, I worked with Iftach (and a student) on a paper where we successfully used this technique to argue the impossibility of black-box reduction from one problem to another (more or less), and I truly appreciated its power.

It is very interesting that it is used in algorithms much less (and much more recently) than in crypto, and that it is not as well known in data structures as it is in crypto.

Yao already had the main idea of the proof you mention from my paper with Rosario. He deals with the simpler question of the complexity of inverting a random permutation on all inputs and he shows that, roughly speaking, if you have an oracle inversion algorithm for a permutation that uses m bits of advice and has query complexity q, then a 1/q fraction of the outputs of the permutation are implied by the other outputs and by the advice.

If you think of the permutation itself, plus the m bits of advice, as a data structure with m bits of redundancy, and of the q-query oracle inversion algorithm as a q-query algorithm in the cell-probe model, then Yao's result (and the one with Rosario, and the later ones etc.) are really redundancy/query trade-off for certain systematic data structures, so it's not surprising that one would end up using similar techniques.

Regarding the Kolmogorov complexity comparison, I think the difference is that the encoding arguments are expected to be used there, since this is more or less what the definition states. But the encoding technique is somewhat surprising at first glance (until one thinks about it, as Luca just did).

Indeed, my first inclination to argue that no small circuit can invert a random permutation with forward oracle access is perhaps to first fix the circuit, and argue that

Pr (fixed circuit succeeds with probability > e) << 1/#circuits.

And computing the latter probability is somewhat painful. Certainly doable for this relatively simple example, but IMHO much more complicated than the beautiful encoding technique. Namely, compress a random permutation as follows (simplified): give the small circuit as advice, describe the set S of inputs on which the circuit succeeds, describe pi^{-1}(S) as a set, and then explicitly describe pi^{-1}(y) for all remaining y's. The rest is just counting the size of this description in one line! Very natural, no probability calculations!
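The "counting in one line" can be spelled out numerically. The following Python sketch just adds up the four parts of the simplified description above and compares the total with log2(N!), the cost of writing down a permutation directly. It is a rough illustration under the same simplification as the comment (it ignores how the circuit's oracle queries are handled), and the function names and the parameter circuit_bits are made up here.

```python
from math import lgamma, log

def log2_factorial(k):
    # log2(k!) via the log-gamma function, which avoids huge integers.
    return lgamma(k + 1) / log(2)

def log2_binomial(n, k):
    # log2 of the number of k-element subsets of an n-element set.
    return log2_factorial(n) - log2_factorial(k) - log2_factorial(n - k)

def encoding_bits(N, a, circuit_bits):
    """Bits used by the simplified encoding of a permutation of N points,
    when the circuit inverts a = |S| of the outputs."""
    describe_S = log2_binomial(N, a)         # the set S of good outputs
    describe_preimage = log2_binomial(N, a)  # pi^{-1}(S), as a set only
    describe_rest = log2_factorial(N - a)    # pi^{-1}(y) for every other y
    return circuit_bits + describe_S + describe_preimage + describe_rest

def savings(N, a, circuit_bits):
    # Positive savings means the encoding beats log2(N!) bits.
    return log2_factorial(N) - encoding_bits(N, a, circuit_bits)
```

The terms cancel to savings = log2(a!) - log2(C(N, a)) - circuit_bits, so the encoding only wins when the circuit succeeds on many inputs and is itself small. Since a random permutation cannot be described in noticeably fewer than log2(N!) bits, a positive saving is exactly the contradiction.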

The encoding technique is certainly well-known in data structures. It appeared for instance in [Fredman '82], [Demaine Lopez-Ortiz '01], [Demaine Patrascu '04], [Golynski '09]. I don't think you can attribute it to any one person.

Luca, now we have lower bounds for storing permutations in a non-systematic fashion [Golynski'09]. Can this be connected to crypto? Does non-systematic have any natural meaning there?

What is that you don't like in the first proof? [...] Is it this conditioning that you consider not too intuitive? Just asking since I found the first proof really cute

The high entropy given the conditioning is intuitive (for someone in the area), but rather technical to prove. For instance, I didn't even deal with independent variables, but with two dependent vectors, each having independent coordinates.

If you can get the proof down to essentially high school math, you should always do it :)

One crypto application, which can already be hinted from my paper with Erik, is that it gives a lower bound on the time for decrypting a message in the bit/cell probe model. Although the lower bound is extremely weak, nothing higher than that has been proven as far as we were able to ascertain.

Here's the scheme. You create a random permutation π of the integers 1 to n. Consider this permutation to be your encrypting key as follows. A message M composed of, say, n log n bits is encoded by sending π applied to the first log n bits, then π applied to the next log n bits, and so on.
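A minimal Python sketch of this scheme, for concreteness. It is an illustration only: it uses the integers 0 to n-1 instead of 1 to n, treats each log n-bit block as one integer "word", and the function names are made up here.

```python
import random

def make_key(n, seed=None):
    """Return a random permutation of 0..n-1, stored as a lookup table."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    return perm

def invert_key(perm):
    """Build the full inverse table: n extra words of storage."""
    inverse = [0] * len(perm)
    for i, p in enumerate(perm):
        inverse[p] = i
    return inverse

def encrypt(words, perm):
    # Apply the permutation to each log n-bit word of the message.
    return [perm[w] for w in words]

def decrypt(cipher, inverse):
    # One table lookup per word, but only because the whole inverse is stored.
    return [inverse[c] for c in cipher]

key = make_key(16, seed=1)
message = [3, 0, 15, 7]
recovered = decrypt(encrypt(message, key), invert_key(key))
```

Here invert_key builds the full n-word inverse table, which is the expensive extreme of the space/time trade-off: constant time per word, paid for with a complete second copy of the key.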

Now assume that Eve manages to get her hands on the private key, encoded in whatever form she prefers. Even so, by the lower bound it follows that this is not enough to decode the message in constant time per log n bits. Her choices are now (a) to spend time t decoding each word for a total decoding time of n*t, (b) to bite the bullet and build a data structure of size n/t to assist in fast lookups for a total time of n*t+n/t > 2n, or (c) to obtain more data from the channel which by the lower bound is at least an additional n log n bits.

I don't see a natural interpretation of non-systematic representation.

A one-way permutation should be efficiently computable, so a time-t adversary should be able to get about t evaluations of the permutation, which you can model by having the permutation itself be available for lookup. Then (if you are trying to prove a lower bound for non-uniform algorithms in an oracle setting) the algorithm can have some non-uniform advice, which is the redundant part of the data structure. At this point, from Hellman and Yao it follows that the optimal tradeoff for inverting is time t and redundancy m provided m*t > N where N is the size of the range of the permutation.

Here is a question that has been open for 10+ years: suppose I want to determine the non-uniform complexity of inverting a random *function* rather than a permutation. Fiat and Naor show that a random function mapping [N] into [N] can be inverted in time t with redundancy m provided t*m^2 > N^2, up to polylog N terms. (The same holds if the function is not random, but has the property that the preimage size is at most polylog N.)

And this paper shows that this trade-off (which has the interesting case t=m=N^{2/3}) is best possible under rather restrictive assumptions on what the redundant part of the data structure is allowed to contain, and how it is allowed to be used