Tag Archives: Additive Combinatorics

[At the end of a survey paper on additive combinatorics and computational complexity which is to appear in SIGACT News, I list three major open questions in additive combinatorics which might be amenable to a “computer science proof.” They are all extremely well studied questions, by very smart people, for the past several years, so they are all very long shots. I don’t recommend that anybody start working on them, but I think it is good that as many people as possible know about these questions, because when the right technique comes along its applicability can be recognized more quickly.]

The first question is to improve the Triangle Removal Lemma. I have talked here about what the triangle removal lemma is, how one can prove it from the Szemerédi Regularity Lemma, and how it implies the length-3 case of Szemerédi’s Theorem.

As a short recap, the Triangle Removal Lemma states that if $G$ is an $n$-vertex graph with $o(n^3)$ triangles, then there is a set of $o(n^2)$ edges such that the removal of those edges eliminates all the triangles. Equivalently, it says that if a graph has $\Omega(n^2)$ triangles which are all pair-wise edge-disjoint, then there must be $\Omega(n^3)$ triangles overall.

The connection with Szemerédi’s Theorem is that if $H$ is an abelian group with $n$ elements, and $A$ is a subset of $H$ with no length-3 arithmetic progressions (i.e., $A$ is such that there are no three distinct elements $a,b,c$ in $A$ such that $b-a = c-b$), then we can construct a graph that has $3n$ vertices, $n\cdot|A|$ pair-wise edge-disjoint triangles, and no other triangles. This contradicts the triangle removal lemma if $|A| = \Omega(n)$, and so we must have $|A| = o(n)$.
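The graph construction can be checked by brute force on a small example. The following is a minimal sketch, assuming the group is $\mathbb{Z}_n$ with $n$ odd and using the standard tripartite construction; the edge rules and every function name below are assumptions made for illustration and are not quoted from the post.

```python
from itertools import product


def has_three_term_ap(A, n):
    """Return True if A (a subset of Z_n) contains distinct a, b, c with b - a = c - b (mod n)."""
    elems = {a % n for a in A}
    for a, b, c in product(elems, repeat=3):
        if len({a, b, c}) == 3 and (b - a) % n == (c - b) % n:
            return True
    return False


def ap_graph(A, n):
    """Build a tripartite graph on three copies X, Y, Z of Z_n (n odd).

    Assumed edge rules (standard construction, an assumption of this sketch):
      (x, y) is an X-Y edge when y - x is in A,
      (y, z) is a Y-Z edge when z - y is in A,
      (x, z) is an X-Z edge when z - x is in 2A = {2a : a in A}.
    """
    if n % 2 == 0:
        raise ValueError("this sketch assumes n is odd, so that doubling is invertible")
    elems = {a % n for a in A}
    doubled = {(2 * a) % n for a in elems}
    pairs = list(product(range(n), repeat=2))
    xy = {(x, y) for x, y in pairs if (y - x) % n in elems}
    yz = {(y, z) for y, z in pairs if (z - y) % n in elems}
    xz = {(x, z) for x, z in pairs if (z - x) % n in doubled}
    return xy, yz, xz


def triangles(xy, yz, xz, n):
    """List all triangles (x, y, z) by brute force over the 3n vertices."""
    return [
        (x, y, z)
        for x, y, z in product(range(n), repeat=3)
        if (x, y) in xy and (y, z) in yz and (x, z) in xz
    ]


def edge_disjoint(tris):
    """Check that no two triangles in the list share an edge."""
    xy = {(x, y) for x, y, _ in tris}
    yz = {(y, z) for _, y, z in tris}
    xz = {(x, z) for x, _, z in tris}
    return len(xy) == len(yz) == len(xz) == len(tris)
```

For instance, with $n = 9$ and $A = \{0,1,3,4\}$ (which has no length-3 progression modulo 9), the only triangles are those of the form $(x, x+a, x+2a)$ with $a \in A$, so there are $9 \cdot 4 = 36$ of them and they are pair-wise edge-disjoint.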

This is great, until we start looking at the relationships between the constants hidden by the $o(\cdot)$ notation. Quantitatively, the Triangle Removal Lemma states that for every $\epsilon$ there is a $\delta = \delta(\epsilon)$ such that if a graph has at least $\epsilon n^2$ pair-wise edge-disjoint triangles, then it has at least $\delta n^3$ triangles. The only known proof, however, has $\delta$ incredibly small: $1/\delta$ grows like a tower of exponentials of height polynomial in $1/\epsilon$. The proof uses the Szemerédi Regularity Lemma, and the Regularity Lemma is known to require such very bad dependencies.

63 years ago, Behrend showed that $\mathbb{Z}/N\mathbb{Z}$, $N$ prime, has a subset $A$ that contains no length-3 arithmetic progression and whose size is $N/2^{O(\sqrt{\log N})}$. (Last year, Elkin gave the first improvement in 62 years to Behrend’s bound, but the improvement is only a multiplicative polylog factor.) Combined with the graph construction mentioned above, this gives a graph with $3N$ vertices, $N^2/2^{O(\sqrt{\log N})}$ edge-disjoint triangles, and no other triangle. Thus, the graph has $\leq \delta N^3$ triangles where $\delta < 1/N$, but one needs to remove $\geq \epsilon N^2$ edges to make it triangle-free, where $\epsilon > 2^{-O(\sqrt{\log N})}$. This shows that, in the Triangle Removal Lemma, $1/\delta$ must grow super-polynomially in $1/\epsilon$, and be at least $(1/\epsilon)^{\Omega(\log 1/\epsilon)}$.
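To make the flavor of Behrend's idea concrete, here is a hedged sketch of a Behrend-style "sphere" construction. It works in the integers rather than in $\mathbb{Z}/N\mathbb{Z}$, it does not optimize the parameters to reach the bound quoted above, and the names `behrend_set` and `is_ap_free` are hypothetical. The idea: digit vectors in $\{0,\ldots,d-1\}^k$ with a fixed squared norm lie on a sphere, a sphere contains no three collinear points, and encoding the vectors in base $2d-1$ ensures that adding two encoded numbers never produces a carry.

```python
from collections import defaultdict
from itertools import product


def behrend_set(d, k):
    """Return the largest 'sphere' of digit vectors in {0..d-1}^k, encoded in base 2d-1.

    Since every digit is below d, the sum of two encoded numbers has digits at most
    2d-2, so there are no carries and a + c = 2b holds digit by digit. The midpoint
    of two distinct points of a sphere is strictly inside it, so no 3-term
    progression survives.
    """
    base = 2 * d - 1
    spheres = defaultdict(list)
    for v in product(range(d), repeat=k):
        squared_norm = sum(x * x for x in v)
        encoded = sum(x * base**i for i, x in enumerate(v))
        spheres[squared_norm].append(encoded)
    return sorted(max(spheres.values(), key=len))


def is_ap_free(S):
    """Check that the integer set S contains no a < b < c with a + c = 2b."""
    elems = set(S)
    ordered = sorted(elems)
    for i, a in enumerate(ordered):
        for c in ordered[i + 1:]:
            if (a + c) % 2 == 0 and (a + c) // 2 in elems:
                return False
    return True
```

Calling `is_ap_free(behrend_set(4, 3))` gives a quick sanity check on a set of integers below $7^3$; larger $d$ and $k$ give larger progression-free sets.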

The question is to narrow the gap between the tower-of-exponentials relationship between $1/\delta$ and $1/\epsilon$ coming from the proof via the Szemerédi Regularity Lemma and the mildly super-polynomial lower bound coming from the above argument.

I am writing a short survey on connections between additive combinatorics and computer science for SIGACT News and I have been wondering about the “history” of the connections. (I will be writing as little as possible about history in the SIGACT article, because I don’t have the time to research it carefully, but if readers help me out, I would write about it in a longer article that I plan to write sometime next year.)

Now, of course, there is always that Russian paper from the 1950s . . .

Then there is the Coppersmith-Winograd matrix multiplication algorithm, which uses Behrend’s construction of large sets of integers with no length-3 arithmetic progressions. (They don’t need the full strength of Behrend’s construction, and the weaker construction of Salem and Spencer suffices.) As far as I know, this was an isolated application.

My understanding (which is quite possibly mistaken) is that there are three main threads to the current interactions: one concerning the Szemerédi Regularity Lemma and the “triangle removal lemma” of Ruzsa and Szemerédi; one concerning “sum-product theorems” and the Kakeya problem; and one concerning the Gowers uniformity norms. The earliest connection is the first one, and, depending on how one counts, it started in 1992 or 2001, and is due, respectively, to Alon et al. or just to Alon.

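$h$ has low complexity relative to $F$: there are $k = O(\epsilon^{-2})$ functions $f_i \in F$ and coefficients $c_i$ such that

$$h(x) := \max\left\{-1, \min\left\{1, \sum_{i=1}^{k} c_i f_i(x)\right\}\right\}$$

$h$ is $\epsilon$-indistinguishable from $g$ by $F$, that is,

$$\forall f \in F \quad \left|\mathbb{E}_x f(x) g(x) - \mathbb{E}_x f(x) h(x)\right| \leq \epsilon$$

(Last time, I mentioned that our proof handled only boolean functions $f$; now we can handle arbitrary bounded functions, and with an “energy-decrease” style proof, this will appear in the next online revision of the paper.)

This seems to be a useful tool, limited only by one’s creativity in choosing the functions $F$ and then making use of the properties of $h$.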

As already discussed,

if one takes $X$ to be the edges of a complete graph, and $F$ the set of indicator variables of cuts, then the existence of $h$ gives the weak regularity lemma of Frieze and Kannan; and

if one takes $F$ to be the set of circuits of size at most $S$, and normalizes $g$ and $h$ to be probability distributions, one gets that for every probability distribution $D$ of high entropy there is a (non-uniformly) efficiently samplable and computable distribution $M$ that is indistinguishable from $D$ by circuits of size $\leq S$.

In this post I’ll show how to use it to give short proofs of the Hardcore Lemma of Impagliazzo and the Dense Model Theorem of Green, Tao and Ziegler. Both proofs also have, at least in hindsight, a sense of “inevitability,” meaning that given the Low-Complexity Approximation Theorem, and given what we want to prove, both proofs get to the point in a most economical and natural way.

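The Impagliazzo Hardcore Lemma. We have already mentioned that if $g$ is “hard-on-average” for $F$, then $h$ cannot be an approximation in the sense of being close to $g$ on most inputs. What, then, about the points on which $g$ and $h$ differ? They form an Impagliazzo Hardcore Set for $g$, as described next.

Let $g: X \to \{-1,1\}$ be a function that is weakly hard on average for a class of algorithms $F$. Suppose, specifically, that for every algorithm $h$ of complexity $O(\epsilon^{-2}\delta^{-2})$ relative to $F$ we have

$$\Pr_x [g(x) = h(x)] \leq 1 - \delta$$

and, more generally, for fractional $h: X \to [-1,1]$, we have

$$\mathbb{E}_x |g(x) - h(x)| \geq 2\delta \qquad (1)$$

Then, construct an approximating function $h$ of complexity $O(\epsilon^{-2}\delta^{-2})$ relative to $F$ and such that $h$ and $g$ are $\epsilon\delta$-indistinguishable by $F$. Note that, even though $h$ is “indistinguishable” from $g$, it is also “far” from $g$, as in (1).

Define a probability distribution $H$ that assigns to point $x$ a probability proportional to $|g(x) - h(x)|$. (If $h$ were boolean, this would be the uniform distribution over points on which $h$ and $g$ differ.) Then this distribution is $\delta$-dense in the uniform distribution, meaning that every point has probability at most $(\delta |X|)^{-1}$. Observe also that we have

$$g(x) \cdot |g(x) - h(x)| = g(x) - h(x)$$

for every $x$, because $g(x)$ and $g(x) - h(x)$ have the same sign and $|g(x)| = 1$, so we have, for every $f \in F$,

$$\left|\mathbb{E}_{x \sim H} f(x) g(x)\right| = \frac{\left|\mathbb{E}_x f(x)(g(x) - h(x))\right|}{\mathbb{E}_x |g(x) - h(x)|} \leq \frac{\epsilon\delta}{2\delta} \leq \epsilon$$

and so $H$ is a hardcore distribution, because the above expression is equivalent to

$$\Pr_{x \sim H} [f(x) = g(x)] \leq \frac{1}{2} + \frac{\epsilon}{2}$$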
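As a small illustration of the distribution just defined, here is a minimal sketch on a finite domain. It assumes the domain is indexed by `0..n-1` with the uniform distribution, that the hard function takes values in $\{-1,1\}$ and its approximation takes values in $[-1,1]$, and that both are given as plain lists; the names `hardcore_distribution` and `correlation_under` are hypothetical.

```python
def hardcore_distribution(g, h):
    """Distribution giving point x a probability proportional to |g(x) - h(x)|.

    g: list of values in {-1, +1} (assumed); h: list of values in [-1, 1] (assumed).
    """
    weights = [abs(gx - hx) for gx, hx in zip(g, h)]
    total = sum(weights)
    if total == 0:
        raise ValueError("g and h coincide, so the distribution is undefined")
    return [w / total for w in weights]


def correlation_under(dist, f, g):
    """Expectation of f(x) * g(x) when x is drawn from dist."""
    return sum(p * fx * gx for p, fx, gx in zip(dist, f, g))
```

If the average of $|g(x) - h(x)|$ over the domain is at least $2\delta$, every point receives probability at most $1/(\delta n)$, since each weight is at most 2 and the weights sum to at least $2\delta n$. The correlation of any test with $g$ under this distribution equals the uniform expectation of $f\cdot(g-h)$ divided by the uniform expectation of $|g-h|$.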

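The Dense Model Theorem. Suppose that $R \subseteq X$ is a pseudorandom set with respect to functions that have bounded complexity relative to $F$, and let $D \subseteq R$ be a dense subset of $R$, $|D| = \delta |R|$.

To find a dense model of $D$, we take $g$ to be the characteristic function of $D$, and we let $h$ be the low-complexity approximation, but using the uniform distribution on $R$ as the underlying probability distribution. Now suppose for simplicity that $h$ is boolean, and that $M$ is the set of inputs of $X$ on which $h$ is 1. We want to argue that $M$ is a dense model of $D$. By assuming without loss of generality that $F$ contains the all-one function, we get from the indistinguishability of $g$ and $h$ that

$$\delta = \mathbb{E}_{x \in R}\, g(x) \approx \mathbb{E}_{x \in R}\, h(x)$$

and from the pseudorandomness of $R$ we have

$$\mathbb{E}_{x \in R}\, h(x) \approx \mathbb{E}_{x \in X}\, h(x) = \frac{|M|}{|X|}$$

and so $|M| \approx \delta |X|$ and $M$ is indeed dense.

For the indistinguishability of $M$ and $D$, take any function $f \in F$, and observe that

$$\mathbb{E}_{x \in D} f(x) = \delta^{-1}\, \mathbb{E}_{x \in R} f(x) g(x) \approx \delta^{-1}\, \mathbb{E}_{x \in R} f(x) h(x) \approx \delta^{-1}\, \mathbb{E}_{x \in X} f(x) h(x) \approx \mathbb{E}_{x \in M} f(x)$$

where we use both the indistinguishability of $g$ and $h$ under distribution $R$, and the fact that the uniform distributions on $R$ and on $X$ are indistinguishable by functions of bounded complexity.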

This proof is appealingly intuitive, in the sense that if we expect $D$ to be indistinguishable from a large set, then when we try to approximate the characteristic function of $D$ we will end up with a low complexity function that is spread around much of $X$, thus defining a dense model. It also shows that “relative” versions of the Regularity Lemma, such as the Regularity Lemma for subgraphs of an expander, may be derived from the ordinary Regularity Lemma by the above argument. A disadvantage of the argument is that it does not establish the stronger form of the Dense Model Theorem suggested by Impagliazzo, in which there is no set $R$, but we require $D$ to have the “pseudo-density” requirement that for every low-complexity bounded function $a$,

$$\left|\mathbb{E}_{x \in D}\, a(x)\right| \leq \frac{1}{\delta}\, \mathbb{E}_{x \in X} |a(x)| + \epsilon$$

which follows immediately if $D$ has density $\delta$ in a pseudorandom set $R$, but that is a seemingly weaker property. (The relative Regularity Lemma in graphs had long been known to hold under such a pseudo-density assumption.)

In a previous post, I described abstract forms of the weak regularity lemma, in which we start from an arbitrary bounded function $g$ and an arbitrary set $F$ of “structured” bounded functions, and we construct an “approximating” function $h$ that has “low complexity” relative to $F$ and is “indistinguishable” from $g$ by functions from $F$. We had two proofs, both generalizations of arguments due to Frieze and Kannan: in one proof $h$ is not bounded, but its complexity is $O(\epsilon^{-2})$, where $\epsilon$ is the approximation parameter; in the second proof, $h$ is bounded but it has exponential complexity in the approximation parameter.

In a new paper by Madhur Tulsiani, Salil Vadhan, and me, we give a new proof that gives an approximating function that, at the same time, is bounded and has low complexity. This has a number of applications, which I will describe in a future post.

(Note that, in the statement below, $F$ is required to contain Boolean functions, rather than bounded ones. It is possible, however, to “reduce” the bounded case to the boolean case via the following observation: if $F$ is a family of bounded functions, $F'$ is the family of boolean functions $\{ 1_{f(x) \geq t} : f \in F, t \in [0,1] \}$, and $g$ and $h$ are $\epsilon$-indistinguishable according to $F'$, then they are also $\epsilon$-indistinguishable according to $F$.)

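$h$ has low complexity relative to $F$: there are $k = O(\epsilon^{-2})$ functions $f_i \in F$ and coefficients $c_i$ such that

$$h(x) := \max\left\{0, \min\left\{1, \sum_{i=1}^{k} c_i f_i(x)\right\}\right\}$$

$h$ is $\epsilon$-indistinguishable from $g$ by $F$, that is,

$$\forall f \in F \quad \left|\mathbb{E}_x f(x) g(x) - \mathbb{E}_x f(x) h(x)\right| \leq \epsilon$$

That is, $h$ is simply a linear combination of functions from $F$, whose value is truncated to be between $0$ and $1$.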

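Recall that, when we do not require $h$ to be bounded, $h$ can be constructed via the following algorithm:

Algorithm FK

$h_0 := 0$; $t := 0$

while $\exists f_{t+1} \in F$ such that $\mathbb{E}[f_{t+1} \cdot (g - h_t)] \geq \epsilon$

    $h_{t+1} := \sum_{i=1}^{t+1} \gamma f_i$

    $t := t+1$

return $h_t$

And the analysis shows that, if we call $\Delta_t := g - h_t$, then $\mathbb{E}[\Delta_t^2 - \Delta_{t+1}^2] \geq 2\epsilon\gamma - \gamma^2$. Setting $\gamma := \epsilon$ shows that the algorithm stops within $\epsilon^{-2}$ steps.

Our proof proceeds by doing exactly the same, but making sure at every step that $h$ is bounded.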

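Algorithm TTV

$h_0 := 0$; $t := 0$

while $\exists f_{t+1} \in F$ such that $\mathbb{E}[f_{t+1} \cdot (g - h_t)] \geq \epsilon$

    $h_{t+1} := \max\{0, \min\{1, \sum_{i=1}^{t+1} \gamma f_i\}\}$

    $t := t+1$

return $h_t$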
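Both Algorithm FK and Algorithm TTV can be tried out on a small finite domain. The following is a minimal sketch under several stated assumptions that are not taken from the post: the domain is `0..n-1` with the uniform distribution, the target and the test functions are given as lists of numbers, the step size equals the approximation parameter, the truncation range is $[0,1]$, and the loop looks for a test whose signed correlation with the current error is at least the approximation parameter (so the caller should include negated tests if two-sided indistinguishability is wanted). The function name `approximate`, the `truncate` flag, and the `max_steps` safety cap are hypothetical.

```python
def mean(values):
    """Average of a list, i.e. the expectation under the uniform distribution."""
    return sum(values) / len(values)


def approximate(g, F, eps, truncate=True, max_steps=10_000):
    """Greedy approximation of g by a sum of step-size-weighted test functions.

    g: list of values in [0, 1]; F: list of test functions, each a list of numbers.
    truncate=False mimics the unbounded (FK-style) update,
    truncate=True clips the running sum to [0, 1] (TTV-style update).
    Returns (h, chosen) where chosen lists the indices of the tests used, in order.
    """
    n = len(g)
    gamma = eps                 # step size (assumed equal to eps)
    total = [0.0] * n           # running sum gamma * (f_1 + ... + f_t)
    chosen = []
    for _ in range(max_steps):  # safety cap, an assumption of this sketch
        if truncate:
            h = [min(1.0, max(0.0, v)) for v in total]
        else:
            h = list(total)
        # Look for a test that still distinguishes g from the current h.
        witness = next(
            (i for i, f in enumerate(F)
             if mean([f[x] * (g[x] - h[x]) for x in range(n)]) >= eps),
            None,
        )
        if witness is None:
            return h, chosen
        total = [total[x] + gamma * F[witness][x] for x in range(n)]
        chosen.append(witness)
    raise RuntimeError("step cap reached; increase max_steps")
```

On return, no test in `F` has signed correlation of at least `eps` with the error `g - h`, and the list `chosen` records which tests make up the approximation, so its length is the complexity of the result relative to `F`.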

The problem with the old analysis is that now we could conceivably have a step in which $\sum_{i=1}^{t} \gamma f_i(x) > 1 + \gamma$ for all $x$, and so $h_{t+1} = h_t$, and thus $\Delta_{t+1} = \Delta_t$, and we have no energy decrease. To get to such a state, however, there must have been about $1/\gamma$ steps in which $h$ changed in the same way as in the Frieze-Kannan case. I think there should be a sort of “amortized analysis” that would “charge” the lack of change of $h$ at certain steps to other steps in which it indeed changes, and establish that, for every time step $T$,

$$\mathbb{E}[\Delta_0^2 - \Delta_T^2] \geq \Omega(T \cdot \gamma \cdot (\epsilon - O(\gamma)))$$

Unfortunately I don’t know how to construct such a proof. Our proof, instead, follows step by step Impagliazzo’s proof of the Impagliazzo Hardcore Set Lemma, which employs a very similar algorithm (in a quite different context). As shown by Klivans and Servedio, the algorithm in Impagliazzo’s proof can be seen as a boosting algorithm in learning theory (and every other known boosting algorithm can be used to prove Impagliazzo’s Hardcore Lemma); this is why we think of our proof as a “boosting” proof.

I must confess, I have never really understood Impagliazzo’s proof, and so I can’t say I really understand ours, other than being able to reproduce the steps.

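The idea is to consider the quantity $f_{t+1}(x) \Delta_t(x)$, where $\Delta_t := g - h_t$, which depends on $t$ and $x$, and to see how it behaves summed over $t$, for a fixed $x$, and how it behaves for a fixed $t$, summed over $x$.

Suppose the algorithm is still running after $T$ steps. Then, because of the way we define the termination condition, we have, for every $t \in \{0, \ldots, T-1\}$

$$\mathbb{E}_x f_{t+1}(x) \Delta_t(x) \geq \epsilon \qquad (1)$$

and the crux of the proof is to show that for every $x$ we have

$$\sum_{t=0}^{T-1} f_{t+1}(x) \Delta_t(x) \leq \frac{\gamma}{2} T + \frac{4}{\gamma} \qquad (2)$$

So if we sum (1) over $t$ and average (2) over $x$, we get

$$\epsilon T \leq \frac{\gamma}{2} T + \frac{4}{\gamma}$$

and setting $\gamma := \epsilon$ gives $T \leq 8 \epsilon^{-2}$.

Inequality (2), the main step of the proof, is the part for which I have little intuition. One breaks the summation into groups of time steps, depending on the value of $\Delta_t(x)$; there are $O(1/\gamma)$ groups (because the value of $h_t(x)$ changes by discrete increments of $\gamma$, and is between $0$ and $1$) and each one is shown to contribute $O(\gamma T') + O(1)$, where $T'$ is the number of time steps in the group.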

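It is perhaps instructive to translate the Frieze-Kannan proof to this set-up. In the Frieze-Kannan algorithm, writing $\Delta_t := g - h_t$ and using the update $h_{t+1} = h_t + \gamma f_{t+1}$, we have

$$f_{t+1}(x) \Delta_t(x) = \frac{1}{2\gamma} \left( \Delta_t(x)^2 - \Delta_{t+1}(x)^2 \right) + \frac{\gamma}{2} f_{t+1}(x)^2$$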

Several results in additive combinatorics have the flavor of decomposition results in which one shows that an arbitrary combinatorial object can be approximated by a “sum” of a “low complexity” part and a “pseudorandom” part. This is helpful because it reduces the task of proving that an arbitrary object has a certain property to the easier task of proving that low-complexity objects and pseudorandom objects have the property. (Provided that the property is preserved by the approximation and the sum.)

Usually, one needs a strong notion of approximation, and in such cases the complexity of the “low complexity” part is fantastically large (a tower of exponentials in the approximation parameter), as in the Szemeredi Regularity Lemma, which can be seen as a prototypical such “decomposition” result. In some cases, however, weaker notions of approximation are still useful, and one can have more reasonable complexity parameters, as in the Weak Regularity Lemma of Frieze and Kannan.

A toy example of a “weak decomposition result” is the following observation:

One of the generally exciting things about additive combinatorics is the confluence of techniques from analysis, combinatorics, and ergodic theory. One of the specifically exciting things to me has been the applicability of the analytic and the combinatorial techniques to computer science. It makes sense that we should make some effort to understand the ergodic-theoretic techniques too, just in case.

This semester, the MSRI is having a special semester devoted to additive combinatorics and ergodic theory.

Additive combinatorics is the branch of extremal combinatorics where the objects of interest are sets of integers and, more generally, subsets of abelian groups (rather than graphs or set systems) and where the properties of interest are formulated in terms of linear equations (rather than in terms of cuts, partitions, intersections, and so on). Lately, it has been the site of a fascinating convergence of methods from “classical” combinatorics, from analysis and from ergodic theory. I have often written about it here because the combinatorial and analytical techniques of additive combinatorics have been useful in a number of different computer science applications (related to probabilistically checkable proof constructions, property testing, and pseudorandomness), and computer scientists are becoming more and more interested in the subject, contributing their own perspective to it.

In all this, the exact role of ergodic theory (and the possible applications of ergodic-theoretic techniques in complexity theory) has been a bit of a mystery to me, and perhaps to other computer scientists too. Very nicely, the MSRI special program started this week with a series of tutorials to introduce the connections between ergodic theory and additive combinatorics.

All talks are (or will soon be) online, and the talks by Terry Tao are especially recommended, because he explains, using very simple examples, how one goes about converting concrete, finitary and quantitative statements into equivalent abstract infinitary statements, which are in turn amenable to ergodic-theoretic techniques.

Today, Tamar Ziegler discussed the very recent proof, by Vitaly Bergelson, Terry Tao, and herself, of the inverse conjecture for Gowers norms in finite fields, a proof that uses ergodic-theoretic techniques.

But, you may object, didn’t we know that the inverse conjecture for Gowers norms is false? Well, the counterexample of Lovett, Meshulam, and Samorodnitsky (independently discovered by Green and Tao) refutes the following conjecture: “Suppose $f: \mathbb{F}_p^n \to \mathbb{C}$ is a bounded function such that $\|f\|_{U^k} \geq \delta$; then there is an $\epsilon$ independent of $n$ and an $n$-variate degree-$(k-1)$ polynomial $q$ over $\mathbb{F}_p$ such that $f$ and $e^{2\pi i q/p}$ have correlation at least $\epsilon$.”

This is refuted in the $p=2$, $k=4$ case by taking $f = (-1)^{S_4(x_1,\ldots,x_n)}$, where $S_4$ is the symmetric polynomial of degree 4. While $f$ has $o(1)$ (indeed, exponentially small) correlation with all functions of the form $(-1)^{q(x_1,\ldots,x_n)}$, where $q$ is a degree-3 polynomial, the norm $\|f\|_{U^4}$ is a constant.

The statement proved by Bergelson, Tao, and Ziegler is “Suppose $f: \mathbb{F}_p^n \to \mathbb{C}$ is a bounded function such that $\|f\|_{U^k} \geq \delta$; then there is an $\epsilon$ independent of $n$ and a bounded $n$-variate degree-$(k-1)$ ‘polynomial function’ $Q: \mathbb{F}_p^n \to \mathbb{C}$ such that $f$ and $Q$ have correlation at least $\epsilon$.”

What is, then, a ‘polynomial function’ of degree $d$? It is a function bounded by 1 in absolute value, and such that for every $d+1$ directions $h_1, \ldots, h_{d+1}$, if one takes $d+1$ Gowers derivatives in such directions one always gets the constant-1 function. In other words, $Q$ is a ‘polynomial function’ of degree $d$ if $|Q(x)| \leq 1$ for every $x$, and one has $D_{h_1} \cdots D_{h_{d+1}} Q(x) = 1$ for every $x$ and $h_1, \ldots, h_{d+1}$. Interestingly, these functions are a proper superclass of the functions of the form $e^{2\pi i q(x)/p}$ with $q$ being a polynomial over $\mathbb{F}_p$.

In the concrete $p=2$, $k=4$ case, one may construct such a function by letting $q$ be a polynomial in the ring, say, $\mathbb{Z}_8$, and then having $Q(x) = \omega^{q(x)}$, where $\omega$ is a primitive eighth root of unity. Indeed, this is the type of degree-3 polynomial function that is correlated with $f$.
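The definition of a 'polynomial function' via repeated derivatives can be checked by brute force on a tiny example. The sketch below makes several assumptions of its own: points of the $n$-dimensional vector space over the two-element field are encoded as bitmask integers (so addition is XOR), the derivative in direction $h$ of a function $Q$ is taken to be $Q(x+h)\overline{Q(x)}$, and the example function is an eighth root of unity raised to the Hamming weight of $x$. That example is chosen here for illustration and is not necessarily the function the authors have in mind. To avoid floating point, the code tracks only the exponent modulo 8, so a multiplicative derivative becomes a difference of exponents and "constant-1 function" becomes "exponent identically 0".

```python
from itertools import product


def weight_phase(x):
    """Exponent (mod 8) of the example function: the Hamming weight of the bitmask x."""
    return bin(x).count("1") % 8


def derivative(phase, h):
    """Exponent of the derivative in direction h: phase(x + h) - phase(x) mod 8."""
    return lambda x: (phase(x ^ h) - phase(x)) % 8


def iterated_derivative(phase, directions):
    """Apply derivatives in each of the given directions, one after another."""
    for h in directions:
        phase = derivative(phase, h)
    return phase


def is_polynomial_of_degree(phase, d, n):
    """True if every (d+1)-fold derivative has exponent 0 at every point of the n-bit cube."""
    points = range(2 ** n)
    for directions in product(points, repeat=d + 1):
        deriv = iterated_derivative(phase, directions)
        if any(deriv(x) != 0 for x in points):
            return False
    return True
```

With three variables, `is_polynomial_of_degree(weight_phase, 3, 3)` holds while the same check with degree 2 fails, so this example behaves like a degree-3 polynomial function even though its values are eighth roots of unity rather than signs.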

[Apologies for not defining all the technical terms and the context; the reader can find some background in this post and following the links there.]

What is, then, ergodic theory, and what does it have to do with finitary combinatorial problems? I am certainly the wrong person to ask, but I shall try to explain the little that I have understood in the next post(s).

Green, Tao and Ziegler, in their works on patterns in the primes, prove a general result of the following form: if $X$ is a set, $R$ is a, possibly very sparse, “pseudorandom” subset of $X$, and $D$ is a dense subset of $R$, then $D$ may be “modeled” by a large set $M$ which has the same density in $X$ as the density of $D$ in $R$.

They use this result with $X$ being the integers in a large interval $\{1,\ldots,N\}$, $R$ being the “almost-primes” in $X$ (integers with no small factor), and $D$ being the primes in $X$. Since the almost-primes can be proved to be “pseudorandom” in a fairly strong sense, and since the density of the primes in the almost-primes is at least an absolute constant, it follows that the primes are “indistinguishable” from a large set containing a constant fraction of all integers. Since such large sets are known to contain arbitrarily long arithmetic progressions, as proved by Szemerédi, Green and Tao are able to prove that the primes too must contain arbitrarily long arithmetic progressions. Such large sets are also known to contain arbitrarily long “polynomial progressions,” as proved by Bergelson and Leibman, and this allows Tao and Ziegler to argue that the primes too must contain arbitrarily long polynomial progressions.

(The above account is not completely accurate, but it is not lying too much.)

As announced last October here, and here, Omer Reingold, Madhur Tulsiani, Salil Vadhan and I found a new proof of this “dense model” theorem, which uses the min-max theorem of game theory (or, depending on the language that you prefer to use, the duality of linear programming or the Hahn-Banach theorem) and was inspired by Nisan’s proof of the Impagliazzo hard-core set theorem. In complexity-theoretic applications of the theorem, our reduction has polynomial complexity, while the previous work incurred an exponential loss.

As discussed here and here, we also show how to use the Green-Tao-Ziegler techniques of “iterative partitioning” to give a different proof of Impagliazzo’s theorem.

After long procrastination, we recently wrote up a paper about these results.

In the Fall, we received some feedback from additive combinatorialists that while our proof of the Green-Tao-Ziegler result was technically simpler than the original one, the language we used was hard to follow. (That’s easy to believe, because it took us a while to understand the language in which the original proof was written.) We then wrote an expository note of the proof in the analyst’s language. When we were about to release the paper and the note, we were contacted by Tim Gowers, who, last Summer, had independently discovered a proof of the Green-Tao-Ziegler results via the Hahn-Banach theorem, essentially with the same argument. (He also found other applications of the technique in additive combinatorics. The issue of polynomial complexity, which does not arise in his applications, is not considered.)

Gowers announced his results in April at a talk at the Fields Institute in Toronto. (Audio and slides are available online.)

Gowers’ paper already contains the proof presented in the “analyst language,” making our expository note not so useful any more; we have still posted it anyway because, by explaining how one translates from one notation to the other, it can be a short starting point for the computer scientist who is interested in trying to read Gowers’ paper, or for the combinatorialist who is interested in trying to read our paper.

Thm1: Every constant-density subset of a pseudorandom set of integers contains arbitrarily long arithmetic progressions.

Thm2: The primes have constant density inside a pseudorandom set.

Of those, the main contribution of the paper is the first theorem, a “relative” version of Szemeredi’s theorem. In turn, its proof can be (even more inaccurately) broken up as

Thm 1.1: For every constant density subset D of a pseudorandom set there is a “model” set M that has constant density among the integers and is indistinguishable from D.

Thm 1.2 (Szemeredi) Every constant density subset of the integers contains arbitrarily long arithmetic progressions, and many of them.

Thm 1.3 A set with many long arithmetic progressions cannot be indistinguishable from a set with none.

Following this scheme is, of course, easier said than done. One wants to work with a definition of pseudorandomness that is weak enough that (2) is provable, but strong enough that the notion of indistinguishability implied by (1.1) is in turn strong enough that (1.3) holds. From now on I will focus on (1.1), which is a key step in the proof, though not the hardest.

Recently, Tao and Ziegler proved that the primes contain arbitrarily long “polynomial progressions” (progressions where the increments are given by polynomials rather than linear functions, as in the case of arithmetic progressions). Their paper contains a very clean formulation of (1.1), which I will now (accurately, this time) describe. (It is Theorem 7.1 in the paper. The language I use below is very different but equivalent.)

We fix a finite universe $X$; this could be $\{0,1\}^n$ in complexity-theoretic applications or $\mathbb{Z}/N\mathbb{Z}$ in number-theoretic applications. Instead of working with subsets of $X$, it will be more convenient to refer to probability distributions over $X$; if $S \subseteq X$ is a set, then $U_S$ is the uniform distribution over $S$. We also fix a family $F$ of “easy” functions $f: X \to [0,1]$. In complexity-theoretic applications, this could be the set of boolean functions computed by circuits of bounded size. We think of two distributions $A, B$ as being $\epsilon$-indistinguishable according to $F$ if for every function $f \in F$ we have

$$\left| \mathbb{E}[f(A)] - \mathbb{E}[f(B)] \right| \leq \epsilon$$

and we think of a distribution as pseudorandom if it is indistinguishable from the uniform distribution $U_X$. (This is all standard in cryptography and complexity theory.)

Now let’s define the natural analog of “dense subset” for distributions. We say that a distribution $A$ is $\delta$-dense in $B$ if for every $x \in X$ we have

$$\Pr[A = x] \leq \frac{1}{\delta} \Pr[B = x]$$

Note that if $A = U_S$ and $B = U_T$ for some sets $S, T$, then $A$ is $\delta$-dense in $B$ if and only if $S \subseteq T$ and $|S| \geq \delta |T|$.
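The two notions just defined, indistinguishability and density of distributions, are easy to experiment with on a small universe. The sketch below is a minimal illustration under assumptions of its own: distributions are dictionaries mapping each point of the universe to an exact `Fraction` probability, tests are Python callables with values in $[0,1]$, and the function names are hypothetical.

```python
from fractions import Fraction


def uniform(S, universe):
    """Uniform distribution over the set S, as a dict over the whole universe."""
    S = set(S)
    return {x: (Fraction(1, len(S)) if x in S else Fraction(0)) for x in universe}


def expectation(f, dist):
    """Expected value of f(x) when x is drawn from dist."""
    return sum(p * f(x) for x, p in dist.items())


def indistinguishable(A, B, tests, eps):
    """True if no test separates the expectations under A and B by more than eps."""
    return all(abs(expectation(f, A) - expectation(f, B)) <= eps for f in tests)


def dense_in(A, B, delta):
    """True if A gives every point at most 1/delta times the probability B gives it."""
    return all(A[x] <= B[x] / delta for x in A)
```

For uniform distributions over sets, `dense_in` reduces to a containment and size check: the smaller set must sit inside the larger one and have at least a `delta` fraction of its size.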

So we want to prove the following:

Theorem (Green, Tao, Ziegler) Fix a family $F$ of tests and an $\epsilon > 0$; then there is a “slightly larger” family $F'$ and an $\epsilon' > 0$ such that if $R$ is an $\epsilon'$-pseudorandom distribution according to $F'$ and $D$ is $\delta$-dense in $R$, then there is a distribution $M$ that is $\delta$-dense in $U_X$ and that is $\epsilon$-indistinguishable from $D$ according to $F$.

[The reader may want to go back to (1.1) and check that this is a meaningful formalization of it, up to working with arbitrary distributions rather than sets. This is in fact the “inaccuracy” that I referred to above.]

In a complexity-theoretic setting, we would like to say that if $F$ is defined as all functions computable by circuits of size at most $s$, then $\epsilon'$ should be $\mathrm{poly}(\epsilon, \delta)$ and $F'$ should contain only functions computable by circuits of size $s \cdot \mathrm{poly}(1/\epsilon, 1/\delta)$. Unfortunately, if one follows the proof and makes some simplifications assuming $F$ contains only boolean functions, one sees that $F'$ contains functions of the form $g(x) = h(f_1(x), \ldots, f_k(x))$, where $f_i \in F$, $k = \mathrm{poly}(1/\epsilon, 1/\delta)$, and $h$ could be arbitrary and, in general, have circuit complexity exponential in $1/\epsilon$ and $1/\delta$. Alternatively one may approximate $h$ as a low-degree polynomial and take the “most distinguishing monomial.” This will give a version of the Theorem (which leads to the actual statement of Thm 7.1 in the Tao-Ziegler paper) where $F'$ contains only functions of the form $\prod_{i=1}^{k} f_i(x)$, but then $\epsilon'$ will be exponentially small in $1/\epsilon$ and $1/\delta$. This means that one cannot apply the theorem to “cryptographically strong” notions of pseudorandomness and indistinguishability, and in general to any setting where $1/\epsilon$ and $1/\delta$ are super-logarithmic (not to mention super-linear).

This seems like an unavoidable consequence of the “finitary ergodic theoretic” technique of iterative partitioning and energy increment used in the proof, which always yields at least a singly exponential complexity.

Omer Reingold, Madhur Tulsiani, Salil Vadhan and I have recently come up with a different proof where both $1/\epsilon'$ and the complexity of $F'$ are polynomial. This gives, for example, a new characterization of the notion of pseudoentropy. Our proof is quite in the spirit of Nisan’s proof of Impagliazzo’s hard-core set theorem, and it is relatively simple. We can also deduce a version of the theorem where, as in Green-Tao-Ziegler, $F'$ contains only bounded products of functions in $F$. In doing so, however, we too incur an exponential loss, but the proof is somewhat simpler and demonstrates the applicability of complexity-theoretic techniques in arithmetic combinatorics.

Since we can use (ideas from) a proof of the hard core set theorem to prove the Green-Tao-Ziegler result, one may wonder whether one can use the “finitary ergodic theory” techniques of iterative partitioning and energy increment to prove the hard-core set theorem. Indeed, we do this too. In our proof, the reduction loses a factor that is exponential in certain parameters (while other proofs are polynomial), but one also gets a more “constructive” result.

If readers can stomach it, a forthcoming post will describe the complexity-theory-style proof of the Green-Tao-Ziegler result as well as the ergodic-theory-style proof of the Impagliazzo hard core set theorem.

Back in August, Boaz Barak and Moses Charikar organized a two-day course on additive combinatorics for computer scientists in Princeton. Boaz and Avi Wigderson spoke on sum-product theorems and their applications, and I spoke on techniques in the proofs of Szemeredi’s theorem and their applications. As an Australian model might say, that’s interesting!

Videos of the talks are now online. The quality of the audio and video is quite good; you’ll have to decide for yourself on the quality of the lectures. The schedule of the event was grueling, and in my last two lectures (on Gowers uniformity and applications) I am not very lucid. In earlier lectures, however, I am merely sleep deprived: I can be seen falling asleep in front of the board a few times. Boaz’s and Avi’s lectures, however, are flawless.