254B, Notes 3: Linear patterns

In the previous lecture notes, we used (linear) Fourier analysis to control the number of three-term arithmetic progressions $a, a+r, a+2r$ in a given set $A$. The power of the Fourier transform for this problem ultimately stemmed from the identity

$$\mathbf{E}_{n,r \in \mathbf{Z}/N\mathbf{Z}} 1_A(n) 1_A(n+r) 1_A(n+2r) = \sum_{\xi \in \mathbf{Z}/N\mathbf{Z}} \hat 1_A(\xi)^2 \hat 1_A(-2\xi) \ \ \ \ \ (1)$$

for any cyclic group $\mathbf{Z}/N\mathbf{Z}$ and any subset $A$ of that group, where $\hat 1_A(\xi) := \mathbf{E}_{n \in \mathbf{Z}/N\mathbf{Z}} 1_A(n) e(-n\xi/N)$ (analogues of this identity also exist for other finite abelian groups, and to a lesser extent for non-abelian groups also, although that is not the focus of the current discussion). As it turns out, linear Fourier analysis is not able to discern higher order patterns, such as arithmetic progressions of length four; we give some demonstrations of this below the fold, taking advantage of the polynomial recurrence theory from Notes 1.

The main objective of this course is to introduce the (still nascent) theory of higher order Fourier analysis, which is capable of studying higher order patterns. The full theory is still rather complicated (at least, at our present level of understanding). However, one aspect of the theory is relatively simple, namely that we can largely reduce the study of arbitrary additive patterns to the study of a single type of additive pattern, namely the parallelopipeds

$$( x + \omega_1 h_1 + \ldots + \omega_d h_d )_{\omega_1,\ldots,\omega_d \in \{0,1\}}. \ \ \ \ \ (2)$$

Thus for instance, for $d=1$ one has the line segments

$$x, \ x+h_1;$$

for $d=2$ one has the parallelograms

$$x, \ x+h_1, \ x+h_2, \ x+h_1+h_2;$$

for $d=3$ one has the parallelopipeds

$$x, \ x+h_1, \ x+h_2, \ x+h_3, \ x+h_1+h_2, \ x+h_1+h_3, \ x+h_2+h_3, \ x+h_1+h_2+h_3.$$

These patterns are particularly pleasant to handle, thanks to the large number of symmetries available on the discrete cube $\{0,1\}^d$. For instance, whereas establishing the presence of arbitrarily long arithmetic progressions in dense sets is quite difficult (Szemerédi’s theorem), establishing arbitrarily high-dimensional parallelopipeds is much easier:

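Exercise 1 Let $A \subset [N]$ be such that $|A| > \delta N$ for some $0 < \delta \leq 1$. If $N$ is sufficiently large depending on $\delta$, show that there exists an integer $1 \leq h \ll 1/\delta$ such that $|A \cap (A+h)| \gg \delta^2 N$. (Hint: obtain upper and lower bounds on the set $\{ (x,y) \in A \times A: x < y \leq x + 10/\delta \}$.)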

Exercise 2 (Hilbert cube lemma) Let $A \subset [N]$ be such that $|A| > \delta N$ for some $0 < \delta \leq 1$, and let $d \geq 1$ be an integer. Show that if $N$ is sufficiently large depending on $\delta, d$, then $A$ contains a parallelopiped of the form (2), with $h_1,\ldots,h_d = O_{\delta,d}(1)$ positive integers. (Hint: use the previous exercise and induction.) Conclude that if $A \subset \mathbf{Z}$ has positive upper density, then it contains infinitely many such parallelopipeds for each $d$.
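The induction suggested in the hint can be turned into a small greedy search. The following is only a sketch of that idea, not a proof of the quantitative bounds: the names `find_cube` and `cube_points` are hypothetical, and the rule "pick the shift that keeps the most points" is an assumption standing in for the pigeonhole step of Exercise 1.

```python
from itertools import product

def find_cube(A, d):
    """Greedily look for x and shifts h_1..h_d (each >= 1) such that
    every point x + sum(w_i * h_i), w_i in {0,1}, lies in A."""
    current = set(A)
    shifts = []
    for _ in range(d):
        if len(current) < 2:
            return None  # too few points left to choose another shift
        span = max(current) - min(current)
        # choose the shift h keeping the most x with both x and x+h present
        best_h = max(range(1, span + 1),
                     key=lambda h: sum((x + h) in current for x in current))
        # pass to the set of x for which x and x + best_h both survive
        current = {x for x in current if (x + best_h) in current}
        shifts.append(best_h)
    if not current:
        return None
    return min(current), shifts

def cube_points(x, shifts):
    """All 2^d vertices of the parallelopiped with base point x."""
    return [x + sum(w * h for w, h in zip(omega, shifts))
            for omega in product((0, 1), repeat=len(shifts))]
```

Each pass replaces the current set $A_j$ by $A_{j+1} = A_j \cap (A_j - h_{j+1})$, so any surviving $x$ is the base point of a full $d$-dimensional parallelopiped inside $A$.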

Exercise 3 Show that if is an integer, and is sufficiently large depending on , then for any parallelopiped (2) in the integers , there exists , not all zero, such that . (Hint: pigeonhole the in the residue classes modulo .) Use this to conclude that if is the set of all integers such that for all integers , then is a set of positive upper density (and also positive lower density) which does not contain any infinite parallelopipeds (thus one cannot take in the Hilbert cube lemma).

The standard way to control the parallelopiped patterns (and thus, all other (finite complexity) linear patterns) is via the Gowers uniformity norms

$$\| f\|_{U^d(G)} := \left( \mathbf{E}_{x,h_1,\ldots,h_d \in G} \prod_{\omega_1,\ldots,\omega_d \in \{0,1\}} \mathcal{C}^{\omega_1+\ldots+\omega_d} f(x+\omega_1 h_1+\ldots+\omega_d h_d) \right)^{1/2^d}$$

with $f: G \to \mathbf{C}$ a function on a finite abelian group $G$, and $\mathcal{C}: z \mapsto \overline{z}$ the complex conjugation operator; analogues of this norm also exist for group-like objects such as the progression $[N]$, and also for measure-preserving systems (where they are known as the Gowers-Host-Kra uniformity seminorms, see this paper of Host-Kra for more discussion). In this set of notes we will focus on the basic properties of these norms; the deepest fact about them, known as the inverse conjecture for these norms, will be discussed in later notes.

— 1. Linear Fourier analysis does not control length four progressions —

Let $A \subset \mathbf{Z}/N\mathbf{Z}$ be a subset of a cyclic group with density $|A| = \delta N$; we think of $0 < \delta \leq 1$ as being fixed, and $N$ as being very large or going off to infinity.

For each $k \geq 1$, consider the number

$$|\{ (n,r) \in \mathbf{Z}/N\mathbf{Z} \times \mathbf{Z}/N\mathbf{Z}: n, n+r, \ldots, n+(k-1)r \in A \}| \ \ \ \ \ (7)$$

of $k$-term arithmetic progressions in $A$ (including degenerate progressions). Heuristically, this expression should typically be close to $\delta^k N^2$, since there are $N^2$ pairs $(n,r)$ and we would expect each pair to have a $\delta^k$ “probability” that $n, n+r, \ldots, n+(k-1)r$ simultaneously lie in $A$. Indeed, using standard probabilistic tools such as Chernoff’s inequality, it is not difficult to justify this heuristic with probability asymptotically close to $1$ in the case that $A$ is a randomly chosen set of the given density.

Let’s see how this heuristic holds up for small values of $k$. For $k=1,2$, this prediction is exactly accurate (with no error term) for any set $A$ with cardinality $\delta N$; no randomness hypothesis of any sort is required. For $k=3$, we see from (1) and the observation that $\hat 1_A(0) = \delta$ that (7) is given by the formula

$$N^2 \left( \delta^3 + \sum_{\xi \in \mathbf{Z}/N\mathbf{Z}: \xi \neq 0} \hat 1_A(\xi)^2 \hat 1_A(-2\xi) \right).$$

Let us informally say that $A$ is Fourier-pseudorandom if one has

$$\sup_{\xi \in \mathbf{Z}/N\mathbf{Z}: \xi \neq 0} |\hat 1_A(\xi)| = o(1)$$

where $o(1)$ is a quantity that goes to zero as $N \to \infty$. Then from applying Plancherel’s formula and Cauchy-Schwarz as in the previous lecture notes, we see that the number of three-term arithmetic progressions is

$$N^2 ( \delta^3 + o(1) ).$$

Thus we see that the Fourier-pseudorandomness hypothesis allows us to count three-term arithmetic progressions almost exactly.
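As a sanity check on the Fourier formula for the $k=3$ count, one can compare a brute-force count with the Fourier-side sum on a small cyclic group. This is a minimal numerical sketch; the function names are hypothetical, and the normalisation $\hat f(\xi) = \mathbf{E}_x f(x) e(-x\xi/N)$ is assumed.

```python
import cmath

def fourier(f, N):
    """hat f(xi) = E_x f(x) e(-x*xi/N), with e(t) = exp(2*pi*i*t)."""
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / N)
                for x in range(N)) / N
            for xi in range(N)]

def count_3aps_direct(A, N):
    """Number of pairs (n, r) with n, n+r, n+2r all in A (mod N)."""
    S = set(a % N for a in A)
    return sum(1 for n in range(N) for r in range(N)
               if n in S and (n + r) % N in S and (n + 2 * r) % N in S)

def count_3aps_fourier(A, N):
    """Same count via N^2 * sum_xi hat1_A(xi)^2 * hat1_A(-2*xi)."""
    S = set(a % N for a in A)
    indicator = [1.0 if x in S else 0.0 for x in range(N)]
    hat = fourier(indicator, N)
    total = sum(hat[xi] ** 2 * hat[(-2 * xi) % N] for xi in range(N))
    return (N ** 2) * total
```

The $\xi = 0$ term of the sum contributes exactly $\delta^3 N^2$; the remaining terms are what the Fourier-pseudorandomness hypothesis makes negligible.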

On the other hand, without the Fourier-pseudorandomness hypothesis, the count (7) can be significantly different from $\delta^3 N^2$. For instance, if $A$ is an interval $A = [\delta N]$, then it is not hard to see that (7) is comparable to $\delta^2 N^2$ rather than $\delta^3 N^2$; the point is that with a set as structured as an interval, once $n$ and $n+r$ lie in $A$, there is already a very strong chance that $n+2r$ lies in $A$ also. In the other direction, a construction of Behrend (mentioned in the previous notes) shows that the quantity (7) can in fact dip below $\delta^C N^2$ for any fixed $C$ (and in fact it can be as small as $\delta^{c \log \frac{1}{\delta}} N^2$ for some absolute constant $c > 0$).

Now we consider the $k=4$ case of (7), which counts four-term progressions. Here, it turns out that Fourier-pseudorandomness is insufficient; it is possible for the quantity (7) to be significantly larger or smaller than $\delta^4 N^2$ even if $A$ is Fourier-pseudorandom, as was observed by Gowers (with a closely related observation in the context of ergodic theory by Furstenberg).

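Exercise 4 Let $\alpha$ be an irrational real number, let $0 < \delta < 1$, and let $A := \{ n \in [N]: 0 \leq \{ \alpha n^2 \} \leq \delta \}$. Show that $A$ is Fourier-pseudorandom (keeping $\alpha$ and $\delta$ fixed and letting $N \to \infty$). Hint: One can use Exercise 21 from Notes 1 to show that sums of the form $\mathbf{E}_{n \in [N]} e( k \alpha n^2 ) e( \xi n )$ cannot be large.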

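Exercise 5 Continuing the previous exercise, show that the expression (7) for $k=4$ is equal to $(c \delta^3 + o(1)) N^2$ as $N \to \infty$, for some absolute constant $c > 0$, if $\delta > 0$ is sufficiently small. (Hint: first show, using the machinery in Notes 1, that the two-dimensional sequence $(n,r) \mapsto (\alpha n^2, \alpha (n+r)^2, \alpha(n+2r)^2, \alpha(n+3r)^2) \hbox{ mod } 1$ is asymptotically equidistributed in the torus $\{ (x_1,x_2,x_3,x_4) \in \mathbf{T}^4: x_1 - 3x_2 + 3x_3 - x_4 = 0 \}$.)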

The above exercises show that a Fourier-pseudorandom set can have a four-term progression count (7) significantly larger than $\delta^4 N^2$. One can also make the count significantly smaller than $\delta^4 N^2$ (another observation of Gowers), but this requires more work.

Exercise 6 Let . Show that there exists a function with for all , such that the expression

is strictly less than , where is the subspace of quadruplets such that is in arithmetic progression (i.e. for some ) and the obey the constraint

Hint: Take of the form

where is a small number, and are carefully chosen to make the term in (8) negative.

Exercise 7 Show that there exists an absolute constant such that for all sufficiently small and sufficiently large (depending on ) and a set with , such that (7) with is less than . (Hint: take for some , and let be a random subset of with each element of lying in with an independent probability of

where is the function in the previous exercise (with ), and are real numbers which are linearly independent over modulo .)

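Now we consider the question of counting more general linear (or affine) patterns than arithmetic progressions. A reasonably general setting is to count patterns of the form

$$\Psi(\vec x) := (\psi_1(\vec x), \ldots, \psi_t(\vec x))$$

in a subset $A$ of a finite abelian group $G$ (e.g. a cyclic group $G = \mathbf{Z}/N\mathbf{Z}$), where $\vec x = (x_1,\ldots,x_d) \in G^d$, and the $\psi_1,\ldots,\psi_t: G^d \to G$ are affine-linear forms

$$\psi_i(x_1,\ldots,x_d) = c_i + \sum_{j=1}^d c_{i,j} x_j$$

for some fixed integers $c_{i,j} \in \mathbf{Z}$ and group elements $c_i \in G$. To avoid degeneracies, we will assume that all the $\psi_i$ are surjective (or equivalently, that the $c_{i,1},\ldots,c_{i,d}$ do not have a common factor that divides the order of $G$). This count would then be given by

$$|G|^d \Lambda_\Psi(1_A,\ldots,1_A)$$

where $\Lambda_\Psi$ is the $t$-linear form

$$\Lambda_\Psi(f_1,\ldots,f_t) := \mathbf{E}_{\vec x \in G^d} f_1(\psi_1(\vec x)) \ldots f_t(\psi_t(\vec x)).$$

For instance, the task of counting arithmetic progressions $n, n+r, \ldots, n+(k-1)r$ corresponds to the case $d=2$, $t=k$, and $\psi_i(x_1,x_2) := x_1 + (i-1) x_2$.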
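To make the definition concrete, here is a brute-force evaluation of the multilinear form for $G = \mathbf{Z}/N\mathbf{Z}$. It is a sketch for small $N$ only; the name `Lambda` and the encoding of a form as a pair (constant, list of integer coefficients) are assumptions made for illustration.

```python
from itertools import product

def Lambda(forms, funcs, N):
    """Average of prod_i f_i(psi_i(x)) over x in (Z/NZ)^d.

    forms[i] = (c_i, [c_i1, ..., c_id]) encodes
    psi_i(x) = c_i + sum_j c_ij * x_j (mod N); funcs[i] is f_i.
    """
    d = len(forms[0][1])
    total = 0
    for x in product(range(N), repeat=d):
        term = 1
        for (c, coeffs), f in zip(forms, funcs):
            value = (c + sum(a * xj for a, xj in zip(coeffs, x))) % N
            term *= f(value)
        total += term
    return total / N ** d
```

For example, with `forms = [(0, [1, 0]), (0, [1, 1]), (0, [1, 2])]` and each `f_i` the indicator of a set $A$, the quantity `N**2 * Lambda(...)` is the number of three-term progressions in $A$.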

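We have the trivial bound

$$|\Lambda_\Psi(f_1,\ldots,f_t)| \leq \|f_1\|_{L^\infty(G)} \ldots \|f_t\|_{L^\infty(G)} \ \ \ \ \ (9)$$

where

$$\|f\|_{L^\infty(G)} := \sup_{x \in G} |f(x)|.$$

Remark 1 One can replace the $L^\infty$ norm on $f_i$ in (9) with an $L^{p_i}$ norm for various values of $p_1,\ldots,p_t$. The set of all admissible $p_1,\ldots,p_t$ is described by the Brascamp-Lieb inequality, see this paper for further discussion. We will not need these variants of (9).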

Improving this trivial bound turns out to be a key step in the theory of counting general linear patterns. In particular, it turns out that for any $\epsilon > 0$, one usually has

$$|\Lambda_\Psi(f_1,\ldots,f_t)| < \epsilon \|f_1\|_{L^\infty(G)} \ldots \|f_t\|_{L^\infty(G)}$$

except when $f_1,\ldots,f_t$ take a very special form (or at least correlate with functions of a very special form, such as linear or higher order characters).

To reiterate: the key to the subject is to understand the inverse problem of characterising those functions $f_1,\ldots,f_t$ for which one has

$$|\Lambda_\Psi(f_1,\ldots,f_t)| \geq \eta \|f_1\|_{L^\infty(G)} \ldots \|f_t\|_{L^\infty(G)}.$$

This problem is of most interest (and the most difficult) in the “$1\%$ world” when $\eta$ is small (e.g. $\eta = 0.01$), but it is also instructive to consider the simpler cases of the “$99\%$ world” when $\eta$ is very close to one (e.g. $\eta = 0.99$), or the “$100\%$ world” when $\eta$ is exactly equal to one. In these model cases one can use additional techniques (error-correction and similar techniques, often of a theoretical computer science flavour, in the $99\%$ world, or exact algebraic manipulation in the $100\%$ world) to understand this expression.

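Let us thus begin with analysing the $100\%$ situation. Specifically, we assume that we are given functions $f_1,\ldots,f_t: G \to \mathbf{C}$ with

$$|\Lambda_\Psi(f_1,\ldots,f_t)| = \|f_1\|_{L^\infty(G)} \ldots \|f_t\|_{L^\infty(G)}$$

and wish to classify the functions $f_1,\ldots,f_t$ as best we can. We will normalise all the norms on the right-hand side to be one, thus $|f_i(x)| \leq 1$ for all $x \in G$ and $1 \leq i \leq t$, and

$$|\Lambda_\Psi(f_1,\ldots,f_t)| = 1. \ \ \ \ \ (10)$$

By the triangle inequality, we conclude that

$$\Lambda_\Psi(|f_1|,\ldots,|f_t|) \geq 1.$$

On the other hand, we have the crude bound

$$\Lambda_\Psi(|f_1|,\ldots,|f_t|) \leq 1.$$

Thus equality occurs, which (by the surjectivity hypothesis on all the $\psi_i$) shows that $|f_i(x)|=1$ for all $x \in G$ and $1 \leq i \leq t$. Thus we may write $f_i(x) = e(\phi_i(x))$ for some phase functions $\phi_i: G \to \mathbf{R}/\mathbf{Z}$, where $e(\theta) := e^{2\pi i \theta}$. We then have

$$\Lambda_\Psi(f_1,\ldots,f_t) = \mathbf{E}_{\vec x \in G^d} e\left( \sum_{i=1}^t \phi_i(\psi_i(\vec x)) \right),$$

and since every term in this average has magnitude one, (10) forces

$$\sum_{i=1}^t \phi_i(\psi_i(\vec x)) = c \ \ \ \ \ (11)$$

for all $\vec x \in G^d$ and some constant $c \in \mathbf{R}/\mathbf{Z}$.

So the problem now reduces to the algebraic problem of solving functional equations such as (11). To illustrate this type of problem, let us consider a simple case when $d=2$, $t=3$ and

$$\psi_1(x,y) = x; \quad \psi_2(x,y) = y; \quad \psi_3(x,y) = x+y,$$

in which case we are trying to understand solutions to the functional equation

$$\phi_1(x) + \phi_2(y) + \phi_3(x+y) = c. \ \ \ \ \ (12)$$

This equation involves three unknown functions $\phi_1,\phi_2,\phi_3$. But we can eliminate two of the functions by taking discrete derivatives. To motivate this idea, let us temporarily assume that $G$ is the real line $\mathbf{R}$ rather than a finite group, and that the functions $\phi_1,\phi_2,\phi_3$ are smooth. If we then apply the partial derivative operator $\partial_x$ to the above functional equation, one eliminates $\phi_2$ and obtains

$$\phi'_1(x) + \phi'_3(x+y) = 0;$$

applying $\partial_y$ then eliminates $\phi_1$ and leaves us with

$$\phi''_3(x+y) = 0,$$

thus $\phi''_3$ vanishes identically; we can integrate this twice to conclude that $\phi_3$ is a linear function of its input,

$$\phi_3(x) = a_3 x + b_3$$

for some constants $a_3,b_3 \in \mathbf{R}$. A similar argument (using the partial derivative operator $\partial_x - \partial_y$ to eliminate $\phi_3$, or by applying a change of variables such as $(x,z) := (x,x+y)$) shows that $\phi_1(x) = a_1 x + b_1$ and $\phi_2(x) = a_2 x + b_2$ for some additional constants $a_1,b_1,a_2,b_2$. Finally, by returning to (12) and comparing coefficients we obtain the additional compatibility conditions $a_1 = a_2 = -a_3$ and $b_1+b_2+b_3 = c$, which one then easily verifies to completely describe all possible solutions to this equation in the case of smooth functions on $\mathbf{R}$.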

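Returning now to the discrete world, we mimic the continuous operation of a partial derivative by introducing difference operators

$$\partial_h \phi(x) := \phi(x+h) - \phi(x)$$

for $h \in G$. If we difference (12) in the $x$ variable by an arbitrary shift $h \in G$ by replacing $x$ by $x+h$ and then subtracting, we eliminate $\phi_2$ and obtain

$$(\partial_h \phi_1)(x) + (\partial_h \phi_3)(x+y) = 0;$$

if we then difference in the $y$ variable by a second arbitrary shift $k \in G$, one obtains

$$(\partial_k \partial_h \phi_3)(x+y) = 0$$

for all $x,y,h,k \in G$; in particular, $\partial_k \partial_h \phi_3 \equiv 0$ for all $k,h \in G$. Such functions are affine-linear: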

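Exercise 8 Let $\phi: G \to \mathbf{R}/\mathbf{Z}$ be a function. Show that $\partial_h \partial_k \phi = 0$ for all $h,k \in G$ if and only if one has $\phi(x) = \xi(x) + c$ for some $c \in \mathbf{R}/\mathbf{Z}$ and some homomorphism $\xi: G \to \mathbf{R}/\mathbf{Z}$. Conclude that the solutions to (12) are given by the form $\phi_i(x) = \xi_i(x) + c_i$, where $c_1,c_2,c_3 \in \mathbf{R}/\mathbf{Z}$ and $\xi_1,\xi_2,\xi_3: G \to \mathbf{R}/\mathbf{Z}$ are homomorphisms with $c_1+c_2+c_3 = c$ and $\xi_1 = \xi_2 = -\xi_3$.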
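The difference operators are easy to experiment with on a cyclic group. In the sketch below (an illustration only, with hypothetical function names) a phase $\phi: \mathbf{Z}/N\mathbf{Z} \to \mathbf{R}/\mathbf{Z}$ taking values in $\frac{1}{N}\mathbf{Z}/\mathbf{Z}$ is stored through its integer numerator mod $N$, which is an assumption made to keep the arithmetic exact.

```python
def diff(phi, h, N):
    """Additive derivative (d_h phi)(x) = phi(x+h) - phi(x), values mod N."""
    return lambda x: (phi((x + h) % N) - phi(x % N)) % N

def iterated_diff_vanishes(phi, order, N):
    """True if d_{h_1} ... d_{h_order} phi(x) = 0 for all shifts and all x."""
    def rec(psi, remaining):
        if remaining == 0:
            return all(psi(x) == 0 for x in range(N))
        # try every shift h at this level and recurse on the derivative
        return all(rec(diff(psi, h, N), remaining - 1) for h in range(N))
    return rec(phi, order)
```

With $N=7$, the affine phase $x \mapsto 3x+5$ has vanishing second differences, while the quadratic phase $x \mapsto 2x^2+x+1$ only vanishes after three differences.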

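Having solved the functional equation (12), let us now look at an equation related to four term arithmetic progressions, namely

$$\phi_1(x) + \phi_2(x+y) + \phi_3(x+2y) + \phi_4(x+3y) = c \ \ \ \ \ (13)$$

for all $x,y \in G$, some constant $c \in \mathbf{R}/\mathbf{Z}$, and some functions $\phi_1,\phi_2,\phi_3,\phi_4: G \to \mathbf{R}/\mathbf{Z}$. We will try to isolate $\phi_4$ by using discrete derivatives as before to eliminate the other functions. Firstly, we differentiate in the $y$ direction by an arbitrary shift $h \in G$, leading to

$$(\partial_h \phi_2)(x+y) + (\partial_{2h} \phi_3)(x+2y) + (\partial_{3h} \phi_4)(x+3y) = 0.$$

In preparation for then eliminating $\phi_2$, we shift $x$ backwards by $y$, obtaining

$$(\partial_h \phi_2)(x) + (\partial_{2h} \phi_3)(x+y) + (\partial_{3h} \phi_4)(x+2y) = 0.$$

Differentiating in the $y$ direction by another arbitrary shift $k \in G$, we obtain

$$(\partial_k \partial_{2h} \phi_3)(x+y) + (\partial_{2k} \partial_{3h} \phi_4)(x+2y) = 0.$$

We shift $x$ backwards by $y$ again:

$$(\partial_k \partial_{2h} \phi_3)(x) + (\partial_{2k} \partial_{3h} \phi_4)(x+y) = 0.$$

One final differentiation in $y$ by an arbitrary shift $l \in G$ gives

$$(\partial_l \partial_{2k} \partial_{3h} \phi_4)(x+y) = 0.$$

For simplicity, we now make the assumption that the order $|G|$ of $G$ is not divisible by either $2$ or $3$, so that the homomorphisms $k \mapsto 2k$ and $h \mapsto 3h$ are automorphisms of $G$. We conclude that

$$\partial_l \partial_k \partial_h \phi_4 \equiv 0 \ \ \ \ \ (14)$$

for all $l,k,h \in G$. Such functions will be called quadratic functions from $G$ to $\mathbf{R}/\mathbf{Z}$, thus $\phi_4$ is quadratic. A similar argument shows that $\phi_1,\phi_2,\phi_3$ are quadratic.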

Just as (affine-)linear functions can be completely described in terms of homomorphisms, quadratic functions can be described in terms of bilinear forms, as long as one avoids the characteristic $2$ case:

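Exercise 9 Let $G$ be a finite abelian group with $|G|$ not divisible by $2$. Show that a map $\phi: G \to \mathbf{R}/\mathbf{Z}$ is quadratic if and only if one has a representation of the form

$$\phi(x) = B(x,x) + L(x) + c$$

where $c \in \mathbf{R}/\mathbf{Z}$, $L: G \to \mathbf{R}/\mathbf{Z}$ is a homomorphism, and $B: G \times G \to \mathbf{R}/\mathbf{Z}$ is a symmetric bihomomorphism (i.e. $B(x,y)=B(y,x)$, and $B$ is a homomorphism in each of $x,y$ individually (holding the other variable fixed)). (Hint: Heuristically, one should set $B(h,k) := \frac{1}{2} \partial_h \partial_k \phi(x)$, but there is a difficulty because the operation of dividing by $2$ is not well-defined on $\mathbf{R}/\mathbf{Z}$. It is, however, well-defined on $|G|^{th}$ roots of unity, thanks to $|G|$ not being divisible by two. Once $B$ has been constructed, subtract it off and use Exercise 8.) What goes wrong when $|G|$ is divisible by $2$?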

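Exercise 10 Show that when $|G|$ is not divisible by $2$ or $3$, the complete solution to (13) is given by

$$\phi_i(x) = B_i(x,x) + L_i(x) + c_i$$

for $i=1,2,3,4$, $c_i \in \mathbf{R}/\mathbf{Z}$ with $c_1+c_2+c_3+c_4 = c$, homomorphisms $L_i: G \to \mathbf{R}/\mathbf{Z}$, and symmetric bihomomorphisms $B_i: G \times G \to \mathbf{R}/\mathbf{Z}$ with $B_2 = -3B_1$, $B_3 = 3B_1$, $B_4 = -B_1$ and $L_1+L_2+L_3+L_4 = L_2 + 2L_3 + 3L_4 = 0$.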
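One direction of this exercise (that such phases do solve the four-term equation) can be checked mechanically on $G = \mathbf{Z}/N\mathbf{Z}$. The sketch below is illustrative only: `four_ap_phases` is a hypothetical helper, phases are stored as integer numerators mod $N$, bihomomorphisms are taken of the form $B(x,y) = bxy$, and the two linear constraints are solved by expressing $L_1, L_2$ in terms of free parameters for $L_3, L_4$.

```python
def four_ap_phases(N, b, l3, l4, consts):
    """Return phi_1..phi_4 on Z/NZ (as numerators mod N) with
    B_1 = b, B_2 = -3b, B_3 = 3b, B_4 = -b and
    L_1 + L_2 + L_3 + L_4 = L_2 + 2 L_3 + 3 L_4 = 0."""
    B = [b, -3 * b, 3 * b, -b]
    L = [l3 + 2 * l4, -2 * l3 - 3 * l4, l3, l4]
    return [lambda x, Bi=Bi, Li=Li, ci=ci: (Bi * x * x + Li * x + ci) % N
            for Bi, Li, ci in zip(B, L, consts)]

def ap_sum(phis, x, y, N):
    """phi_1(x) + phi_2(x+y) + phi_3(x+2y) + phi_4(x+3y) mod N."""
    return sum(phi((x + i * y) % N) for i, phi in enumerate(phis)) % N
```

The quadratic parts cancel because $x^2 - 3(x+y)^2 + 3(x+2y)^2 - (x+3y)^2 = 0$ identically, so `ap_sum` returns the same value (the sum of the constants) for every $x, y$.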

Exercise 11 Obtain a complete solution to the functional equation (13) in the case when $|G|$ is allowed to be divisible by $2$ or $3$. (This is an open-ended and surprisingly tricky exercise; it of course depends on what one is willing to call a “solution” to the problem. Use your own judgement.)

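Exercise 12 Call a map $\phi: G \to \mathbf{R}/\mathbf{Z}$ a polynomial of degree $\leq d$ if one has $\partial_{h_1} \ldots \partial_{h_{d+1}} \phi(x) = 0$ for all $x,h_1,\ldots,h_{d+1} \in G$. Show that if $k \geq 2$ and $\phi_1,\ldots,\phi_k: G \to \mathbf{R}/\mathbf{Z}$ obey the functional equation

$$\phi_1(x) + \phi_2(x+y) + \ldots + \phi_k(x+(k-1)y) = c$$

and $|G|$ is not divisible by any integer between $2$ and $k-1$, then $\phi_1,\ldots,\phi_k$ are polynomials of degree $\leq k-2$.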

We are now ready to turn to the general case of solving equations of the form (11). We relied on two main tricks to solve these equations: differentiation, and change of variables. When solving an equation such as (13), we alternated these two tricks in turn. To handle the general case, it is more convenient to rearrange the argument by doing all the change of variables in advance. For instance, another way to solve (13) is to first make the (non-injective) change of variables

$$(x,y) := (b+2c+3d, -a-b-c-d),$$

so that (13) becomes

$$\phi_1(b+2c+3d) + \phi_2(-a+c+2d) + \phi_3(-2a-b+d) + \phi_4(-3a-2b-c) = \mathrm{const}$$

for all $a,b,c,d \in G$, where $\mathrm{const}$ is the constant on the right-hand side of (13). The point of performing this change of variables is that while the $\phi_4$ term (for instance) involves all the three variables $a,b,c$, the remaining terms only depend on two of the $a,b,c$ at a time. If we now pick $h,k,l \in G$ arbitrarily, and then differentiate in the $a,b,c$ variables by the shifts $h,k,l$ respectively, then we eliminate the $\phi_1,\phi_2,\phi_3$ terms and arrive at

$$(\partial_{-l} \partial_{-2k} \partial_{-3h} \phi_4)(-3a-2b-c) = 0,$$

which soon places us back at (14) (assuming as before that $|G|$ is not divisible by $2$ or $3$).

Definition 1 (Cauchy-Schwarz complexity) A system $\Psi = (\psi_1,\ldots,\psi_t)$ of affine-linear forms $\psi_i: G^d \to G$ (with linear coefficients in $\mathbf{Z}$) has Cauchy-Schwarz complexity at most $s$ if, for every $1 \leq i \leq t$, one can partition $[t] \backslash \{i\}$ into $s+1$ classes (some of which may be empty), such that $\psi_i$ does not lie in the affine-linear span (over $\mathbf{Q}$) of the forms in any of these classes. The Cauchy-Schwarz complexity of a system is defined to be the least such $s$ with this property, or $\infty$ if no such $s$ exists.

The adjective “Cauchy-Schwarz” (introduced by Gowers and Wolf) may be puzzling at present, but will be motivated later.

This is a somewhat strange definition to come to grips with at first, so we illustrate it with some examples. The system of forms $x, x+y, x+2y$ is of complexity $1$; given any form here, such as $x+y$, one can partition the remaining forms into two classes, namely $\{x\}$ and $\{x+2y\}$, such that $x+y$ is not in the affine-linear span of either. On the other hand, as $x+y$ is in the affine-linear span of $\{x, x+2y\}$, the Cauchy-Schwarz complexity is not zero.

Exercise 13 Show that for any $k \geq 2$, the system of forms $x, x+y, \ldots, x+(k-1)y$ has complexity $k-2$.
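For small systems the definition can be checked by exhaustive search over partitions. The following sketch assumes each form is homogeneous and is encoded by its integer coefficient vector (so that "affine-linear span" reduces to the linear span over $\mathbf{Q}$ for non-constant forms); the names `rank`, `in_span` and `cs_complexity` are hypothetical, and the search is exponential in the number of forms.

```python
from fractions import Fraction
from itertools import product

def rank(rows):
    """Rank over Q of a list of coefficient vectors (Gaussian elimination)."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def in_span(target, vectors):
    """True if target lies in the Q-linear span of vectors."""
    return rank(vectors + [target]) == rank(vectors)

def cs_complexity(forms):
    """Least s such that, for every i, the other forms split into s+1
    classes none of whose spans contains forms[i]; None if no s works."""
    t = len(forms)
    worst = 0
    for i in range(t):
        others = [forms[j] for j in range(t) if j != i]
        needed = None
        for k in range(1, t + 1):  # try k classes, smallest k first
            for labels in product(range(k), repeat=len(others)):
                classes = [[v for v, lab in zip(others, labels) if lab == c]
                           for c in range(k)]
                if all(not in_span(forms[i], cls) for cls in classes):
                    needed = k
                    break
            if needed is not None:
                break
        if needed is None:
            return None  # infinite complexity
        worst = max(worst, needed - 1)
    return worst
```

Encoding the progression $x, x+y, \ldots, x+(k-1)y$ as `[(1, j) for j in range(k)]` gives a quick way to test the value $k-2$ for small $k$.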

Exercise 14 Show that a system of non-constant forms has finite Cauchy-Schwarz complexity if and only if no form is an affine-linear combination of another.

There is an equivalent way to formulate the notion of Cauchy-Schwarz complexity, in the spirit of the change of variables mentioned earlier. Define the characteristic of a finite abelian group to be the least order of a non-identity element.

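Proposition 2 (Equivalent formulation of Cauchy-Schwarz complexity) Let $\Psi = (\psi_1,\ldots,\psi_t)$ be a system of affine-linear forms $\psi_i: G^d \to G$, and for each $i$ let $\dot \psi_i: \mathbf{Q}^d \to \mathbf{Q}$ be the linear form induced by the integer coefficients of $\psi_i$. Suppose that the characteristic of $G$ is sufficiently large depending on the coefficients of $\Psi$. Then $\Psi$ has Cauchy-Schwarz complexity at most $s$ if and only if, for each $1 \leq i \leq t$, one can find a linear change of variables $\vec x = L_i(y_1,\ldots,y_{s+1},z_1,\ldots,z_m)$ over $\mathbf{Q}$ such that the form $\dot \psi_i(L_i(y_1,\ldots,y_{s+1},z_1,\ldots,z_m))$ has non-zero $y_1,\ldots,y_{s+1}$ coefficients, but all the other forms $\dot \psi_j(L_i(y_1,\ldots,y_{s+1},z_1,\ldots,z_m))$ with $j \neq i$ have at least one vanishing $y_1,\ldots,y_{s+1}$ coefficient.

Proof: To show the “if” part, observe that if $1 \leq i \leq t$ and $L_i$ is as above, then we can partition the $\psi_j$, $j \neq i$ into $s+1$ classes depending on which $y_k$ coefficient vanishes for $k=1,\ldots,s+1$ (breaking ties arbitrarily), and then $\psi_i$ is not representable as an affine-linear combination of the forms from any of these classes (here we use the large characteristic hypothesis). Conversely, suppose $\Psi$ has Cauchy-Schwarz complexity at most $s$, and let $1 \leq i \leq t$. We can then partition the $j \neq i$ into $s+1$ classes $A_1,\ldots,A_{s+1}$, such that $\psi_i$ cannot be expressed as an affine-linear combination of the $\psi_j$ from $A_k$ for any $1 \leq k \leq s+1$. By duality, one can then find vectors $v_k \in \mathbf{Q}^d$ for each $k$ such that $\dot \psi_i$ does not annihilate $v_k$, but all the $\dot \psi_j$ from $A_k$ do. If we then set

$$L_i(y_1,\ldots,y_{s+1},z_1,\ldots,z_d) := (z_1,\ldots,z_d) + y_1 v_1 + \ldots + y_{s+1} v_{s+1}$$

then we obtain the claim.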

Exercise 15 Let $\Psi = (\psi_1,\ldots,\psi_t)$ be a system of affine-linear forms with Cauchy-Schwarz complexity at most $s$, and suppose that the equation (11) holds for some finite abelian group $G$ and some $\phi_1,\ldots,\phi_t: G \to \mathbf{R}/\mathbf{Z}$. Suppose also that the characteristic of $G$ is sufficiently large depending on the coefficients of $\Psi$. Conclude that all of the $\phi_1,\ldots,\phi_t$ are polynomials of degree $\leq s$.

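It turns out that this result is not quite best possible. Define the true complexity of a system of affine-linear forms $\Psi = (\psi_1,\ldots,\psi_t)$ to be the least $s$ such that the powers $\dot \psi_1^{s+1}, \ldots, \dot \psi_t^{s+1}$ are linearly independent over $\mathbf{Q}$, where $\dot \psi_i$ denotes the linear part of $\psi_i$ (viewed as a linear form over $\mathbf{Q}$).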

Exercise 16 Show that the true complexity is always less than or equal to the Cauchy-Schwarz complexity, and give an example to show that strict inequality can occur. Also, show that the true complexity is finite if and only if the Cauchy-Schwarz complexity is finite.

Exercise 17 Show that Exercise 15 continues to hold if Cauchy-Schwarz complexity is replaced by true complexity. (Hint: first understand the cyclic case $G = \mathbf{Z}/N\mathbf{Z}$, and use Exercise 15 to reduce to the case when all the $\phi_i$ are polynomials of bounded degree. The main point is to use a “Lefschetz principle” to lift statements in $\mathbf{Z}/N\mathbf{Z}$ to a characteristic zero field such as $\mathbf{Q}$.) Show that the true complexity cannot be replaced by any smaller quantity.

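In the previous section, we saw that equality in the trivial inequality (9) only occurred when the functions $f_1,\ldots,f_t$ were of the form $f_i = e(\phi_i)$ for some polynomials $\phi_i$ of degree at most $s$, where $s$ was the true complexity (or Cauchy-Schwarz complexity) of the system $\Psi$. Another way of phrasing this latter fact is that one has the identity

$$\Delta_{h_1} \ldots \Delta_{h_{s+1}} f_i(x) = 1$$

for all $h_1,\ldots,h_{s+1},x \in G$, where $\Delta_h$ is the multiplicative derivative

$$\Delta_h f(x) := f(x+h) \overline{f(x)}.$$

This phenomenon extends beyond the “$100\%$ world” of exact equalities. For any $f: G \to \mathbf{C}$ and $d \geq 1$, we define the Gowers norm $\|f\|_{U^d(G)}$ by the formula

$$\|f\|_{U^d(G)} := \left( \mathbf{E}_{h_1,\ldots,h_d,x \in G} \Delta_{h_1} \ldots \Delta_{h_d} f(x) \right)^{1/2^d}.$$

Exercise 18 (Fourier representation of $U^2$) Define the Pontryagin dual $\hat G$ of a finite abelian group $G$ to be the space of all homomorphisms $\xi: G \to \mathbf{R}/\mathbf{Z}$. For each function $f: G \to \mathbf{C}$, define the Fourier transform $\hat f: \hat G \to \mathbf{C}$ by the formula $\hat f(\xi) := \mathbf{E}_{x \in G} f(x) e( - \xi(x) )$. Establish the identity

$$\|f\|_{U^2(G)} = \| \hat f \|_{\ell^4(\hat G)} := \left( \sum_{\xi \in \hat G} |\hat f(\xi)|^4 \right)^{1/4}.$$

In particular, the $U^2(G)$ norm is a genuine norm (thanks to the norm properties of $\ell^4(\hat G)$, and the injectivity of the Fourier transform).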
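The $U^2$ identity is easy to test numerically on $G = \mathbf{Z}/N\mathbf{Z}$, where the dual group is identified with $\mathbf{Z}/N\mathbf{Z}$ via $\xi(x) = x\xi/N$. The sketch below is an illustration only; the function names and the identification of the dual group are assumptions, and a function is stored as a list of $N$ complex values.

```python
import cmath

def u2_norm(f):
    """U^2 norm on Z/NZ from the definition:
    E_{x,h1,h2} f(x) conj f(x+h1) conj f(x+h2) f(x+h1+h2), to the power 1/4."""
    N = len(f)
    total = 0
    for x in range(N):
        for h1 in range(N):
            for h2 in range(N):
                total += (f[x] * f[(x + h1) % N].conjugate()
                          * f[(x + h2) % N].conjugate()
                          * f[(x + h1 + h2) % N])
    return (total / N ** 3).real ** 0.25

def l4_norm_of_fourier(f):
    """(sum_xi |hat f(xi)|^4)^(1/4) with hat f(xi) = E_x f(x) e(-x*xi/N)."""
    N = len(f)
    hat = [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / N)
               for x in range(N)) / N
           for xi in range(N)]
    return sum(abs(c) ** 4 for c in hat) ** 0.25
```

Up to floating-point error the two functions should return the same value for any input list.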

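For the higher Gowers norms, there is not nearly as nice a formula known in terms of things like the Fourier transform, and it is not immediately obvious that these are indeed norms. But this can be established by introducing the more general Gowers inner product

$$\langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d(G)} := \mathbf{E}_{x,h_1,\ldots,h_d \in G} \prod_{\omega \in \{0,1\}^d} \mathcal{C}^{|\omega|} f_\omega(x+\omega_1 h_1+\ldots+\omega_d h_d)$$

for any $2^d$-tuple $(f_\omega)_{\omega \in \{0,1\}^d}$ of functions $f_\omega: G \to \mathbf{C}$, where $|\omega| := \omega_1+\ldots+\omega_d$ and $\mathcal{C}$ denotes complex conjugation; thus in particular

$$\langle (f)_{\omega \in \{0,1\}^d} \rangle_{U^d(G)} = \|f\|_{U^d(G)}^{2^d}.$$

The relationship between the Gowers inner product and the Gowers uniformity norm is analogous to that between a Hilbert space inner product and the Hilbert space norm. In particular, we have the following analogue of the Cauchy-Schwarz inequality: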

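Exercise 19 (Cauchy-Schwarz-Gowers inequality) For any tuple $(f_\omega)_{\omega \in \{0,1\}^d}$ of functions $f_\omega: G \to \mathbf{C}$, use the Cauchy-Schwarz inequality to show that

$$|\langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d(G)}| \leq \prod_{j=0,1} |\langle (f_{\pi_{i,j}(\omega)})_{\omega \in \{0,1\}^d} \rangle_{U^d(G)}|^{1/2}$$

for all $1 \leq i \leq d$, where for $j=0,1$ and $\omega \in \{0,1\}^d$, $\pi_{i,j}(\omega) \in \{0,1\}^d$ is formed from $\omega$ by replacing the $i^{th}$ coordinate with $j$. Iterate this to conclude that

$$|\langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d(G)}| \leq \prod_{\omega \in \{0,1\}^d} \|f_\omega\|_{U^d(G)}.$$

Then use this to conclude the monotonicity formula

$$\|f\|_{U^d(G)} \leq \|f\|_{U^{d+1}(G)}$$

for all $d \geq 1$, and the triangle inequality

$$\|f+g\|_{U^d(G)} \leq \|f\|_{U^d(G)} + \|g\|_{U^d(G)}$$

for all $f,g: G \to \mathbf{C}$. (Hint: For the latter inequality, raise both sides to the power $2^d$ and expand the left-hand side.) Conclude in particular that the $U^d(G)$ norms are indeed norms for all $d \geq 2$.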

The Gowers uniformity norms can be viewed as a quantitative measure of how well a given function behaves like a polynomial. One piece of evidence in this direction is:

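Exercise 20 (Inverse conjecture for the Gowers norm, $100\%$ case) Let $f: G \to \mathbf{C}$ be such that $\|f\|_{L^\infty(G)} = 1$, and let $s \geq 0$. Show that $\|f\|_{U^{s+1}(G)} \leq 1$, with equality if and only if $f = e(\phi)$ for some polynomial $\phi: G \to \mathbf{R}/\mathbf{Z}$ of degree at most $s$.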

The problem of classifying smaller values of $\|f\|_{U^{s+1}(G)}$ is significantly more difficult, and will be discussed in later notes.

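Exercise 21 (Polynomial phase invariance) If $f: G \to \mathbf{C}$ is a function and $\phi: G \to \mathbf{R}/\mathbf{Z}$ is a polynomial of degree at most $s$, show that $\|e(\phi) f\|_{U^{s+1}(G)} = \|f\|_{U^{s+1}(G)}$. Conclude in particular that

$$\sup_\phi |\mathbf{E}_{x \in G} e(\phi(x)) f(x)| \leq \|f\|_{U^{s+1}(G)}$$

where $\phi$ ranges over polynomials of degree at most $s$.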
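These properties can be observed numerically on $\mathbf{Z}/N\mathbf{Z}$ by computing the Gowers norms through the recursion $\|f\|_{U^d}^{2^d} = \mathbf{E}_h \|\Delta_h f\|_{U^{d-1}}^{2^{d-1}}$, with $\|f\|_{U^1}^2 = |\mathbf{E}_x f(x)|^2$. The code below is a sketch for small $N$ only (the cost grows like $N^d$), and the function names are hypothetical.

```python
def mult_derivative(f, h):
    """Multiplicative derivative: (Delta_h f)(x) = f(x+h) * conj(f(x))."""
    N = len(f)
    return [f[(x + h) % N] * f[x].conjugate() for x in range(N)]

def gowers_power(f, d):
    """||f||_{U^d}^(2^d) on Z/NZ, via the recursion in h."""
    N = len(f)
    if d == 1:
        return abs(sum(f) / N) ** 2
    return sum(gowers_power(mult_derivative(f, h), d - 1)
               for h in range(N)) / N

def gowers_norm(f, d):
    """The U^d norm (a seminorm when d = 1) of f on Z/NZ."""
    return gowers_power(f, d) ** (1 / 2 ** d)
```

For the quadratic phase $f(x) = e(x^2/N)$ with $N$ odd, one expects $\|f\|_{U^3} = 1$ while $\|f\|_{U^2} = N^{-1/4}$, and multiplying any function by this phase should leave its $U^3$ norm unchanged.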

The main utility for the Gowers norms in this subject comes from the fact that they control many other expressions of interest. Here is a basic example:

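Exercise 22 Let $f: G \to \mathbf{C}$ be a function, and for each $1 \leq i \leq s+1$, let $g_i: G^{s+1} \to \mathbf{C}$ be a function bounded in magnitude by $1$ which is independent of the $i^{th}$ coordinate of $G^{s+1}$. Let $a_1,\ldots,a_{s+1}$ be non-zero integers, and suppose that the characteristic of $G$ exceeds the magnitude of any of the $a_i$. Show that

$$\left|\mathbf{E}_{x_1,\ldots,x_{s+1} \in G} f(a_1 x_1 + \ldots + a_{s+1} x_{s+1}) \prod_{i=1}^{s+1} g_i(x_1,\ldots,x_{s+1})\right| \leq \|f\|_{U^{s+1}(G)}.$$

Exercise 23 (Generalised von Neumann inequality) Let $\Psi = (\psi_1,\ldots,\psi_t)$ be a collection of affine-linear forms $\psi_i: G^d \to G$ with Cauchy-Schwarz complexity $s$. If the characteristic of $G$ is sufficiently large depending on the linear coefficients of $\psi_1,\ldots,\psi_t$, show that one has the bound

$$|\Lambda_\Psi(f_1,\ldots,f_t)| \leq \inf_{1 \leq i \leq t} \|f_i\|_{U^{s+1}(G)}$$

whenever $f_1,\ldots,f_t: G \to \mathbf{C}$ are bounded in magnitude by one.

Conclude in particular that if $A$ is a subset of $G$ with $|A| = \delta |G|$, then

$$\Lambda_\Psi(1_A,\ldots,1_A) = \delta^t + O_t( \|1_A - \delta\|_{U^{s+1}(G)} ).$$

From the above inequality, we see that if $A$ has some positive density $\delta > 0$ but has much fewer than (say) $\delta^t |G|^d / 2$ patterns of the form $\Psi(\vec x)$ with $\vec x \in G^d$, then we have

$$\|1_A - \delta\|_{U^{s+1}(G)} \gg_{t,\delta} 1.$$

This is the initial motivation for studying inverse theorems for the Gowers norms, which give necessary conditions for a (bounded) function to have large $U^{s+1}(G)$ norm. This will be a focus of subsequent notes.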


Dear professor Tao,
I am having struggles proving exercise 22. It seems as if I have found a counterexample. I looked at the group , which has characteristic 3. For , take $a=2$ and the statement then is . But in this case one can just take a function which is nonzero on but 0 on the rest.

I am also a little bit confused about your definition of affine linear form . You define it to be maps of the form for integers , but in a group, what does mean? If were a ring with unit, I would understand.

Hi prof. Tao, I find the proof of proposition 2 (equivalent formulation of CS-complexity) hard to follow (it could be me of course). I don’t understand how you use the large characteristic hypothesis to prove the “only if” part. Could you please explain it in more detail?

Fix . By hypothesis, the rational form has non-vanishing coefficient. If the characteristic is large enough that it does not divide the numerator of this coefficient, this implies that the G-form also has non-vanishing coefficient. On the other hand, all the forms in the class associated to have vanishing coefficient. Hence cannot be an affine-linear combination of these .
