February 5, 2011

Polymath6: improving the bounds for Roth’s theorem

For the time being this is an almost empty post, the main purpose of which is to provide a space for mathematical comments connected with the project of assessing whether it is possible to use the recent ideas of Sanders and of Bateman and Katz to break the $1/\log N$ barrier in Roth's theorem. (In a few hours' time I plan to write a brief explanation of what one of the main difficulties seems to be.)

Added later. Tom Sanders made the following remarks as a comment. It seems to me to make more sense to have them as a post, since they are a good starting point for a discussion. So I have taken the liberty of upgrading the comment. Thus, the remainder of this post is written by Tom.

This will hopefully be an informal post on one aspect of what we might need to do to translate the Bateman-Katz work into the $\mathbb{Z}/N\mathbb{Z}$ setting.

One of the first steps in the Bateman-Katz argument is to note that if $A \subseteq \mathbb{F}_3^n$ is a cap-set (meaning it is free of three-term progressions) of density $\alpha$ then we can assume that there are no large Fourier coefficients, meaning

$\sup_{r \neq 0}|\widehat{1_A}(r)| = O(\alpha^2)$.

They use this to develop structural information about the large spectrum, $\Delta = \{r \neq 0 : |\widehat{1_A}(r)| \geq c\alpha^2\}$, which consequently has size between $c\alpha^{-3}$ and $C\alpha^{-3}$. This structural information is then carefully analysed in the `beef' of the paper.

To make the assumption that there are no large Fourier coefficients they proceed via the usual Meshulam argument: if there is a large coefficient then we get a density increment of the form $\alpha \mapsto \alpha(1+c\alpha)$ on a subspace of co-dimension $1$, and this can be iterated until they have all been removed. In the $\mathbb{Z}/N\mathbb{Z}$ setting this has to be `Bourgainised'.

We proceed relative to Bohr sets rather than subspaces. The problem with Bohr sets is that they do not function exactly as subspaces do and, in particular, they do not have a nice additive closure property. A good model to think of is the unit cube $Q = [0,1]^d$ in $\mathbb{R}^d$. The sumset $Q + Q$ is not roughly equal to $Q$; it is $2^d$ times as large as $Q$. However, if we take some small dilate of $Q$, say the cube $Q' = \epsilon Q$ of side length $\epsilon$, then we do have that $Q + Q' \approx Q$, since $|Q + Q'| = (1+\epsilon)^d|Q| = O(|Q|)$ when $\epsilon \leq 1/d$. This provides a sort of approximate additive closure, and the fact that it can be usefully extended to Bohr sets and used for Roth's theorem was noticed by Bourgain in his paper `On triples in arithmetic progression'.
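To see the numbers in this toy model concretely, here is a minimal sketch (plain volume arithmetic; the dimension $d = 20$ and the dilation factor $1/d$ are illustrative choices of mine, not anything from the argument):

```python
# Toy model: Q = [0,1]^d, so vol(Q) = 1.
# Q + Q = [0,2]^d blows up by a factor of 2^d, but Q + (1/d)Q barely grows at all.
d = 20
eps = 1.0 / d

blowup_full = 2.0 ** d          # vol(Q + Q) / vol(Q): exponential in d
blowup_dilate = (1 + eps) ** d  # vol(Q + eps*Q) / vol(Q): at most e when eps <= 1/d

print(blowup_full, blowup_dilate)
```

The dilated sumset grows by a factor below $e \approx 2.718$, while the full sumset grows by over a million; this is the "approximate additive closure" in miniature.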

In our situation, if $B$ is a Bohr set of dimension $d$, and $A \subseteq B$ has relative density $\alpha$, then we shall try to remove all characters $\gamma$ such that $|\widehat{1_A}(\gamma)| \geq c\alpha^2$. Given such a character we produce a new Bohr set $B'$, defined to be the intersection of $B$ (dilated by a factor of $c\alpha$) and the $c\alpha$-approximate level set of $\gamma$ (meaning the set of $x$ such that $|1 - \gamma(x)| \leq c\alpha$) with

$\dim B' \leq d + 1$ and $\operatorname{width}(B') \geq c\alpha \cdot \operatorname{width}(B)$,

and $A$ has density $\alpha(1+c\alpha)$ on a translate of $B'$. After running this for at most $O(\alpha^{-1}\log(1/\alpha))$ iterations we end up with a Bohr set $B''$ such that

$\dim B'' \leq d + O(\alpha^{-1}\log(1/\alpha))$ and $\operatorname{width}(B'') \geq (c\alpha)^{O(\alpha^{-1}\log(1/\alpha))}\operatorname{width}(B)$.

However, the only lower bound we have on the size of a Bohr set of dimension $d$ and width $\delta$ in a general Abelian group of order $N$ is $\delta^d N$, which means we have to keep $d\log(1/\delta) = O(\log N)$ or else our Bohr sets will become too small. Of course, in $\mathbb{F}_3^n$ the width plays (essentially) no role in determining the size of the Bohr set and we have $|B| \geq 3^{-d}|G|$, and we can take $d$ as large as desired for the Bateman-Katz analysis.
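As a small sanity check on that lower bound, here is a sketch that enumerates a Bohr set in $\mathbb{Z}/N\mathbb{Z}$ directly (my arbitrary choices of $N$, frequency set and width; `bohr_set` is a hypothetical helper in which width $\delta$ means $\|rx/N\|_{\mathbb{R}/\mathbb{Z}} \leq \delta$ for every frequency $r$):

```python
# Enumerate B(Gamma, delta) in Z/NZ and compare |B| with the pigeonhole bound delta^d * N.
def bohr_set(N, Gamma, delta):
    def dist_to_int(t):
        return abs(t - round(t))  # distance from t to the nearest integer
    return [x for x in range(N) if all(dist_to_int(r * x / N) <= delta for r in Gamma)]

N = 10007                  # a prime, so there are no nontrivial subgroups to fall back on
Gamma = [3, 142, 2961]     # d = 3 frequencies, chosen arbitrarily
delta = 0.1
B = bohr_set(N, Gamma, delta)

# The pigeonhole argument guarantees |B| >= delta^d * N, and in general nothing better.
print(len(B), delta ** len(Gamma) * N)
```

Running this for various frequency sets shows the actual size hovering not far above $\delta^d N$, which is exactly the weakness being described: the size guarantee decays exponentially in the dimension.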

Having seen this weakness of `Bourgainisation' one naturally wants to look for arguments which somehow involve iterating a smaller number of times: if we had been able to take many Fourier coefficients together each time we passed to a new Bohr set, we would not have had to iterate, and therefore narrow, the Bohr set so many times. In fact Heath-Brown and Szemerédi, in their papers both called `Integer sets containing no arithmetic progressions', provided such arguments.

The key idea of the Heath-Brown-Szemerédi approach in the Bohr set context is to intersect the dilate of the Bohr set with the approximate level sets of all the characters in the large spectrum $\Delta$ at once. This set has size at most $O(\alpha^{-3})$ by Parseval's theorem and so we get a Bohr set $B'$ with

$\dim B' \leq d + O(\alpha^{-3})$ and $\operatorname{width}(B') \geq c\alpha \cdot \operatorname{width}(B)$.

However, in this case we end up with a much bigger density increment. Indeed, $\gamma(x) \approx 1$ for all $x \in B'$ and $\gamma \in \Delta$, from which we essentially get that

$\sup_x 1_A * \beta'(x) \geq \alpha(1+c)$.

This translates to a density increment of $\alpha \mapsto (1+c)\alpha$, and such an increment can only be iterated $O(\log(1/\alpha))$ times — that is to say not very many times. Unfortunately, even when combined with Chang's theorem this does not give an improvement over Bourgain's original argument, and it wasn't until 2008 that Bourgain produced a new argument improving our understanding in `Roth's theorem on progressions revisited'.
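The iteration count here is just the observation that density cannot exceed $1$; spelling it out:

```latex
% after k constant-factor increments the density is at least (1+c)^k \alpha,
% and density is at most 1, so
(1+c)^k \alpha \le 1
\quad\Longrightarrow\quad
k \le \frac{\log(1/\alpha)}{\log(1+c)} = O_c\bigl(\log(1/\alpha)\bigr).
```

Compare this with the $O(\alpha^{-1}\log(1/\alpha))$ iterations of the Meshulam-style increment, where each step only multiplies the density by $(1+c\alpha)$.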

In this sequel a more careful analysis of the large spectrum is produced, and this benefits from knowing whether or not $\Delta$ contains most of the Fourier mass of $1_A$. The point here is that we are given by the usual Fourier arguments that $\Delta$ supports a large Fourier mass. Now, if $|\Delta|$ is somewhat bigger than $\alpha^{-2}$ then it is a stronger statement to say that $\Delta$ contains most of the Fourier mass. If it does then our plan might be to run one of the known Roth arguments; if it doesn't then the spectrum at a lower threshold is large and we can hope to run the Bateman-Katz argument.

Hopefully I'll talk more about Bourgain's method which gives $\alpha = O((\log N)^{-2/3+o(1)})$ (and a slight refinement which gives $O((\log N)^{-3/4+o(1)})$), because these, along with Bourgain's original approach, can all make use of the fact that $|\Delta|$ is large, rather than simply the statement that $A$ has no non-trivial three-APs (which is stronger). One of the problems we face is that naively the proof giving $O((\log\log N)^6/\log N)$ cannot make use of the fact that $|\Delta|$ is large unless this can be converted into a meaningful physical space statement.

I did wonder if there was some slight hope that the $\epsilon$ in the Bateman-Katz result (density $O(n^{-(1+\epsilon)})$) would be sufficiently large, say bigger than $1/3$, that it could be combined with the $(\log N)^{-3/4}$ argument to give an improvement. This seems unlikely as I am told $\epsilon$ is rather small.

I have a fairly obvious question, which I think other people are likely to have thought about in detail, though I'm also tempted to think about it myself. Tom refers to the weakness of Bourgainization above, by which he means the fact that when you argue with regular Bohr sets, you have to take a "small" Bohr set inside the regular one, and eventually you pass down to that, which means that at each iteration the width of your Bohr set decreases. In some arguments, this is not a problem, but in others, when you have to iterate several times, it starts to affect bounds in a serious way. In particular, it explains why merely having the idea of replacing subspaces by Bohr sets is not enough to give a bound of $1/\log N$ for Roth's theorem.

This is one of our main difficulties, since if one could just straightforwardly do that replacement, then we would have a straightforward dictionary from the Roth/Meshulam argument to a version in $\mathbb{Z}/N\mathbb{Z}$, which would presumably make it straightforward to add on the Bateman-Katz improvement.

One version of the question I would ask in response to that is a rather vague one: does the "weakness of Bourgainization" reflect a genuinely important difference between $\mathbb{Z}/N\mathbb{Z}$ and $\mathbb{F}_3^n$, or does it reflect the weakness of the arguments we have so far thought of? In the rest of this comment, I'll try to make this question more precise, though I don't think I'll succeed to the extent that I would ideally like.

First of all, there obviously is a genuine difference between $\mathbb{F}_3^n$ and $\mathbb{Z}/N\mathbb{Z}$, which is that $\mathbb{F}_3^n$ has lots of subgroups/subspaces and $\mathbb{Z}/N\mathbb{Z}$ (for prime $N$) doesn't. Since I'm perfectly well aware of that, my task is to explain what I mean by this difference being important or not.

As a first attempt at that, let me give a sense in which the difference might seem more important than it actually is. One nice feature of subspaces of $\mathbb{F}_3^n$ is that they allow one to define averaging projections: given a subspace $V$ and a function $f$ defined on $\mathbb{F}_3^n$, we define $P_V f$ by the formula $P_V f(x) = \mathbb{E}_{v \in V} f(x+v)$, which replaces the value at $x$ by the average over the coset $x + V$. If $B$ is a Bohr set, we can't do this: it is not in general possible to partition $\mathbb{Z}/N\mathbb{Z}$ by translates of $B$, and even if one could, one would have to choose some translates and not others, a choice one does not have to make with subspaces. Nevertheless, there is a useful analogue of averaging projections, which is convolving with characteristic measures. That is, we define $P_B f$ by the formula $P_B f(x) = f * \beta(x) = \mathbb{E}_{b \in B} f(x+b)$, where $\beta$ is the characteristic measure of $B$. (The last equality relied on the fact that $B = -B$.) Note that we can define the averaging projection $P_V$ as convolution with the characteristic measure of $V$, so this is a direct generalization.
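As a concrete illustration of the contrast (a minimal sketch; the particular subspace, random test function and tiny dimension are my choices), the averaging projection really is a projection, i.e. applying it twice changes nothing:

```python
import itertools
import random

n = 3
pts = list(itertools.product(range(3), repeat=n))   # the group F_3^n
V = [v for v in pts if v[-1] == 0]                  # codimension-1 subspace x_n = 0

random.seed(1)
f = {x: random.random() for x in pts}

def shift_avg(f, S):
    """Convolve f with the uniform (characteristic) measure on S: x -> E_{s in S} f(x+s)."""
    return {x: sum(f[tuple((a + b) % 3 for a, b in zip(x, s))] for s in S) / len(S)
            for x in pts}

g = shift_avg(f, V)   # P_V f
h = shift_avg(g, V)   # P_V P_V f
assert all(abs(g[x] - h[x]) < 1e-12 for x in pts)  # P_V is idempotent: a true projection
```

For a Bohr set $B$ the analogous operator $f \mapsto f * \beta$ is only approximately idempotent, which is one way of seeing the missing exact coset structure.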

When it comes to proving Roth's theorem in $\mathbb{F}_3^n$, we pass to subspaces rather than taking averaging projections, so it starts to look as though it matters more that we are dealing with Bohr sets in $\mathbb{Z}/N\mathbb{Z}$. But I can't help wondering whether it might be possible to do something a bit cleverer that would allow one to get away with a smaller sacrifice of width.

At this point, my thoughts get a little hazy, which is why I am asking whether anyone else has been down this road already and had clearer thoughts — either negative or positive. The sort of picture I have in my mind is something like this. Instead of passing to one Bohr set $B$, where it really is the case that if you pick a random point $x \in B$ and then a difference $d$, you need $d$ to belong to a much smaller Bohr set $B'$ in order to be able to say that $x + d$ probably belongs to $B$ as well, could one do something like passing to an ensemble of Bohr sets, inside which you have an ensemble of smaller Bohr sets, but this time with widths smaller by a constant factor, so that if you spill out of one Bohr set you find yourself in a neighbouring one?

That's just one suggestion that I haven't thought about at all (I'm really trying to convey the type of thing I'd like to look for rather than the actual thing I'm looking for). But I can now try to rephrase my question about genuine importance. Does the fact that Bohr sets don't have sharp boundaries, so to speak, really have a bearing on Roth's theorem, or is it just a technical irritant that we haven't yet fully seen how to deal with?

One final remark is this. It may seem as though by talking about averaging projections I am advocating an energy-incrementation argument rather than a passing-to-subspaces argument. But that isn’t exactly what I’m trying to do: I’m wondering whether there could be an argument that is global enough to avoid the severe edge-effects associated with Bohr sets, but local enough to have the advantages of the passing-to-subspaces approach.

To be slightly more specific, perhaps instead of showing that the set $A$ correlates with Bohr sets, one could show that it correlates with non-negative Bohr-continuous functions, or something like that. (There are various definitions one could give for this. A natural one would be to say that $f$ is $B$-continuous if $|f(x+b) - f(x)| \leq \epsilon$ for every $x$ and every $b \in B$.)

I don't think this actually works, though, or at least not in any straightforward way. If you normalize such a function $f$ so that it becomes a probability measure $\mu$, and you then try to exploit the fact that $A$ contains no progression of length 3, the idea would be to argue that $\mu$ has a small AP count as well, and that this would give bias with respect to a new trigonometric function. And the hope would be that if $f$ is $B$-continuous, then $B$ could play the role of the small Bohr set in the calculations rather than the large Bohr set. And then a suitable function built out of $f$ and the trigonometric function would become the new $f$, and a Bohr set built out of $B$ and the new frequency would become the new $B$. And for some reason that I cannot see without doing calculations, and have no convincing reason to expect, the width of the new $B$ would not have to be substantially smaller than the width of the old one.

One small question which occurs to me from the question of the limitation of Bourgainisation above is whether being in a high-dimensional Bohr set helps in the construction of three-term-progression-free sets or not. Specifically, can one construct a set $A \subseteq B$, free of proper three-term progressions, with relative density $|A|/|B|$ larger than the Behrend bound? It's not clear to me that it helps at all, given that the known lower bounds for cap-sets in $\mathbb{F}_3^n$ are worse than the Behrend bound, but it would be some evidence of a genuine difference between Bohr sets and subspaces.

I suppose that there are high-dimensional progressions (rather than Bohr sets) which contain large three-term-progression-free sets. Indeed, $\{0,1\}^d$ in $\mathbb{Z}^d$ is a $d$-dimensional progression containing no proper three-term progressions…

A somewhat related remark is this. I think the problem with the kind of proposal I suggested in comment 2 is that one has to show that the B-continuous function “ought” to contain several APs of length 3. But the only ones that it seems to have to have are “local” ones, where you have a common difference belonging to B.

But perhaps that is where the gain is: our common difference would be in $B$ rather than in a Bohr set of much smaller width. In the course of writing this comment I have ceased to feel that I can see either that there is a problem or that there isn't a problem. But I agree with you, Tom, that once one has got to the stage of considering a Bohr set of dimension $d$ one is morally living in a $d$-dimensional lattice, so if there is some sense in which the bounds for Roth are weaker in $d$ dimensions than they are in one dimension, then there would appear to be a genuinely important difference between $\mathbb{Z}/N\mathbb{Z}$ and $\mathbb{F}_3^n$.

With regard to the comments on Bohr sets above, I think it's important to distinguish two reasons why one passes to Bohr sets of smaller "width". In my opinion one of them, (1), is really crucial: the fact that we want a situation that is roughly closed under addition, thus for example $B + B’ \approx B$ if $B’$ has much smaller width than $B$. This "approximate group" property is completely vital and it does not hold for the Bohr set $B$ by itself, where $B + B \sim 2^d B$. The second reason (2) for passing to smaller Bohr sets is one that Tim mentions above – Bohr sets can have nasty "edge effects", so one needs to work with so-called "regular" Bohr sets. My feeling – one that I think Tom will back up – is that for the purposes of this discussion we should just assume that all Bohr sets are regular.

To summarise, then: (1) is a really serious conceptual point, and (2) is a technical irritant that can always be bypassed in practice.

In our offline email exchanges I mentioned that we probably “should” be working in $[N]$ rather than in $Z/NZ$, my reasoning being that the group structure in the latter is a bit artificial (some progressions are not “real” progressions). Now I’ve thought about it a bit more, I don’t mind working in $Z/NZ$ so much. The thing is that there one can talk about “the set of large Fourier coefficients”, something that Bateman and Katz are doing all the time, rather than looking at sets of $1/100 N$-separated Fourier coefficients on the circle $R/Z$ or something like that. Of course this is all a matter of finding a good language to talk in and there is no genuine mathematical gain to be made from working in one or the other setting (so far as I know).

OK, enough from me. I plan to try and present one or two lemmas from Bateman–Katz in the way that I like to understand things. I know other people (for example Tim) are doing the same, and fully concur with him that this should be helpful despite the inevitable duplication of effort.

Responding to comment 7: I just want to clarify that the thought I was floating in comment 3 was that if we want approximate closure under addition, we don't necessarily have to have a pair of Bohr sets $B$ and $B'$ with $B'$ smaller than $B$. Instead, we could have a single Bohr set $B$ and a probability measure $\mu$ about which we know nothing other than that $\mu * \beta \approx \mu$. Thus $\mu$ would be "approximately closed under $B$-addition".
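A toy numerical illustration of that notion (my choices throughout: $\mathbb{Z}/4096\mathbb{Z}$, intervals standing in for Bohr sets, and $\mu$ taken to be the characteristic measure of a wide interval): a measure really can be nearly invariant under convolution with the characteristic measure of a much narrower set.

```python
# mu * beta' ~ mu when beta' is the characteristic measure of a much narrower set.
N = 4096
wide, narrow = 500, 5    # half-widths of the wide and narrow intervals
mu = [1.0 / (2 * wide + 1) if min(x, N - x) <= wide else 0.0 for x in range(N)]
Bprime = [b % N for b in range(-narrow, narrow + 1)]

# Convolve mu with the uniform measure on Bprime.
conv = [sum(mu[(x - b) % N] for b in Bprime) / len(Bprime) for x in range(N)]
l1 = sum(abs(conv[x] - mu[x]) for x in range(N))

assert abs(sum(mu) - 1.0) < 1e-9  # mu is a probability measure
assert l1 < 0.05                  # mu is approximately closed under B'-addition
```

The $\ell^1$ error lives entirely near the two edges of the wide interval, so it is of order narrow/wide; shrinking the ratio of widths makes the invariance as good as one likes.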

I’m not claiming that that actually works, or that if it does then it gives better bounds for anything. But I do want to think about it a little unless someone can persuade me in advance that it’s not worth the bother.

I am interested in the project of getting useful physical space information from large Fourier coefficients. If this can't be done for one coefficient, as in the cap-set question, perhaps the presence of many large coefficients can help? What would constitute a counterexample convincing us that such an approach is futile?

One way I imagined being helpful at this stage was to write some notes helping to explain our argument. I see you have a wiki for this, but I don't have the appropriate permissions to modify it. I've noticed that several of you plan on contributing to it and don't want to step on your toes.

However, let me know if an outline of the argument in Section 6 and a description of some examples that necessitate the difficulties might be helpful.

I created some sections on the wiki for your paper with Bateman and started to add some content based on the discussions we had last week. I don’t want this to dissuade anyone from writing different takes on whatever I have added, however. Indeed, I imagine having several different perspectives on things will be very useful. So please don’t worry about the stepping on of toes on my account.

I think you should be able to create an account on the wiki yourself; at least there is a link towards the top right of the site that gives that impression.

I would like to join this project. Unfortunately, I don't know much about Bohr sets yet, which may be something of a handicap. However, I have researched the cap-set problem a little, and have come up with an idea for a computer experiment to test the finite-field heuristic by comparing $\mathbb{F}_3^n$ against $\mathbb{Z}_{3^n}$ for small $n$, which I have described in a blog post here. Obviously, though, I am far from certain whether data from such an experiment would be of interest, so feedback on that would be welcome.

Regarding your very reasonable question, you are of course correct that there is no difference between $\mathbb{Z}_3^n$ and $\mathbb{F}_3^n$. I stupidly wrote $\mathbb{Z}_3^n$ when I was thinking of $\mathbb{Z}_{3^n}$, and then copied the LaTeX for my mistake all over my blog post. Accordingly, I have updated it to refer to $\mathbb{Z}_{3^n}$, which is what I meant. So for clarity, I am proposing to compare $\mathbb{F}_3^n$ against $\mathbb{Z}_{3^n}$, obtaining frequency counts for the possible numbers of three-term progressions at each density level-set in the power set of each group.

I don’t think that I have explained very well why I find the experiment interesting. Certainly, I can update my rather rushed blog post with a clearer explanation. However, even if I explain that clearly, I am far from confident that the experimental data would be of wider interest. If the experiment is of interest, then I can move material to this blog or to the wiki. On the other hand, if the experiment is not of interest, I still feel honour bound to complete the experiment before moving on.
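To make the proposal concrete, here is a minimal exhaustive sketch for the smallest interesting case (my choice: $\mathbb{Z}_9$ versus $\mathbb{F}_3^2$; anything larger needs sampling rather than full enumeration). It counts, for each subset size, how many subsets are free of nontrivial three-term progressions:

```python
import itertools
from collections import Counter

def ap_free_counts(elems, add, neg):
    """For each subset size k, count the k-subsets containing no nontrivial 3AP."""
    elems = list(elems)
    # In an odd-order abelian group a nontrivial 3AP is {a, b, c} with a + c = 2b, a != c.
    aps = set()
    for a, b in itertools.product(elems, repeat=2):
        c = add(add(b, b), neg(a))  # c = 2b - a
        if c != a:
            aps.add(frozenset((a, b, c)))
    counts = Counter()
    for k in range(len(elems) + 1):
        for S in itertools.combinations(elems, k):
            s = set(S)
            if not any(ap <= s for ap in aps):
                counts[k] += 1
    return counts

z9 = ap_free_counts(range(9), lambda x, y: (x + y) % 9, lambda x: (-x) % 9)
f32 = ap_free_counts(itertools.product(range(3), repeat=2),
                     lambda x, y: tuple((u + v) % 3 for u, v in zip(x, y)),
                     lambda x: tuple((-u) % 3 for u in x))
print(dict(z9))
print(dict(f32))
```

In $\mathbb{F}_3^2$ the largest cap has size 4, so the counts vanish from size 5 onwards; comparing the two resulting distributions is the sort of data the heuristic question asks about.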

This comment is a reply to comment 2 and the need to pass to subspaces rather than taking averaging projections when proving Roth's theorem for $\mathbb{F}_3^n$.

I have dug out a sketch of an alternative proof of Lemma 10.15 (Non-uniformity implies density increment) from Tao & Vu that uses the dual form of the Poisson summation formula (essentially taking averaging projections). Seeing as Lemma 10.15 is used when passing to a subspace, might it be possible to unpack my proof somehow and then replace the averaging projection by convolution with a Bohr-set measure for the Bohr set case? The proof also improves the constant in Lemma 10.15 from $1/2$ to $\sqrt{2}$, which makes me quite nervous that I have made a silly mistake. However, I am going to post the proof anyway in case something can be salvaged from the idea.

Lemma 10.15 + improved constant:

Write $Z$ as a shorthand for $\mathbb{F}_3^n$. Let $f : Z \to \mathbb{R}$ be a function with mean zero. Then there exists a codimension-1 subspace $V$ of $Z$ and a point $x \in Z$ such that $|\mathbb{E}_{y \in x+V} f(y)| \geq \sqrt{2}\,\|\hat{f}\|_\infty$.

Sketch of proof:

Choose $\xi$ such that $|\hat{f}(\xi)| = \|\hat{f}\|_\infty$ (note $\xi \neq 0$, since $\hat{f}(0) = 0$).

Let $W$ be the dimension-1 Fourier subspace generated by $\xi$ and let $V$ be the orthogonal complement of $W$. Let $g = f * \mu_V$, where $\mu_V$ is the uniform probability measure on $V$.

By the dual form of the Poisson summation formula (Exercise 4.1.7 in Tao & Vu), $\hat{g} = \hat{f} \cdot 1_W$.

From the Parseval identity and the fact that $f$ is real valued: $\|g\|_2^2 = \sum_{\eta \in W}|\hat{f}(\eta)|^2 = |\hat{f}(\xi)|^2 + |\hat{f}(-\xi)|^2 = 2\|\hat{f}\|_\infty^2$, since $\hat{f}(0) = 0$ and $\hat{f}(-\xi) = \overline{\hat{f}(\xi)}$.

The result follows from the fact that $g$ is constant on the codimension-1 affine subspaces $x + V$ and the pigeonhole principle: some value of $g$ has modulus at least $\|g\|_2 = \sqrt{2}\|\hat{f}\|_\infty$.

Because the required bound on $\mathbb{E}_{y \in x+V} f(y)$ is signed, I have concluded that my proof is broken at the last point, where I invoked the pigeonhole principle on $\|g\|_2$. At that point it is necessary to choose a phase, as is done in Tao & Vu. I think that the proof may be able to be salvaged, but the constant would not be $\sqrt{2}$. That would still leave the core idea intact, though.
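Since I am nervous about the constant, here is a quick numerical sanity check of the lemma with the safe constant $1/2$ (a sketch under my own conventions: a random mean-zero $f$ on $\mathbb{F}_3^3$, expectation-normalized Fourier coefficients, and a brute-force search over the affine subspaces $\{x : \xi \cdot x = j\}$, which exhaust the codimension-1 cosets):

```python
import cmath
import itertools
import random

n = 3
pts = list(itertools.product(range(3), repeat=n))
w = cmath.exp(2j * cmath.pi / 3)

random.seed(0)
f = {x: random.gauss(0, 1) for x in pts}
mean = sum(f.values()) / len(pts)
f = {x: v - mean for x, v in f.items()}  # force mean zero

def fhat(xi):
    """Fourier coefficient with expectation normalization."""
    return sum(f[x] * w ** (-(sum(a * b for a, b in zip(xi, x)) % 3))
               for x in pts) / len(pts)

nonzero = [xi for xi in pts if any(xi)]
sup = max(abs(fhat(xi)) for xi in nonzero)  # ||f^||_inf (0 frequency vanishes anyway)

# Largest average of f over a codimension-1 affine subspace {x : xi.x = j}.
best = max(sum(f[x] for x in pts if sum(a * b for a, b in zip(xi, x)) % 3 == j) / 3 ** (n - 1)
           for xi in nonzero for j in range(3))
assert best >= 0.5 * sup  # the lemma with the safe constant 1/2 holds
```

This only checks the unsigned lemma at the cautious constant, of course; it says nothing about whether $\sqrt{2}$ survives the phase issue.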

The problem with using the $\ell^2$ norm (which would seem to be a very natural thing to do) is that the nonexistence of APs in a set provides a bound on the $\ell^\infty$ norm of the Fourier transform of the balanced function of that set, which results, after the above argument, in a weaker density increment than you get if you use a Fourier expansion: the $\ell^\infty$ norm is what you care about if you are looking for a density increment, so you lose information if you use the $\ell^2$ norm.