Polymath8b: Bounded intervals with many primes, after Maynard

where $p_n$ denotes the $n^{th}$ prime. In other words, $H_m$ is the least quantity such that there are infinitely many intervals of length $H_m$ that contain $m+1$ or more primes. Thus, for instance, the twin prime conjecture is equivalent to the assertion that $H_1 = 2$, and the prime tuples conjecture would imply that $H_m$ is equal to the diameter of the narrowest admissible tuple of cardinality $m+1$ (thus we conjecturally have $H_1 = 2$, $H_2 = 6$, $H_3 = 8$, $H_4 = 12$, $H_5 = 16$, and so forth; see this web page for further continuation of this sequence).
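Admissibility is easy to experiment with numerically. The following sketch (the helper `is_admissible` is my own illustration, not from any of the papers) verifies the narrowest admissible tuples behind the conjectural values just mentioned:

```python
def is_admissible(tup):
    """A k-tuple is admissible if, for every prime p <= k, it misses at least
    one residue class mod p.  (For primes p > k a k-tuple can never occupy
    all p classes, so only the small primes need to be checked.)"""
    k = len(tup)
    for p in range(2, k + 1):
        if any(p % q == 0 for q in range(2, p)):
            continue  # p is composite; skip
        if len({h % p for h in tup}) == p:
            return False  # every residue class mod p is occupied
    return True

# Narrowest admissible (m+1)-tuples, whose diameters conjecturally equal H_m:
narrowest = {
    1: (0, 2),                 # diameter 2  (twin primes)
    2: (0, 2, 6),              # diameter 6
    3: (0, 2, 6, 8),           # diameter 8
    4: (0, 2, 6, 8, 12),       # diameter 12
    5: (0, 4, 6, 10, 12, 16),  # diameter 16
}
for m, t in narrowest.items():
    assert is_admissible(t)
    print(m, t[-1] - t[0])
```

Note for instance that $(0,2,4)$ fails the test (it occupies all three residue classes mod $3$), whereas $(0,2,6)$ passes.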

In 2004, Goldston, Pintz, and Yildirim established the bound $H_1 \le 16$ conditional on the Elliott-Halberstam conjecture, which remains unproven. However, no unconditional finiteness of $H_1$ was obtained (although they famously obtained the non-trivial bound $\liminf_{n \to \infty} \frac{p_{n+1}-p_n}{\log p_n} = 0$), and even on the Elliott-Halberstam conjecture no finiteness result on the higher $H_m$ was obtained either (although they were able to show $\liminf_{n \to \infty} \frac{p_{n+2}-p_n}{\log p_n} = 0$ on this conjecture). In the recent breakthrough of Zhang, the unconditional bound $H_1 \le 70{,}000{,}000$ was obtained, by establishing a weak partial version of the Elliott-Halberstam conjecture; by refining these methods, the Polymath8 project (which I suppose we could retroactively call the Polymath8a project) then lowered this bound to $H_1 \le 4680$.

Theorem 1 (Maynard’s theorem) Unconditionally, we have the following bounds:

$H_1 \le 600$.

$H_m \le C m^3 e^{4m}$ for an absolute constant $C$ and any $m \ge 1$.

If one assumes the Elliott-Halberstam conjecture, we have the following improved bounds:

$H_1 \le 12$.

$H_2 \le 600$.

$H_m \le C m^3 e^{2m}$ for an absolute constant $C$ and any $m \ge 1$.

The final conclusion on Elliott-Halberstam is not explicitly stated in Maynard’s paper, but follows easily from his methods, as I will describe below the fold. (At around the same time as Maynard’s work, I had also begun a similar set of calculations concerning $H_1$, but was only able to obtain a slightly weaker bound unconditionally.) In the converse direction, the prime tuples conjecture implies that $H_m$ should be comparable to $m \log m$. Granville has also obtained a slightly weaker explicit bound on $H_m$ for any $m$ by a slight modification of Maynard’s argument.

The aim of the Polymath8b project is to obtain improved bounds on $H_1$, and on the higher quantities $H_m$, either conditionally on the Elliott-Halberstam conjecture or unconditionally. The likeliest routes for doing this are by optimising Maynard’s arguments and/or combining them with some of the results from the Polymath8a project. This post is intended to be the first research thread for that purpose. To start the ball rolling, I am going to give below a presentation of Maynard’s results, with some minor technical differences (most significantly, I am using the Goldston-Pintz-Yildirim variant of the Selberg sieve, rather than the traditional “elementary Selberg sieve” that is used by Maynard (and also in the Polymath8 project), although it seems that the numerology obtained by both sieves is essentially the same). An alternate exposition of Maynard’s work has also just been completed by Andrew Granville.

— 1. Overview of argument —

Define an admissible $k$-tuple to be an increasing tuple $(h_1,\dots,h_k)$ of integers which avoids at least one residue class modulo $p$ for each prime $p$ (this is automatic for $p > k$). For $k \ge 2$ and $m \ge 1$, let $DHL[k, m+1]$ denote the following claim:

Conjecture 2 ($DHL[k, m+1]$) If $(h_1,\dots,h_k)$ is an admissible $k$-tuple, then there are infinitely many translates $(n+h_1,\dots,n+h_k)$ of that tuple that contain at least $m+1$ primes.

The prime tuples conjecture is then the assertion that $DHL[k, m+1]$ holds for all $k \ge 2$ and $m \ge 1$. Clearly, if $DHL[k, m+1]$ is true, then we have $H_m \le h_k - h_1$ whenever $(h_1,\dots,h_k)$ is an admissible $k$-tuple. Theorem 1 then follows from the following claim:

Theorem 3 (Maynard’s theorem, DHL version) Unconditionally, we have the following bounds:

$DHL[105, 2]$.

$DHL[k, m+1]$ whenever $k$ is sufficiently large and $k \ge C m^2 e^{4m}$ for a sufficiently large absolute constant $C$.

If one assumes the Elliott-Halberstam conjecture, we have the following improved bounds:

$DHL[5, 2]$.

$DHL[105, 3]$.

$DHL[k, m+1]$ whenever $k$ is sufficiently large and $k \ge C m^2 e^{2m}$ for a sufficiently large absolute constant $C$.

Indeed, the first results follow from using the admissible $5$-tuple $(0,2,6,8,12)$ and the admissible $105$-tuple of diameter $600$ found by Engelsma (and recorded on this site). For the larger $m$ results, note that the required lower bound on $k$ is obeyed for $k$ a sufficiently large multiple of $m^2 e^{4m}$ (or of $m^2 e^{2m}$ assuming Elliott-Halberstam), and the claim follows by using the observation that one can create an admissible $k$-tuple of diameter $O(k \log k)$ by using the first $k$ primes past $k$. (Note that as the $H_m$ are clearly non-decreasing in $m$, it suffices to work with sufficiently large $m$ to obtain bounds such as those in Theorem 1.)
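The “first $k$ primes past $k$” construction can also be checked directly; the helpers below are my own sketch. Each element is a prime exceeding $k$, so for every prime $p \le k$ the residue class $0 \pmod p$ is avoided, and the resulting diameter is of order $k \log k$ (for $k = 105$ it comes out somewhat worse than Engelsma’s searched diameter of $600$, which is why searched tuples are preferred for small $k$):

```python
def primes_past(k, count):
    """First `count` primes strictly greater than k, by trial division."""
    out, n = [], k + 1
    while len(out) < count:
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            out.append(n)
        n += 1
    return out

def is_admissible(tup):
    """Check that for each prime p <= len(tup) some residue class mod p is missed."""
    k = len(tup)
    for p in range(2, k + 1):
        if all(p % q for q in range(2, p)) and len({h % p for h in tup}) == p:
            return False
    return True

k = 105
tup = primes_past(k, k)                  # the first 105 primes past 105
shifted = tuple(h - tup[0] for h in tup)  # normalise to start at 0
assert is_admissible(shifted)
print(shifted[-1])                        # the diameter; larger than Engelsma's 600
```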

As in previous work, the conclusions are obtained by constructing a sieve weight with good properties. We use the same asymptotic notation as in the Polymath8a project; thus all quantities depend on an asymptotic parameter $x$ unless explicitly declared to be fixed, and asymptotic notation such as $O()$, $o()$, or $\ll$ is relative to this parameter. We let

and $W := \prod_{p \le w} p$ as before. We let $\theta(n)$ be the quantity $\log n$ when $n$ is prime, and zero otherwise.

Lemma 4 (Criterion for $DHL[k, m+1]$) Let $k \ge 2$ and $m \ge 1$ be fixed integers. Suppose that for each fixed admissible $k$-tuple $(h_1,\dots,h_k)$ and each congruence class $b \pmod W$ such that $b + h_i$ is coprime to $W$ for all $i$, one can find a non-negative weight function $\nu$, fixed quantities $\alpha, \beta > 0$, and a quantity $B > 0$ such that one has the upper bound

the lower bound

for all , and the key inequality

Then $DHL[k, m+1]$ holds.

The $m = 1$ case of this lemma is Lemma 4.1 of the Polymath8a paper. The general case is proven by an essentially identical argument, namely one considers the expression

uses the hypotheses (1), (2), (3) to show that this is positive for sufficiently large $x$, and observes that the summand is only positive when $n+h_1,\dots,n+h_k$ contains at least $m+1$ primes.

We recall the statement of the Elliott-Halberstam conjecture $EH[\theta]$, for a given fixed choice of parameter $0 < \theta < 1$:

If and is fixed, then

where

The Bombieri-Vinogradov theorem asserts that $EH[\theta]$ holds for all $0 < \theta < 1/2$, while the Elliott-Halberstam conjecture asserts that $EH[\theta]$ holds for all $0 < \theta < 1$.

In Polymath8a, the sieve weight was constructed in terms of a smooth compactly supported one-variable function $F$. A key innovation in Maynard’s work is to replace this sieve with one constructed using a smooth compactly supported multi-variable function $F(t_1,\dots,t_k)$, which affords significantly greater flexibility. More precisely, we will show

Proposition 5 (Sieve asymptotics) Suppose that $EH[\theta]$ holds for some fixed $0 < \theta < 1$, and set $R := x^{\theta/2}$. Let $F$ be a fixed symmetric smooth function supported on the simplex

(In fact, one can obtain asymptotics for (1), (2), rather than upper and lower bounds.)

(One can work with non-symmetric functions , but this does not improve the numerology; see the remark after (7.1) of Maynard’s paper.)

We prove this proposition in Section 2. We remark that if one restricts attention to functions of the form

for some smooth $f$ supported on $[0,1]$, then

and

and this claim was already essentially established back in this Polymath8a post (or see Proposition 4.1 and Lemma 4.7 of the Polymath8a paper for essentially these bounds). In that previous post (and also in the paper of Farkas, Pintz, and Revesz), the ratio between these two quantities was optimised in this one-dimensional context using Bessel functions, and the method was unable to establish any case of $DHL[k,2]$ without an improvement to Bombieri-Vinogradov, or any case of $DHL[k,3]$ even on Elliott-Halberstam. However, the additional flexibility afforded by the use of multi-dimensional cutoffs allows one to do better.

Combining Proposition 5 with Lemma 4, we obtain the following conclusion. For each $k \ge 2$, let $M_k$ be the quantity

where $F$ ranges over all smooth symmetric functions, not identically zero, that are supported on the simplex. Equivalently, by substituting and using the fundamental theorem of calculus, followed by an approximation argument to remove the smoothness hypotheses on $F$, we have

where $f$ ranges over all bounded measurable functions supported on the simplex. Then we have

Corollary 6 Let $0 < \theta < 1$ be such that $EH[\theta]$ holds, and let $k \ge 2$, $m \ge 1$ be integers such that

$\displaystyle M_k > \frac{2m}{\theta}.$

Then $DHL[k, m+1]$ holds.

To use this corollary, we simply have to locate test functions $F$ that give as large a lower bound for $M_k$ as one can manage; this is a purely analytic problem that no longer requires any further number-theoretic input.
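For reference, and following the notation of Maynard’s paper (this is my reconstruction of the standard functionals, so take the normalisations with a grain of salt), the quantity in question takes the form

```latex
M_k := \sup_F \frac{\sum_{m=1}^{k} J_k^{(m)}(F)}{I_k(F)}, \qquad
I_k(F) := \int_{\mathcal{R}_k} F(t_1,\ldots,t_k)^2\, dt_1 \cdots dt_k,
\qquad
J_k^{(m)}(F) := \int \left( \int_0^{1 - \sum_{i \neq m} t_i} F(t_1,\ldots,t_k)\, dt_m \right)^{2} dt_1 \cdots dt_{m-1}\, dt_{m+1} \cdots dt_k,
```

where $\mathcal{R}_k := \{(t_1,\ldots,t_k) \in [0,+\infty)^k : t_1 + \cdots + t_k \le 1\}$ is the simplex.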

Combining this with (13) we see that the contribution to (14) in which or is negligible, so we may restrict the integral in (14) to the region . In this region, we have the Euler product approximations

where we have used the bound and the asymptotic for . Since also , we conclude that

Using (13) again to dispose of the error term, and then using (13) once more to remove the restriction to , we thus reduce to verifying the identity

But from repeatedly differentiating (12) under the integral sign, one has

and thus

Integrating this using Fubini’s theorem (and (13)), the claim then follows from (5). This concludes the proof of (1).

Now we prove (2). We will just prove the claim for the first summand, as the other cases follow similarly using the symmetry hypothesis on $F$. The left-hand side of (2) may then be expanded as

Observe that the summand vanishes unless (note that is comparable to and thus exceeds ). So we may simplify the above expression to

As in the estimation of (1), the summand vanishes unless are coprime, and if

Let obey the above constraints. For any modulus , define the discrepancy

Since and is fixed, the hypothesis implies that

for any fixed . On the other hand, the sum

can, by the Chinese remainder theorem, be rewritten in the form

where

and is a primitive residue class; note that does not exceed . By (15), this expression can then be written as

Let us first control the error term, which may be bounded by

Note that for any , there are choices of for which (17) holds. Thus we may bound the previous expression by

By the Cauchy-Schwarz inequality and (16), this expression may be bounded by

for any fixed . On the other hand, we have the crude bound , as well as the standard estimate

(see e.g. Corollary 2.15 of Montgomery-Vaughan). Putting all this together, we conclude that the contribution of the error term to (2) is negligible. To conclude the proof of (2), it thus suffices to show that

But this can be proven by repeating the arguments used to prove (11) (with replaced by , and replaced by ); the presence of the Euler totient function causes some factors of in that analysis to be replaced by , but this turns out to have a negligible impact on the final asymptotics since . This concludes the proof of (2) and hence Proposition 5.

Remark 1 An inspection of the above arguments shows that the simplex can be enlarged slightly to the region

however this only leads to a tiny improvement in the numerology. It is interesting to note however that in the $k=2$ case, the enlarged region is the unit square $[0,1]^2$, and by taking $F = 1$ on this square and taking $\theta$ close to $1$, one can come “within an epsilon” of establishing $DHL[2,2]$ (and in particular, the twin prime conjecture) from the full Elliott-Halberstam conjecture; this fact was already essentially observed by Bombieri, using a different choice of weight rather than the Selberg sieve. (Strictly speaking, to establish (1) in this context, one needs the Elliott-Halberstam conjecture not only for the von Mangoldt function $\Lambda$, but also for other arithmetic functions with a suitable Dirichlet convolution structure; we omit the details.)
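As a quick sanity check of the “within an epsilon” claim (my computation, using the square-integral functionals $I_2$, $J_2^{(m)}$ in the notation of Maynard’s paper): taking $k = 2$ and $F \equiv 1$ on the unit square, the ratio appearing in the definition of $M_2$ becomes

```latex
\frac{J_2^{(1)}(F) + J_2^{(2)}(F)}{I_2(F)}
= \frac{\int_0^1 \Big( \int_0^1 1\, dt_1 \Big)^2 dt_2 + \int_0^1 \Big( \int_0^1 1\, dt_2 \Big)^2 dt_1}{\int_0^1 \int_0^1 1\, dt_1\, dt_2}
= \frac{1 + 1}{1} = 2,
```

so with $\theta$ arbitrarily close to $1$ the criterion $M_2 > 2/\theta$ is missed only by an epsilon, matching the discussion above.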

Remark 2 It appears that the conjecture $MPZ[\varpi,\delta]$ studied in Polymath8a can serve as a substitute for $EH[\theta]$ in Corollary 6, except that one also has to impose an additional constraint on the function $F$ (or $f$), namely that it is supported in a small cube (in order to keep the moduli involved appropriately smooth). Perhaps as a first approximation, we should ignore the role of $\delta$, and just pretend that $MPZ[\varpi,\delta]$ is as good as $EH[1/2+2\varpi]$. In particular, inserting our most optimistic value of $\varpi$ obtained by Polymath8, namely $\varpi = 13/1080$, we can in principle take $\theta$ as large as $1/2 + 13/540$, although this is only an improvement of a few percent or so over the Bombieri-Vinogradov inequality.

The function $F$ is clearly symmetric and supported on the simplex. We now estimate the numerator and denominator of the ratio

that lower bounds $M_k$.

For the denominator, we bound by and use Fubini’s theorem and (8) to obtain the upper bound

and thus

Now we observe that

whenever , and so we have the lower bound

We interpret this probabilistically. Let $X_1,\dots,X_k$ be independent, identically distributed non-negative real random variables with a common probability density; this is well-defined thanks to (18). Observe that the integrand above is the joint probability density of $(X_1,\dots,X_k)$, and so

We will lower bound this probability using the Chebyshev inequality. (In my own calculations, I had used the Hoeffding inequality instead, but it seems to give only a slightly better bound in the end (perhaps saving one or two powers of $k$).) In order to exploit the law of large numbers, we would like the mean $k\mu$ of the sum $X_1 + \cdots + X_k$, where $\mu := {\bf E} X_1$, to be less than $1$:

The variance of $X_1 + \cdots + X_k$ is $k$ times the variance of a single $X_i$, which we can bound (somewhat crudely) by

Thus by Chebyshev’s inequality, we have

To clean things up a bit we bound by to obtain the simpler bound

assuming now that . In particular, , so we have the cleaner bound

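For the record, the Chebyshev step being used here is the standard one (stated in my notation): if $X_1,\dots,X_k$ are i.i.d. non-negative random variables with mean $\mu$ and variance $\sigma^2$, and $k\mu < 1$, then

```latex
{\bf P}\big( X_1 + \cdots + X_k \ge 1 \big)
\;\le\; {\bf P}\big( |X_1 + \cdots + X_k - k\mu| \ge 1 - k\mu \big)
\;\le\; \frac{k \sigma^2}{(1 - k\mu)^2},
```

so the probability that the sum stays below $1$ is at least $1 - k\sigma^2/(1-k\mu)^2$.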
To summarise, we have shown that

whenever $k$ is such that

and

One can optimise this carefully to give (8) (as is done in Maynard’s paper), but for the purposes of establishing the slightly weaker bound (9), we can use a choice of the form

— Comments —

There are a couple of initial directions to proceed from here, I think. One is to get better bounds on $M_k$; as pointed out by Eytan in the previous thread, Maynard is working with functions of the first two symmetric polynomials, but in principle all the symmetric functions are available. (Though perhaps before doing that we should at least get to the point where we can replicate Maynard’s computations; note that he has supplied a Mathematica notebook in the source code of his arXiv file, although it is slightly nontrivial to extract it from the arXiv interface.) Another is to use MPZ instead of Bombieri-Vinogradov, but this puts an additional constraint on the test function F that has to be accounted for, and is likely to have a modest impact on $M_k$ (improving this quantity by about 5% or so). A third thing to do is to try to enlarge the allowable support of the function F, as per Remark 1; this is likely to be a very modest gain (basically it should shift $k$ downward by 1 or so), but this could be significant for bounding $H_1$ on EH, as it may shift $k$ down from 5 to 4 (bringing the bound from 12 to 8).

Indeed, it seems that focusing on the small $k$ case (assuming EH) may well be the best place to start, as the numerics should be fast and one can work a certain number of things out by hand in such low-dimensional cases.

Any chance I can make you (or anyone else) convert the text document into a notebook? I’m totally new to Mathematica and would like to play around with a fully working notebook. (I’m using v9.0 in case it is important.) [Sorry for interrupting the thread.]

I don’t think you can get to $k = 4$ (under EH) using just Remark 1 and higher symmetric polynomials (this seems to get only about half way numerically). That said, there are various possible small modifications (such as those you mention below) which might give enough of an improvement.

That’s precisely what is happening in the articles I’ve linked. The author then “constructs” some invariant orthogonal polynomials on the standard simplex. The problem is to find the matrix of the relevant quadratic form with respect to the basis of orthogonal symmetric polynomials.

Of course, we can always take arbitrary polynomials in elementary symmetric functions (which give us all invariant polynomials) and evaluate the integrals numerically.

I only looked very briefly at the numerical optimization with non-polynomial functions.

We know (Weierstrass) that polynomial approximations will converge to the optimal function. Moreover, I found that the numerical bounds from polynomial approximations seemed to converge quickly.

For , it is easy to use approximations involving higher symmetric polynomials, but there is only a very limited numerical improvement after having taken a low-degree approximation.

For larger $k$ things are more complicated, and convergence is a bit slower, so if there is some nicer basis of symmetric functions maybe that’s worth looking at. That said, I would have guessed that it is computationally feasible to include higher degree symmetric polynomials and get a bound that is accurate enough to get the globally best value of $k$ (if we’re only following the method in my paper – with new ideas, such as MPZ, this might no longer be true).

There is no need for numerical evaluation, since any monomial over the simplex is transformed to a monomial over the unit ball whose integral is separable (by using spherical coordinates), with an explicit expression in terms of beta functions (in this case, factorials).
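Concretely, these exact evaluations come down to the classical Dirichlet integral formula for a monomial over the simplex $\{t_i \ge 0,\ t_1 + \cdots + t_k \le 1\}$, which is easy to implement exactly (the helper below is my own illustration):

```python
from fractions import Fraction
from math import factorial

def simplex_monomial_integral(exponents):
    """Exact value of the integral of t1^a1 * ... * tk^ak over the simplex
    {t_i >= 0, t_1 + ... + t_k <= 1}, via the Dirichlet integral formula
        a1! * ... * ak! / (a1 + ... + ak + k)!"""
    num = 1
    for a in exponents:
        num *= factorial(a)
    return Fraction(num, factorial(sum(exponents) + len(exponents)))

print(simplex_monomial_integral((1, 1)))  # 1/24
print(simplex_monomial_integral((0, 0)))  # 1/2, the area of the triangle
```

All the inner products of polynomial test functions needed for the quadratic forms above reduce to finitely many such monomial integrals.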

Naive questions: 1) Do the small gaps results apply to primes in arithmetic progressions? 2) Could there be a way to bootstrap results for k-tuples of primes in small intervals to get back results on 2-tuples? (The k-tuples result feels in some sense also stronger than the twin prime conjecture.)

For (1), this should be possible. The original Goldston-Pintz-Yildirim results were extended to arithmetic progressions by Freiberg (http://arxiv.org/abs/1110.6624) and to more general sets of Bohr type by my student Benatar (http://arxiv.org/abs/1305.0348). In Maynard’s paper, it is noted that for the purposes of getting finite values for $H_m$, the set of primes can be replaced by any other set that obeys a Bombieri-Vinogradov type theorem (at any level of distribution, not necessarily as large as 1/2), which includes the primes in arithmetic progressions (and also primes in fairly short intervals).

There may be some interplay between the k-tuple results and the 2-tuple results, but note that the set of primes we can detect in a k-tuple is rather sparse, we only catch about primes in such a tuple. Basically, Maynard’s construction gives, for each and each admissible tuple , a sieve weight (which is basically concentrated on a set in of density about ) with the property that if is chosen randomly with probability density equal to , then

(or, in the original notation, ) for , which by the first moment method shows that with positive probability at least $m+1$ of the $n+h_i$ will be prime. As $k$ increases, it is true that we get a few more primes this way, but they are spread out in a much longer interval, and the number of $n$ with this many primes also gets sparser, so it’s not clear how to perform the tradeoff.

But this does remind me of a possible idea I had to shave 1 off of . Suppose for instance one was trying to catch two primes close together in the tuple . Currently, we need estimates of the form

to catch two primes at distance at most 8 apart; one can also replace the four constants by other constants that add up to 1, although this doesn’t seem to help much. To get primes distance at most 6 apart, one would ordinarily need to boost three of these constants up to (and drop the last one down to zero). But suppose we had bounds of the form

.

Then one could get a gap of at most 6, and not just 8, because the first four estimates show that there are two primes in n,n+2,n+6,n+8 with probability greater than , and the last estimate excludes the possibility that this only occurs for the n,n+8 pair. Unfortunately with sieve theory it is difficult to get really good bounds on , so I don’t know whether this idea will actually work.

Here is a slight refinement of the last idea. Assume EH, and suppose we want to improve the current bound of $H_1 \le 12$ to $H_1 \le 10$. Assume this were false; thus after some point there are no pairs of primes of distance 10 or less apart. Thus, given any large random integer n (chosen using a suitable probability distribution), the events “n prime”, “n+2 prime”, “n+6 prime”, “n+8 prime”, “n+12 prime” are all disjoint, except for the pair “n prime” and “n+12 prime” which are allowed to overlap. If we then let $a_i$ be the probability that $n + h_i$ is prime, and $b$ be the probability that n and n+12 are simultaneously prime, we thus have the inequality

So if we can get a lower bound on

(1)

which is asymptotically better than an upper bound for

then we can reduce $H_1$ from 12 to 10. The tricky thing though is still how to get a good upper bound on $b$. But perhaps there is now a fair amount of “room” in the ratio between (1) and $b$ if we use more symmetric polynomials and enlarge the simplex as per Remark 1; the computations in Maynard’s paper show that this ratio barely exceeds what is needed, which gives almost no room to play with, but perhaps with the above improvements one can get a more favorable ratio.
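In symbols (my reconstruction of the displayed inequality): writing $h_1,\ldots,h_5$ for $0, 2, 6, 8, 12$, $a_i$ for the probability that $n + h_i$ is prime, and $b$ for the probability that $n$ and $n+12$ are simultaneously prime, the disjointness of all the events except the $\{n, n+12\}$ pair gives

```latex
a_1 + a_2 + a_3 + a_4 + a_5 \;\le\; 1 + b,
```

so a lower bound on $\sum_i a_i$ that asymptotically exceeds $1$ plus an upper bound on $b$ yields the desired contradiction.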

I thought about this idea a bit more, but encountered a problem in the EH case: if one wants a meaningful lower bound on the second term, then one starts wanting to estimate sums such as for various values of , but this is only plausible if the sieve level of is a bit below the threshold of (by a factor of ), but then one is probably losing more from having a reduced sieve level (as it now has to be below the optimal value of that one wants to pick if one wants to get the maximum benefit of EH) than one is gaining from removing this piece.

However, the situation is better if one is starting from Bombieri-Vinogradov or MPZ as the distribution estimate (so one would now be trying to beat the 600 bound rather than the 12 bound). Here the sieve level is more like and so there is actually a fair bit of room to insert some divisibility constraints and still be able to control . The bad news is that the final bound one gets here could be really small, something like of the main term, which looks too tiny to lead to any actual improvement in the H value.

Good point. Maybe (I haven’t really thought about this) we can use exponential sum estimates to get us a bit of extra room, but this is already sounding like quite a lot of work for what would be only a very small numerical improvement.

Here is a possible way to get such an upper bound on this correlation (and hence on $b$), but I don’t know how effective it will be yet. Basically, the idea is to observe that if $\nu$ is given in terms of a multidimensional GPY cutoff function f as in equation (10) in the blog post, then when n and n+12 are both prime, we have $\nu(n) \le \nu'(n)$, whenever $\nu'$ is given in terms of another multidimensional GPY cutoff function f’ with the property that $f'$ agrees with $f$ on the relevant boundary face of the simplex. Then

.

The last step is of course crude, but it crucially replaces a sum over two primality conditions (which is hopeless to estimate with current technology) with a sum over one primality condition (which we do know how to control, assuming a suitable distribution hypothesis). The final expression may be computed by Proposition 5 of the blog post to be

and so the probability above may be upper bounded by the ratio

.

If we set f’=f then this gives the trivial upper bound , but the point is that we can do better by optimising in f’.

Now the optimisation problem for f’ is much easier than the main optimisation for $M_k$: we need to optimise the square-integral of f’ subject to f’ being supported on the simplex and having the same trace as f on the relevant boundary. (No symmetry hypothesis needs to be imposed on f’.) It is not difficult (basically fundamental theorem of calculus + converse to Cauchy-Schwarz) to see that the extremiser occurs with

and so we have

.

But I don’t know yet how strong this bound will be in practice; one needs to stay a little bit away from the extreme points of the simplex in order for the denominator not to kill us. (On the other hand, as observed earlier, the bound can’t get worse than .)

In the large k regime, the analogue of has size about ; I’m hoping that the bound for is significantly smaller, indeed independence heuristics suggest it should be something like . It’s dangerous of course to extrapolate this to the k=5 case but one can hope that it is still the case that is small compared to in this case.

Nice! If we’re only using Bombieri-Vinogradov (or MPZ) then presumably we can use a hybrid bound, where if is small we use your argument as above, but if is larger we instead use
(i.e. rather than forcing n to be prime and n+12 an almost-prime with a very small level-of-distribution, we relax this to n and n+12 being almost-primes.)

If we’re using Elliott-Halberstam then we get no use from this, since the primes have (apart from an ) the same level-of-distribution results available.

Certainly we have the option of deleting both primality constraints, but I’m not sure how one can separate the large case from the small case while retaining the positivity of all the relevant components of , which is implicitly used in order to safely drop the constraints that n is prime or n+12 is prime.

In the large k regime, if we take the sieve from your Section 8, but restrict it to a smaller simplex, such as , then it does appear that the analogue of scales like , which it should (morally we should have , though with current methods we could only hope for an upper bound that is of this order of magnitude). It would be interesting to see what numerics we can get in the k=5 case though.

One nice thing is that all the bounds for , etc. are quadratic forms in the function f, so the same linear programming methods in your paper will continue to be useful here, without need for more nonlinear optimisation.

Ah, I see. That could work, although one loses a factor of 2 from using the inequality and so splitting might not be efficient in practice – but I guess we can experiment with all sorts of combinations to see what works numerically and what doesn’t.

We can regain the factor of two by being a bit more careful. Let , where is provided is small, and is the remainder. Then, expanding the square, we would use your bound for all the terms except the terms. Of course, it might be that in practice the benefit of doing this is small.

(PS: Apologies for the formatting errors I’ve had in my posts – I’ll try to check them more carefully)

Hmm… but in general the cross terms do not have a definite sign, and so I don’t see how to discard the cutoffs in them unless one used Cauchy-Schwarz (which would be a little bit more efficient than the elementary inequality, though not by too much).

Actually, I think the Cauchy step I suggested above is rubbish (I was clearly more tired last night than I realized). We would lose the vital filtering out of small primes, or we would be doing worse than just going one way or the other.

I think my concern was that with only a limited amount of room for divisibility constraints we might only be filtering out factors of which are typically less than for a constant , and potentially losing a factor of . (I’m thinking about the case when is large.) I think this is what would be happening with the original GPY weights, but our weights are constructed so that the main contribution is when is of size (some suitable constant ), which gives us enough room that we don’t need to worry here.

Presumably this would give an improvement to of size about (ignoring all log factors).

Re (1), it depends on what exactly one means by GPY for APs, but GPY extended their original work to primes in APs — it wasn’t me. My understanding is that these amazing new results apply equally well to admissible k-tuples of linear forms g_1x + h_1,…,g_kx + h_k, not just monic ones. I’m sure we’ll see lots of neat applications of this that we can get almost for free after Maynard’s spectacular proof!

Good question! There is a funny failure of monotonicity when one tries to use the boost in Remark 1, which I’ll try to explain a bit more here.

Start with the task of trying to prove DHL[2,2], or more precisely the twin prime problem of finding two primes in $\{n, n+2\}$ for infinitely many $n$. The way we try to do this is to pick a sieve weight $\nu(n)$ which basically has the form

for some sieve coefficients whose exact value we will ignore for now. We then want to control sums such as

The first sum can be easily estimated in practice, so let us ignore it for now and focus on the second two sums. At first glance, because of the two divisibility constraints , it looks like we need to have any real hope of controlling the sum; but note that the weight forces to be prime and so will be , so we actually only need to control the second sum, and similarly to control the first sum. The upshot of this is that the function F in the blog post only needs to be supported on the square rather than the triangle (and if one then sets F equal to 1 on this square, one comes within an epsilon of the twin prime conjecture on EH).

[I'm glossing over some details as to how to deal with the allegedly easy first sum. If then this sum is indeed easy to estimate, but if can exceed then one has to proceed more carefully. I think what one does here is try to use a separation of variables trick, breaking up into pieces of the form , where is a divisor sum of only, and is a divisor sum of only, and then use an Elliott-Halberstam hypothesis for or (rather than ) to control error terms; this should get us back to the weaker constraints , .]

Now we turn to DHL[3,2], playing with the tuple n,n+2,n+6. Now we use a sieve of the form

and want to estimate the sums.

Again ignoring the first sum as being presumably easy to handle, we now need the constraints in order to control the latter three sums respectively (rather than ). In terms of the cutoff function F, this means that we may enlarge the support from the simplex to . But note now that we have a constraint present for the 3-tuple problem which was not present in the 2-tuple problem, because we had no need to control in the 2-tuple problem. So there is a failure of monotonicity; the simple example which gets us within an epsilon of success for 2-tuples on EH doesn’t directly translate to something for 3-tuples.

Thinking about it a bit more, this lack of monotonicity is probably a sign that the arguments are not as efficient as they could be. In what one might now call the “classical” GPY/Zhang arguments, one relies on the Elliott-Halberstam conjecture just for the von Mangoldt function $\Lambda$. To obtain the enlargement of the simplex, one would also need the EH conjecture for other divisor sums, but this type of generalisation of EH is a standard extension (see e.g. Conjecture 1 of Bombieri-Friedlander-Iwaniec) and so this would not be considered that much more “expensive” than EH just for von Mangoldt (although the latter is pretty expensive at present!).

To recover monotonicity, one would have to also assume EH for hybrid functions such as . One could simply conjecture EH for such beasts, but this now looks considerably more “expensive” than the previous versions of EH (indeed, it may already be stronger than the twin prime conjecture, especially if the level of the divisor sum is allowed to be large). On the other hand, perhaps a Bombieri-Vinogradov type theorem for these hybrid functions is not unreasonable to hope for (though I am not sure exactly how this would help in a situation in which we already had EH for the non-hybrid functions).

Unfortunately these tuples somehow live in completely different worlds and don’t seem to have much interaction. Specifically, the only chance we have to make prime for large n is if , while the only chance we have to make prime is if . So any sieve that “sees” the first tuple in any non-trivial way would necessarily be concentrated on , and any sieve that sees the second tuple would be concentrated instead on . The distribution of primes in and in are more or less independent (one may at best save a multiplicative factor of two in the error terms by treating them together, which is negligible in the grand scheme of things), so I don’t think one gains much from a weighted average of sieves adapted to two different residue classes mod 3.

Good point. I should have used {0,2,6} and {2,6,8} together instead. I think the idea in my head was something along the lines of your earlier post, about using probabilities in some ways. This would require our weights to discriminate against both 0 and 8 being prime simultaneously (in this case), and that appears to be a difficult task.

One thought on the variational problem; for the sake of discussion let us take k=3. The quantity $M_3$ is then the maximum value of

for symmetric F(x,y,z) supported on the simplex subject to the constraint

(As discussed earlier, we can hope to enlarge this simplex to a slightly larger region, but let us ignore this improvement for now.) In Maynard’s paper, one considers arbitrary polynomial combinations of the first two symmetric polynomials $x+y+z$ and $x^2+y^2+z^2$ as candidates for F.

However, we know from Lagrange multipliers (as remarked after (7.1) in Maynard’s paper) that the optimal F must be a constant multiple of

and so in particular takes the form

for a symmetric function G(x,y) of two variables supported on the triangle . So one could collapse the three-dimensional variational problem to a two-dimensional one, which in principle helps avoid the “curse of dimensionality” and would allow one to numerically explore a greater region of the space of symmetric functions (e.g. by writing everything in terms of G and expanding G into functions of symmetric polynomials such as ).
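Restricting to a finite-dimensional span of basis functions (as Maynard does with symmetric polynomials) turns the variational problem into a generalized Rayleigh-quotient problem: with a Gram matrix M1 = (⟨b_i, b_j⟩) and a numerator form M2, the best ratio over the span is the largest λ with det(M2 − λM1) = 0. Here is a two-basis-element sketch with hypothetical matrix entries (in practice the entries would come from integrals of products of symmetric polynomials over the simplex; the numbers below are purely illustrative).

```python
from math import sqrt

def max_generalized_rayleigh(M2, M1):
    """Largest lambda with det(M2 - lambda*M1) = 0 for symmetric 2x2
    matrices with M1 positive definite, i.e. the maximum of
    (c^T M2 c)/(c^T M1 c) over the two-dimensional span."""
    a = M1[0][0] * M1[1][1] - M1[0][1] ** 2
    b = -(M2[0][0] * M1[1][1] + M2[1][1] * M1[0][0] - 2 * M2[0][1] * M1[0][1])
    c = M2[0][0] * M2[1][1] - M2[0][1] ** 2
    return (-b + sqrt(b * b - 4 * a * c)) / (2 * a)

# hypothetical Gram matrices for a two-element basis {b1, b2}
M1 = [[1.0, 0.2], [0.2, 1.0]]   # <b_i, b_j>
M2 = [[2.0, 0.5], [0.5, 1.0]]   # numerator quadratic form
lam = max_generalized_rayleigh(M2, M1)
# the optimum dominates the Rayleigh quotient of each individual basis vector
assert lam >= M2[0][0] / M1[0][0] - 1e-12
assert lam >= M2[1][1] / M1[1][1] - 1e-12
print(lam)
```

Enlarging the basis can only increase the resulting lower bound, which is the sense in which collapsing the dimension lets one explore a bigger portion of function space numerically.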

It is possible that one could iterate this process and reduce to a one-dimensional variational problem, but it looks messy and I have not yet attempted this.

I was unable to iterate the reduction in dimension step more than once. (This is roughly why I was able to solve the eigenfunction equation when , since it reduces to solving a single-variable ODE, but not for .)

The eigenfunction equation for looks like
and I failed to get anything particularly useful out of this (although maybe someone else has some clever ideas – I’m far from an expert at this; differentiating with respect to x and y turns this into a two-variable PDE). One can do some sort of iterative substitution, but I just got a mess.

A cheap way to combine Maynard’s paper with the Zhang/Polymath8a stuff: if is true, then we have as . The reason for this is that the function F (or f) constructed in the large k_0 setting is supported in the cube (as well as in the simplex); by construction, decays like , so in particular f will be supported on for large enough. This means that the moduli that come out of the sieve asymptotic analysis will be -smooth, and so MPZ may be used in place of EH much as in Zhang (or Polymath8a).

So using our best verified value 7/600 of , we have the bound , and using the tentative value 13/1080 this improves to . Not all that dramatic of an improvement, but it does show that the Polymath8a stuff is good for something :).

The computations in Section 8 of Maynard’s paper should give explicit values of H_m for small m that are fairly competitive with the bound , I think.

For simplicity let us just work off of Bombieri-Vinogradov (so the distribution level is basically 1/2); one can surely push things further by using the MPZ estimates, but let’s defer that to later. The criterion (Proposition 4.2 of Maynard, or Corollary 6 here) is then that H_m \leq H(k) holds whenever M_k > 4m, where H(k) is the diameter of the narrowest admissible k-tuple. For instance, M_{105} > 4 gives H_1 \leq 600.
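To illustrate the H(k) side of this criterion, here is a brute-force sketch (my own illustration, feasible only for very small k): it searches for the narrowest admissible k-tuple by increasing diameter, which reproduces the standard small values of H(k).

```python
from itertools import combinations

def is_admissible(h):
    """Admissible: for every prime p <= len(h), the residues of h mod p
    omit at least one class; larger primes can never be fully covered."""
    k = len(h)
    for p in range(2, k + 1):
        if all(p % q for q in range(2, p)) and len({x % p for x in h}) == p:
            return False
    return True

def narrowest_diameter(k):
    """H(k): the smallest diameter of an admissible k-tuple, found by
    trying diameters d = k-1, k, ... and all tuples {0, ..., d}."""
    d = k - 1
    while True:
        for mid in combinations(range(1, d), k - 2):
            if is_admissible((0,) + mid + (d,)):
                return d
        d += 1

print([narrowest_diameter(k) for k in (3, 4, 5)])  # prints [6, 8, 12]
```

(For k in the hundreds this naive search is hopeless, of course; the admissible-tuple databases mentioned in the head post are what one uses in practice.)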

In Section 8 of Maynard, the lower bound

(1)

holds for any such that

(1′)

where

.

(Here are the moments , where .) It looks like a fairly routine matter to optimise A,T for any given k, and then find the first k with M_k > 4 (to get an H_1 bound), the first k with M_k > 8 (to get an H_2 bound), and so forth.

Actually there is some room to improve things a bit; if one inspects Maynard’s argument more carefully, one can improve (1) to

(2)

where is the variance

and the criterion (1′) is replaced with

(2′)

which could lead to some modest improvement in the numerology. (I also have an idea on removing the -T in the denominator in (2) and (2′) (or the -T/k in (1) and (1′)), but this is a bit more complicated.)

“The computations in Section 8 of Maynard’s paper should give explicit values of H_m for small m that are fairly competitive with the bound H_1 \leq 600, I think.”

Assuming I didn’t make any errors, for you can get rather easily with your improved equations. (Take for example.) I couldn’t get to work out. Also, I didn’t try , so others might want to give it a try.

I didn’t do anything fancy to find the solution for — just made a lot of graphs. Similarly, for I just graphed enough points so that I was convinced that the maximum for the RHS of (2) is less than 4.

P.S. While I was playing around with trying to actually find the optimal solution, I did notice that making the substitution (for some variable , which we may want to replace with an exponential) simplifies things quite a bit. For instance, this allows one to convert (2′) into an inequality of the form for a function (which has a number of nice properties).

P.P.S. I just discovered the “FindMaximum” function in Mathematica, which concurs with my findings. When , it says that the RHS of (2) takes a maximum value of when (2′) holds, at . On the other hand, when , then the RHS of (2) is less than 4, when (2′) holds.

If this function is to be believed, when then is the lowest value for which the RHS of (2) is bigger than 8 when (2′) also holds. One takes .

In light of James’ calculation below that a naive insertion of (or ) gives good results for the small k numerics, it is now tempting to do the same for the large k asymptotics given above.

If one does things optimistically (ignoring ), then all one has to do is replace the condition with for one’s favourite value of . Given that in the small k case this improved 105 to 68, perhaps we could hope that 145 improves to below 105?
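To make concrete how much the threshold moves, here is a quick computation, assuming the criterion takes the form M_k > 2m/θ at distribution level θ (which reduces to the M_k > 4m criterion at the Bombieri-Vinogradov level θ = 1/2, and to which the optimistic MPZ insertion corresponds via θ = 1/2 + 2ϖ).

```python
def mk_threshold(m, theta):
    """Required lower bound on M_k to conclude DHL[k, m+1] at
    distribution level theta, assuming the criterion M_k > 2m/theta."""
    return 2 * m / theta

varpi = 7 / 600  # best verified Polymath8a value
for m in (1, 2):
    bv = mk_threshold(m, 1 / 2)            # Bombieri-Vinogradov
    mpz = mk_threshold(m, 1 / 2 + 2 * varpi)  # optimistic MPZ insertion
    print(m, bv, mpz)  # the MPZ threshold is strictly smaller
```

Even this small drop in the threshold can translate into a noticeably smaller admissible k, since M_k grows only logarithmically in k.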

To do things properly (i.e. rigorously), we unfortunately have to also take into account , and specifically there is an annoying new constraint

needed to ensure that all the moduli involved are -smooth. This (when combined with either or ) is likely to cause some degradation in the bounds, but at least these bounds would be rigorous.

On the other hand, we do know that our MPZ estimates only need a certain amount of dense divisibility rather than smoothness. It is likely that by modifying the above arguments (probably by introducing two different deltas, much as we did in Polymath8a) we could reduce the dependence on delta to the point where it is almost negligible. But this looks a little tricky and is probably best deferred until a little later.

p.s. Gah, I just realised that my formula for given previously was incorrect, as I had lost the factor. The true formula for is

which unfortunately is likely to make the previous computations a little bit worse. Sorry about that!

Please forgive my ignorance; I haven’t even fully read Maynard’s preprint.
On page 5, in Proposition 4.1, he defines with an outer integration over the (k-1)-dimensional cube. On page 19, in the proof of Lemma 7.2, formula (7.12), he uses an outer integration over the (k-1)-dimensional simplex. So my question is whether the outer integration in the definition of should be over the (k-1)-dimensional simplex too?

which shows that is the largest eigenvalue of . (Actually, as is not quite compact, although it is bounded, self-adjoint, and positive definite, one has to be a little careful here with the spectral theory to ensure that only has point spectrum at its operator norm; I think it will work out though, I will try to supply some details later. In any case this doesn’t impact the substance of Maynard’s paper, as one never actually analyses eigenfunctions in that paper.)

A little expansion of Eytan’s remark (a version of the power method): from Cauchy-Schwarz one has

which upon rearranging means that the Rayleigh quotient is non-decreasing if one replaces by . Iterating this, we may replace by (or any higher power) and guarantee a better lower bound on unless one is already at an eigenfunction.

This suggests that we can improve the large k analysis (and in particular, the m=2 values) by taking our current candidate for f and applying to it to obtain a new candidate that is guaranteed to be better. This is a bit messy though; a simpler (though slightly less efficient) approach would be to compute instead of and use the inequality

to obtain what should presumably be a better lower bound for M_k than what we currently have.
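This chain of inequalities can be sanity-checked on a toy model, with a small symmetric positive semi-definite matrix standing in for the operator (an illustration of the power method only, not the actual operator from the paper).

```python
def matvec(L, f):
    return [sum(L[i][j] * f[j] for j in range(len(f))) for i in range(len(f))]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# a small symmetric positive semi-definite matrix standing in for the operator
L = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
f = [1.0, 0.0, 0.0]            # an arbitrary starting candidate

Lf = matvec(L, f)
r1 = dot(f, Lf) / dot(f, f)    # <f, Lf>/<f, f>
r2 = dot(Lf, Lf) / dot(f, Lf)  # <Lf, Lf>/<f, Lf>
assert r2 >= r1                # Cauchy-Schwarz guarantees the second bound is better

# iterating f -> Lf drives the Rayleigh quotient up to the top eigenvalue
for _ in range(50):
    f = matvec(L, f)
    m = max(abs(x) for x in f)
    f = [x / m for x in f]     # renormalise to avoid overflow
rayleigh = dot(f, matvec(L, f)) / dot(f, f)
print(rayleigh)                # approaches 2 + sqrt(2), the largest eigenvalue
```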

It seems that the integral operator above is compact, since its representing integrand is continuous on its domain (i.e. the product of the simplex in t and the simplex in u).
But I don’t see why its representing integrand is symmetric (as it should be for a self-adjoint operator)!

I suggest seeing whether the integrands representing powers of this operator can be evaluated explicitly (e.g. by the integrand composition operation) – so that the traces of (small) powers of the operator may be evaluated, giving upper bounds for the corresponding powers of .

One can check by hand that for any (say) bounded compactly supported F,G on the simplex.

The operator is not quite an integral operator because each of the components only integrates in one of the dimensions rather than all of the dimensions. As such, it doesn’t look compact, but may still have pure point spectrum, at least at the top of the spectrum (I have to check this). In particular, I don’t expect the traces of powers of this operator to be finite. A model case here is the operator given by

which is positive semi-definite and bounded with an operator norm of 1, but has an infinity of eigenfunctions at 1 (any f that is independent of y will be an eigenfunction).
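A quick discretised sanity check of this model operator (averaging in the y variable on an n×n grid; an illustration only): functions independent of y are fixed, while functions with mean zero in y are annihilated.

```python
n = 8

def T(f):
    """Discretised model operator: average f over the y variable only."""
    avg = [sum(f[i][j] for j in range(n)) / n for i in range(n)]
    return [[avg[i] for _ in range(n)] for i in range(n)]

# a function independent of y is fixed by T (eigenvalue 1)
g = [[i + 1.0 for _ in range(n)] for i in range(n)]
assert T(g) == g

# a function with mean zero in y is annihilated by T (eigenvalue 0)
h = [[(-1.0) ** j for j in range(n)] for i in range(n)]
assert all(abs(v) < 1e-12 for row in T(h) for v in row)
```

This is exactly the decomposition described above: a large eigenspace at 1 (so no compactness) and a large null space at 0.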

I no longer have any clear intuition as to whether the spectrum of is pure point, absolutely continuous, or some combination of both (it could also possibly have singular continuous spectrum); the situation could be as complicated as it is for discrete or continuous Schrodinger operators, which have a very wide range of possible behaviours. Consider for instance the model operator

on the triangle (all variables are assumed to be non-negative); this is one of the two components of in the k=2 case. One can split into the functions that are constant in the y direction, and functions that are mean zero in the y direction. The operator T annihilates the latter space, and acts by multiplication by on the former space. So there is absolutely continuous spectrum on and a big null space at 0. The operator norm of 1 is not attained by any function, but one can get arbitrarily close to this norm by, e.g. the indicator function of the rectangle for (one can normalise to have unit norm in L^2 if one wishes).

Despite the possible lack of eigenfunctions (which in particular may mean that in some cases the maximum of the ratio is not attainable), I believe it is still correct that the operator norm of for all functions on the simplex is the same as the operator norm of for symmetric functions on the simplex, which informally means that we can restrict without loss of generality to the case of symmetric functions F. To see this, suppose that we have a sequence of -normalised functions on the simplex which is an extremising sequence for , so that this sequence approaches the supremum as . By replacing with its absolute value (which does not change the L^2 norm, but can only serve to increase the ) we may assume that the are non-negative. In particular, the projection to the symmetric functions (i.e. the symmetrisation of ) is bounded away from zero (in fact it is at least ). An application of the parallelogram law (or the spectral theorem) shows that any average of two or more extremising sequences is again an extremising sequence (if the norms stay away from zero), so the symmetrised are also extremising. (One could also have taken a slightly higher-minded approach and split up according to isotypic components here if one wished.)

My tentative understanding of what is going on near the top of the spectrum is that if one has strict monotonicity then the supremum should be obtained by an eigenfunction of , but if one has equality then the near-extrema may instead be obtained from lower-dimensional functions, e.g. functions of the form where is a near-extremiser of the -dimensional problem (supported on a slightly shrunken version of the -dimensional simplex).

It is presumably the case that the are strictly increasing (except possibly for very small k), but I don’t see how one would prove this other than numerically (and even numerical verification is a bit of a challenge because we do not yet have good upper bounds on ).

It should be remarked that in all iterative methods of this type (i.e. “gradient based”), is a linear combination of – for which Maynard’s method (for the best linear combination) should be applied to these basis functions.

Assume with the provisional value . Using computations as in my paper (a polynomial approximation of terms with ), we find for that . We find then that , and so two primes with , which corresponds to gaps of size 356.

That’s smaller than I would have expected, given that is only 5% or so bigger than 1/4. (Incidentally, with our conventions, we would use rather than . But I guess since is already as large as 2, and seems to grow logarithmically, it isn’t so unreasonable.)

Do you have some preliminary data on how your lower bounds on behave in k for small k (e.g. up to 105)? This would give a baseline for further numerical improvement, and would also give a better sense as to what kind of growth should have (given that the values should be pretty good for small k, perhaps enough to do some reliable extrapolation).

Of course we have to play with MPZ instead of EH, which causes some difficulties (specifically, we can’t play on the entire simplex, but must truncate to the cube ), but it does show that there is hopefully a fair bit of room for improvement.

Incidentally, the 7/600 value is not so provisional (once Emmanuel signs off on Section 10 of the Polymath8a paper it will, in particular, have been checked at least once). We have a much more tentative value of 13/1080 that is not fully verified (it requires some difficult algebraic geometry that we haven’t got around to yet) but it might be interesting to see how much more of a gain one could expect with this slightly better value.

Sorry for the stupid question, but in Maynard’s Lemma 5.1 (and roughly here, just below (10)) it states that if are not pairwise coprime then and cannot be satisfied by any n. It’s utterly trivial if d divides n and one of the lcms, but I can’t seem to understand why it works if d divides two of the lcms.

If d divides both and , then it divides and , and hence also . On the other hand, since is coprime to W, must be coprime to . Since is the product of all small primes, and in particular all primes dividing (if x is large enough), we obtain a contradiction.

Is it possible to give a quantum mechanical interpretation as well? (The problem is to give a quantum mechanical interpretation to the operator – which may make sense in smaller dimensions like 3, 4.)
Perhaps quantum mechanical bounds (a kind of “uncertainty principle”) may be obtained for ?

Here are my (unfortunately somewhat lengthy) computations for the second moment for the function

used in the large k analysis, with the quantities defined in this previous comment. Note that , and hence

(1)

and

(2)

so we can get lower bounds on by computing either of the quantities in (1) and (2), and we know from Cauchy-Schwarz that (2) should give a slightly better bound.

As a warmup, let us first compute the RHS of (1). By symmetry, this becomes

which if we restrict to the region , simplifies to

where are iid with distribution on . This, together with the Chebyshev inequality, is what gives the lower bound

which is what we had in the previous comment.
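To make the Chebyshev step concrete: the point is that a sum S of k iid coordinates concentrates around its mean, with P(|S − kμ| ≥ ε) ≤ k·Var/ε², so restricting to a region near the mean loses very little mass. A Monte Carlo sketch with uniform coordinates (the actual distribution in the computation above is different; uniform is purely for illustration):

```python
import random

random.seed(0)
k, trials = 50, 20000
mu, var = 0.5, 1.0 / 12.0   # mean and variance of a single Uniform[0,1] draw
eps = 0.2 * k               # deviation threshold for the sum

exceed = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(k))
    if abs(s - mu * k) >= eps:
        exceed += 1

empirical = exceed / trials
chebyshev = (k * var) / eps ** 2  # P(|S - k*mu| >= eps) <= k*Var/eps^2
print(empirical, chebyshev)
assert empirical <= chebyshev
```

Note the square of ε in the denominator of the Chebyshev bound, which is exactly the point at issue in the correction discussed further down the thread.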

Now we compute the RHS of (2). By symmetry, this becomes the sum of

and

.

For the first expression, we restrict to the region and obtain a lower bound of

.

By Chebyshev, this is at least

(3)

subject to the constraint

. (4)

Meanwhile, for the second expression we restrict to the region and obtain a lower bound of

which we can lower bound by

which is equal to

. (5)

Thus we obtain the bound

whenever obey the constraint (4).

Asymptotically, this does not seem to give too much of an improvement to the lower bound of (a back of the envelope calculation suggests that the gain is only or less) but it might still give some noticeable improvement for the m=1 and m=2 numerics. (And if it doesn’t, it would provide some indication as to how close our test function F is to being an eigenfunction.)

Is your formula for (3) correct? Or should there be a square in the denominator of the rightmost fraction? [I ask because, three offset equations earlier, you have a formula that doesn’t match your earlier bound for , as it is missing a square in the denominator of the rightmost fraction as well.]

Yes, you’re right; there should be a square in the denominator in both cases as per Chebyshev. So the first lower bound for M_k should be

and the quantity in (3) should instead be

Thanks for spotting the error. But actually this error may end up in our favour, since I think the denominator is in fact greater than 1, so perhaps xfxie’s calculations below will in fact improve with this correction!

Pace is right. For the implementation, I modified the code based on the obvious difference between M_k and (3) — so it might be the reason that the typo (which appeared in both places) was automatically skipped.

Assuming that the answer to my question is that the formula for (3) was incorrect, and that there should be a square (which I think is true, if I’m understanding your use of Chebyshev), we get (without using the inequalities from Polymath 8a) the following:

I’m noting a rather technical complication with this second moment computation when one tries to combine it with . When using MPZ, one has effectively limited oneself to using functions supported on the cube to get smoothness of the moduli. However, while the original function f has this form (if ), higher iterates do not necessarily have this form. So, strictly speaking, one cannot combine the two improvements.

However, this is likely a fixable problem provided one puts in a certain amount of effort. The function is still supported on a set in which all but m of the coordinates are known to lie in , which for small m almost guarantees multiple -dense divisibility, if not -smoothness (there will be an exceptional set in which this fails but it will be exponentially small). So it is likely that this technicality will have a negligible impact on the estimates, and hopefully will not degrade the value of k at all.

It’s all getting rather complicated; I will try to write up a fresh blog post soon to summarise the various things being explored here.

xfxie, can you please clarify which combination of other assumptions you are using here? Or to put it another way: to which previously-mentioned k0 assuming varpi = 7/600 (if any) is your 388 the 13/1080 counterpart?

Aubrey, 13/1080 is based on Terence’s calculation here (and the counterpart of k_0 is 603). It is also mentioned in a comment above by Terence. The result is not totally verified yet, but the calculation might provide a basic sense of how far the method can reach.

Thanks, but actually that wasn’t quite my question: I know that 603 is the pre-Maynard k0 for varpi = 13/1080, but I was looking for the post-Maynard k0 for varpi = 7/600. If I’m not mistaken, Pace’s k0 of 448 assumes only Bombieri-Vinogradov, and until 13/1080 is fully confirmed I’m thinking that it might be of interest also to continue tracking the various k0 that arise from 7/600.

Also, incidentally, isn’t k0 = 42392 obsolete now? – isn’t that the m=2 counterpart of 508, which was superseded by Terry’s second-moment computations on Friday? I’m unclear as to why you and Drew are still refining H for 42392 rather than for the m=2 counterpart of 388 (which I don’t think anyone has posted yet).

The situation with k0=42392 isn’t clear to me, but if someone has a better value, please post it; I agree there is no point in optimizing H values for an obsolete k0 (although it has motivated me to spend some time tweaking the algorithm to more efficiently handle k0 in this range, so the time isn’t wasted). I’d also be curious to see a k0 for m=3 if anyone is motivated enough to try it.

As far as I know the best “Deligne-free” k0 for m=2 is still 43134, but someone please correct me if I am wrong.

I think I now have an upper bound of logarithmic type that matches the lower bound asymptotically (up to the correction terms).

First, let me describe the warmup problem which led me to discover the bound. Morally speaking (ignoring the terms), the large k computations boil down to the problem of maximising subject to the constraints and . Calculus of variations tells us that the extremiser should be a multiple of for some suitable . Motivated by this, I wrote down the Cauchy-Schwarz inequality

(knowing that equality is attained here when is a multiple of ) and the right-hand side evaluates to be bounded by

which optimises to essentially (up to errors of ) after setting (say).
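If I am reading the missing formula correctly, the Cauchy-Schwarz weight here is 1 + At, whose reciprocal integrates to log(1+A)/A on [0,1]; here is a quick numerical check of that evaluation (the specific weight 1 + At is my assumption from context, not confirmed by the garbled source).

```python
from math import log

def integral_weight(A, n=200000):
    """Midpoint-rule approximation of the integral of 1/(1+A*t) over [0,1]."""
    h = 1.0 / n
    return sum(h / (1.0 + A * (h * (i + 0.5))) for i in range(n))

for A in (1.0, 10.0, 100.0):
    exact = log(1 + A) / A
    approx = integral_weight(A)
    assert abs(approx - exact) < 1e-6
    print(A, exact)
```

The logarithmic decay of log(1+A)/A in A is what produces the log k - O(log log k) shape of the upper bound once A is chosen comparably to k.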

This then led me to an analogous argument in the multidimensional setting (after a rescaling). Namely, Cauchy-Schwarz gives

.

The latter integral evaluates to . Integrating in the remaining variables , we conclude that

Similarly for permutations. Summing and using the fact that on the simplex, we conclude that


According to some preliminary analysis, it seems that for , we have the linear dependence

for some (still undetermined) absolute positive constant C (related to the solution of a certain integral equation). Interestingly, it is easy to determine the dependence on without knowing the constant C!

I think that (8) in Proposition 7 is valid for all , not just for sufficiently large. First, we can assume that , since otherwise the right hand side of (8) is negative. Then, in the bound (8.17) of Maynard’s paper, we can choose such that . Clearly, , i.e. . In addition, on the right hand side of (8.17), and . is less than , hence the right hand side of (8.17) exceeds , which in turn is greater than .

For commenters

To enter LaTeX in comments, use $latex <Your LaTeX code>$ (without the < and > signs, of course; in fact, these signs should be avoided as they can cause formatting errors). See the about page for details and for other commenting policy.