The distribution of primes in densely divisible moduli

As in previous posts, we use the following asymptotic notation: is a parameter going off to infinity, and all quantities may depend on unless explicitly declared to be “fixed”. The asymptotic notation is then defined relative to this parameter. A quantity is said to be of polynomial size if one has , and bounded if . We also write for , and for .

The purpose of this post is to collect together all the various refinements to the second half of Zhang’s paper that have been obtained as part of the polymath8 project and present them as a coherent argument (though not fully self-contained, as we will need some lemmas from previous posts).

In order to state the main result, we need to recall some definitions.

Definition 2 (Dense divisibility) Let . A positive integer is said to be -densely divisible if, for every , there exists a factor of in the interval . We let denote the set of -densely divisible positive integers.
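As a concrete illustration of Definition 2, here is a minimal Python sketch. It assumes the usual form of the definition (the symbols are not visible in the text above): q is y-densely divisible if, for every real R with 1 <= R <= q, there is a factor of q in the interval [R/y, R]. Under that assumption the condition is equivalent to asking that consecutive divisors of q differ by a factor of at most y. The function names are illustrative only.

```python
def divisors(q):
    """Return the divisors of the positive integer q in increasing order."""
    small, large = [], []
    d = 1
    while d * d <= q:
        if q % d == 0:
            small.append(d)
            if d != q // d:
                large.append(q // d)
        d += 1
    return small + large[::-1]


def is_densely_divisible(q, y):
    """Check y-dense divisibility under the assumed definition:
    every real R in [1, q] has a divisor of q in [R/y, R].
    This holds exactly when consecutive divisors d < d' satisfy d' <= y * d.
    """
    divs = divisors(q)
    return all(b <= y * a for a, b in zip(divs, divs[1:]))


if __name__ == "__main__":
    print(is_densely_divisible(12, 2))  # divisors 1, 2, 3, 4, 6, 12
    print(is_densely_divisible(14, 2))  # divisors 1, 2, 7, 14
```

The gap between the divisors 2 and 7 of 14 is what rules out 2-dense divisibility there, even though 14 is 7-smooth.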

Now we present a strengthened version of the Motohashi-Pintz-Zhang conjecture , which depends on parameters and .

Conjecture 3 () Let , and let be a congruence class system with controlled multiplicity. Then

The difference between this conjecture and the weaker conjecture is that the modulus is constrained to be -densely divisible rather than -smooth (note that is no longer constrained to lie in ). This relaxation of the smoothness condition improves the Goldston-Pintz-Yildirim type sieving needed to deduce from ; see this previous post.

Following Zhang, we can perform a combinatorial reduction to reduce Theorem 4 to two sub-estimates. To state this properly we need some more notation. We need a large fixed constant (that determines how finely we slice up the scales).

Definition 5 (Coefficient sequences) A coefficient sequence is a finitely supported sequence that obeys the bounds

for all .

(i) If is a coefficient sequence and is a primitive residue class, the (signed) discrepancy of in the sequence is defined to be the quantity

(ii) A coefficient sequence is said to be at scale for some if it is supported on an interval of the form .

(iii) A coefficient sequence at scale is said to obey the Siegel-Walfisz theorem if one has

for any , any fixed , and any primitive residue class .

(iv) A coefficient sequence at scale is said to be smooth if it takes the form for some smooth function supported on obeying the derivative bounds

for all fixed (note that the implied constant in the notation may depend on ).

We can now state the two subestimates needed. The first controls sums of Type I or Type II:

Theorem 6 (Type I/II estimate) Let be fixed quantities such that

and

and

and

and let be coefficient sequences at scales respectively with

and

with obeying a Siegel-Walfisz theorem. Then for any and any singleton congruence class system with controlled multiplicity we have

This improves upon Theorem 16 in this previous post, in which the modulus was required to be -smooth, and the constraints (9), (10), (11) were replaced by the stronger constraint , and (12) was similarly replaced by a stronger constraint . Of the three constraints (9), (10), (11), the second constraint (10) is more stringent in practice, while the constraint (12) is dominated by other constraints (such as (4)).

for some fixed . Let be coefficient sequences at scales respectively, with smooth. Then for any , and any singleton congruence class system we have

for any fixed .

This improves upon Theorem 17 in this previous post, in which the modulus was required to be -smooth, and the constraints (15), (16) were replaced by the stronger constraint . Of the two constraints (15), (16), the first constraint (15) is more stringent in practice.

Let us now recall the combinatorial argument (from this previous post) that allows one to deduce Theorem 4 from Theorems 6 and 7. As in Section 3 of this previous post, we let be a fixed integer ( will suffice). Using the Heath-Brown identity as discussed in that section, we reduce to establishing the bound

where , are quantities with the following properties:

(i) Each is a coefficient sequence at scale . More generally the convolution of the for is a coefficient sequence at scale .

(ii) If for some fixed , then is smooth.

(iii) If for some fixed , then obeys a Siegel-Walfisz theorem. More generally, obeys a Siegel-Walfisz theorem if for some fixed .

(iv) .
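The Heath-Brown identity used in this reduction is not restated in this post. As a sanity check, the sketch below numerically verifies the standard form of the identity, which we assume is the one intended: for n <= U^K, the von Mangoldt function Λ(n) equals the sum over j = 1, ..., K of (-1)^(j-1) C(K, j) times the Dirichlet convolution of j copies of the Möbius function truncated to [1, U], j-1 copies of the constant function 1, and the logarithm. The values U = 4 and K = 3 are chosen only to keep the check fast; they are not the parameters of the post.

```python
import math


def dirichlet(f, g, N):
    """Dirichlet convolution of two arrays indexed 1..N (index 0 unused)."""
    h = [0.0] * (N + 1)
    for a in range(1, N + 1):
        if f[a] == 0:
            continue
        for b in range(1, N // a + 1):
            h[a * b] += f[a] * g[b]
    return h


def mobius(N):
    """Moebius function on 1..N, using sum_{d | n} mu(d) = 0 for n > 1."""
    mu = [0] * (N + 1)
    mu[1] = 1
    for i in range(1, N + 1):
        for j in range(2 * i, N + 1, i):
            mu[j] -= mu[i]
    return mu


def von_mangoldt(n):
    """Lambda(n) = log p if n is a power of the prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = next(d for d in range(2, n + 1) if n % d == 0)
    while n % p == 0:
        n //= p
    return math.log(p) if n == 1 else 0.0


def heath_brown_rhs(U, K):
    """Right-hand side of the (assumed) Heath-Brown identity on 1..U**K."""
    N = U ** K
    mu = mobius(N)
    mu_trunc = [mu[m] if 1 <= m <= U else 0 for m in range(N + 1)]
    ones = [0] + [1] * N
    log = [0.0] + [math.log(n) for n in range(1, N + 1)]
    identity = [0, 1] + [0] * (N - 1)
    mu_power, one_power = identity, identity
    total = [0.0] * (N + 1)
    for j in range(1, K + 1):
        mu_power = dirichlet(mu_power, mu_trunc, N)    # j copies of truncated mu
        if j > 1:
            one_power = dirichlet(one_power, ones, N)  # j - 1 copies of 1
        term = dirichlet(dirichlet(mu_power, one_power, N), log, N)
        coeff = (-1) ** (j - 1) * math.comb(K, j)
        for n in range(1, N + 1):
            total[n] += coeff * term[n]
    return total


if __name__ == "__main__":
    rhs = heath_brown_rhs(4, 3)
    worst = max(abs(rhs[n] - von_mangoldt(n)) for n in range(1, 65))
    print("largest discrepancy for n <= 64:", worst)
```

Each term of the sum is a convolution of at most 2K sequences, which is where the components described in (i)-(iv) come from after a smooth dyadic decomposition.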

We can write for , where the are non-negative reals that sum to . We apply Lemma 6 from this previous post with some parameter

to be chosen later and conclude one of the following:

(Type 0) There is a with .

(Type I/II) There is a partition such that

(Type III) There exist distinct with and .
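The trichotomy above can be explored by brute force. The sketch below assumes the following thresholds, recalled from the combinatorial lemma in the earlier post and not visible in the text here: with 1/10 < σ < 1/2, Type 0 means some t_i >= 1/2 + σ; Type I/II means some subset sum lies strictly between 1/2 - σ and 1/2 + σ; Type III means there are three of the t_i in [2σ, 1/2 - σ] whose pairwise sums are all at least 1/2 + σ. If the lemma is stated with different endpoints, the code should be adjusted accordingly. Exact rationals are used to avoid floating-point trouble at the boundaries.

```python
from fractions import Fraction
from itertools import combinations


def classify(t, sigma):
    """Return the set of types ("0", "I/II", "III") that the exponents t,
    assumed non-negative with sum 1, fall into under the assumed thresholds."""
    half = Fraction(1, 2)
    types = set()

    # Type 0: a single very large component.
    if any(ti >= half + sigma for ti in t):
        types.add("0")

    # Type I/II: some subset sum lies strictly inside (1/2 - sigma, 1/2 + sigma);
    # the complementary sum then lies in the same interval.
    n = len(t)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            s = sum(t[i] for i in subset)
            if half - sigma < s < half + sigma:
                types.add("I/II")

    # Type III: three medium-sized components with large pairwise sums.
    # It suffices to test the three largest medium components.
    medium = sorted(ti for ti in t if 2 * sigma <= ti <= half - sigma)
    if len(medium) >= 3 and medium[-3] + medium[-2] >= half + sigma:
        types.add("III")

    return types


if __name__ == "__main__":
    sigma = Fraction(1, 8)
    third = Fraction(1, 3)
    print(classify((third, third, third), sigma))
    print(classify((Fraction(1, 2), Fraction(1, 2)), sigma))
```

The example with three equal exponents 1/3 is the model Type III case: no subset sum comes near 1/2, so the Type I/II machinery has nothing to work with.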

In the Type 0 case, we can write in a form in which Theorem 15 from this previous post applies. Similarly, in the Type I/II case we can write in a form in which Theorem 6 applies, provided that the conditions (9), (10), (11) are obeyed (the condition (12) is implied by (4)). Now suppose we are in the Type III case. For large enough (e.g. ), we see that for some fixed . Theorem 7 will then apply with

provided that we can verify the hypotheses (15), (16), (17), which will follow if we have

and

and

Since , we have

and

so we will be done if we can find obeying the constraints (18), (9), (10), (11) as well as the constraints

We rewrite all of the constraints in terms of upper and lower bounds on . The upper bounds take the form

while the lower bounds take the form

Clearly (22) is implied by (23), and (26) is implied by (27); restricting to the case (which follows from (4)), (28) is also implied by (26). Assuming (4), we also see that (23), (25) are implied by (24), so we reduce to establishing that

We begin the proof of Theorem 6, closely following the arguments from Section 5 of this previous post. We can restrict to the range

for some sufficiently slowly decaying , since otherwise we may use the Bombieri-Vinogradov theorem (Theorem 4 from this previous post). Thus, by dyadic decomposition, we need to show that

for any in the range

Let

be an exponent to be optimised later (in the Type I case, it will be infinitesimally close to zero, while in the Type II case, it will be infinitesimally larger than ).

By Lemma 11 of this previous post, we know that for all in outside of a small number of exceptions, we have

where

Specifically, the number of exceptions in the interval is for any fixed . The contribution of the exceptional can be shown to be acceptable by Cauchy-Schwarz and trivial estimates (see Section 5 of this previous post), so we restrict attention to those for which (31) holds. In particular, as is restricted to be -densely divisible we may factor

with coprime and square-free, with with , and

and

By dyadic decomposition, it thus suffices to show that

for any fixed , where obey the size conditions

and

Fix . We abbreviate and by and respectively, thus our task is to show that

This result improves upon Theorem 14 from this previous post which had the stronger condition

In practice, (50) will not hold with the original value of in Theorem 6; instead, we only use Theorem 9 in the case excluded by Theorem 8, where we will be able to lower down to be and verify (50).

Assuming these theorems, let us now conclude the proof of Theorem 6. First suppose we are in the “Type I” regime when (49) holds for some fixed . Then by (13) we have

which means that the condition (39) is now weaker than (30) (for small enough) and may be omitted. By (9), (10), (11), we can simultaneously obey (30), (46), (47), (48) by setting sufficiently close to zero, and the claim now follows from Theorem 8.

Now suppose instead that we are in the “Type II” regime where (49) fails for some small , so that by (13) we have

From this we see that we may replace by in (14) and in all of the above analysis. If we set then the conditions (30), (39) are obeyed (again taking small enough). Theorem 9 will then give us what we want provided that

In order to apply Proposition 12 we need to modify the , constraints. By Möbius inversion one has

for any function , and similarly for , so by the triangle inequality we may bound the previous expression by

where

We may discard those values of for which is less than one, as the summation is vacuous in that case. We then apply Proposition 12 with replaced by respectively (but with unchanged), replaced with its restriction to values coprime to , and set equal to , replaced by , and replaced by and . One can check that all the hypotheses of Proposition 12 are obeyed (with (60) coming from (15), (61) coming from (16), and (62), (63) coming from (17)), so we may bound (64) by


Once again recording the latest critical numerology. If we ignore and infinitesimals, then we have

and the critical term in the Heath-Brown identity is now of the form

where are smooth and at scale and is at scale with

thus are at scale , and is at scale . We are still a little bit above the combinatorial constraint but we will be bumping against that pretty soonish.

This critical case is on the Type I / Type III border; both arguments apply to this case, and improving either of them at this choice of parameters should lead to an improvement in the final value of . (On the other hand, the Type II analysis is not critical here, and we do not urgently need to improve that analysis further for the time being). With the Type I analysis, we may group and as and respectively, and the parameter scales are

.

In Lemma 10, it is the middle term which dominates slightly, suggesting that there is a little bit of gain to be had by refactoring to reduce and increase .

Hmm, I get slightly different numerology, though there are a number of details and lower order terms to check. As , , are already -densely divisible by construction, one should be able to reshuffle to improve the bound in (51) to

(I think), which leads to a primary constraint of

(ignoring epsilons) for Theorem 8, which after some algebra is giving me the constraint

which simplifies to

which when rebalanced against which is the best Type III constraint we currently have, seems to give

which is a little more strict than your condition, though still an improvement over .

1. In the second bullet point (for type I/II) I believe the should be , which gets confusing since in the next line there is a second meaning for .

2. In the line after those bullets “Type O” should be “Type 0”.

3. A few lines later when it says “the condition (9), (10) are obeyed” I believe you also want to include (11).

4. Earlier in the sentence containing equation (30), the should be .

5. In the offset equation before (31), did you want to write instead of (and similarly for )?

6. In the second line of (37), there is an extra right parenthesis.

7. In the second offset equation after Lemma 10, there is a missing right absolute value bar.

8. In the statement of Proposition 12, it ends with “for some fixed .” However, occurs earlier in (58). Furthermore, in the type III analysis done at the beginning of the post, I believe you take arbitrarily small. (I believe you meant, in Proposition 12, for to be a fixed, but arbitrarily small, constant [defined before (58)].)

a sketch of his new Type III estimate with Fouvry, Michel, and Nelson. I will reproduce their argument in the notation of the blog post above. The objective is to obtain an estimate of the form

for as large as possible (in particular, as large as ), but restricted to be square-free and -densely divisible. By the dense divisibility, it will suffice to show that

for that we can specify to accuracy , thus we may place in for some to be chosen later, and .

FKMN do not exploit any averaging in or ; one can probably exploit the former to improve upon their results but we will not do so here. Removing those averages, we are now looking at

This may be rewritten as

where does not depend on the residues , and the are bounded and only supported on those coprime to . Performing completion of sums in the variables, the left-hand side can be expressed as

(*)

plus terms which do not depend on (coming from the cases when at least one of the vanish), where and . The inner sum may be rescaled as

where is the hyper-Kloosterman sum

.
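For readers who want to experiment, here is a small Python sketch of the hyper-Kloosterman sum at a prime modulus p. The normalisation is an assumption, since the displayed formula is not visible above: we take Kl_3(a; p) to be 1/p times the sum of e_p(x + y + z) over x, y, z modulo p with xyz = a. With that normalisation Deligne's bound reads |Kl_3(a; p)| <= 3 for a coprime to p. The function name kl3 is illustrative.

```python
import cmath


def kl3(a, p):
    """Normalised hyper-Kloosterman sum Kl_3(a; p) for a prime p and a
    coprime to p, under the assumed normalisation
        Kl_3(a; p) = (1/p) * sum_{xyz = a mod p} e(2*pi*i*(x + y + z)/p).
    """
    total = 0j
    for x in range(1, p):
        for y in range(1, p):
            # z is determined by x and y through xyz = a (mod p).
            z = a * pow(x * y, -1, p) % p
            total += cmath.exp(2j * cmath.pi * (x + y + z) / p)
    return total / p


if __name__ == "__main__":
    for p in (5, 7, 11, 13):
        largest = max(abs(kl3(a, p)) for a in range(1, p))
        print(p, round(largest, 4))
```

The trivial bound for the unnormalised sum is of size p^2, so square-root cancellation in each of the two free variables is what the factor 1/p records.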

The now only appear through their product, and so we may gather terms and write the expression (*) as

where is the number of representations with and , and is another bounded sequence. The objective is now to show

Note that . Deligne tells us that , which bounds the LHS by , but this only works in the Bombieri-Vinogradov regime . To do better we apply Cauchy-Schwarz to eliminate the term and gain another unweighted index to sum over. From the divisor bound one has

so it suffices to show that

(**)

We can rearrange the LHS as

.

The summand is periodic in h with period
One then performs completion of sums in , using the Chinese remainder theorem and Deligne’s estimates to estimate the completed sums, and eventually ends up with a bound on the inner sum of the form

where and . So now one has to sum

A computation shows that for each choice of and , the number of pairs with and is . So now one is summing

.

One can perform the summation without difficulty and end up with

,

and then performing the summation this becomes

.

The first term is bounded by the third and can be dropped. We also crudely bound the secondary term by .
To make this less than , we require (ignoring epsilons)

.

The expression is a geometric average of the other two and so the term may be dropped. This already forces . As we may specify to accuracy , we win as soon as

which rearranges to

or equivalently

thus

.

In terms of , this gives a constraint

which can be rearranged as

. (1)

We also have

. (2)

If we play this against the Type I constraints

(3)

(4)

(5)

(6)

in the main post, one ends up with

(1+3)

(1+4)

(1+5)

(2+3)

(2+4)

(2+5)

(6)

leaving one with the final constraint

coming from merging the (1+3) and (2+4) estimates (this is not quite optimal, but will do for now).

Actually the treatment of the degenerate terms is a bit more difficult than I claimed, because the rescaling identity

is only true when are coprime to . One should be able to eliminate the non-coprime cases by introducing some “B” parameters as in the blog post treatment of the Type III case but it will get a bit messy. In principle it should work because whenever one or two of the share a common prime with , one picks up two or one Ramanujan sums in the modulus which should safely counteract all the losses coming elsewhere from having to enforce such divisibility conditions. One also has to deal with the case when all of the have a common factor; this should be handled by factoring out these common factors from and inserting “B” terms to measure the losses and gains. In past experience there were always many powers of B to spare, so I don’t think this is a major issue, but it does need to be addressed at some point.

OK, the treatment of the non-coprime case ends up being slightly tricky: one has to force coprimality before performing the factoring, because one needs to optimise the factoring with respect to the common factor in order to get a convergent series.

Here are some details. The key estimate is

Proposition Let be of polynomial size. If are bounded and supported on -densely divisible squarefree , then

.

Proof By pigeonholing and dense divisibility we may factor with , with

and

.

It then suffices to show that

(*)

for some bounded that is supported on coprime and .

We pull the s summation out and then Cauchy-Schwarz in h to bound this by

.

The sum inside the parentheses can be estimated using the previous arguments as

so the LHS of (*) is

which we can rewrite as

and the claim then follows from the upper bounds on R,S.

Now back to the Type III estimates. We wish to show

where does not depend on the . After completion of sums and removing the degenerate frequencies, this becomes

where

would morally be the hyper-Kloosterman sum except that we do not yet have that are coprime to .

Suppose that . Writing , and , we can use the Chinese remainder theorem and Ramanujan sum bounds to factor as times a factor independent of of magnitude (we skip this computation; it boils down to checking a bunch of cases when consists of a single prime ). For each fixed , we apply the Proposition with H replaced by and replaced by , noting that the modulus is now only -densely divisible instead of -densely divisible, and one ends up with a total bound of

Fortunately, the net power of b here in the denominator is greater than 1, and the b summation converges (as can be seen by taking Euler products) to get a bound of

which is as required if

and we are back to the same numerology as before. So the coprimality thing is a moderate pain to deal with, but fortunately doesn’t seem to ultimately impact the final estimates.

Plugging in the constraints (1+3), (1+4), (1+5), (2+3), (2+4), (2+5), and (6) into Mathematica’s “Reduce” function (which I love by the way) yields the final constraints (if I copied everything correctly):

The code is actually quite simple which is why I like the function so much. If I had been thinking straight, I would have added the extra conditions , etc. The downside is that it is all automated, and so (in the final write-up) we may need to rederive the necessary inequalities. (But that shouldn't be too hard, given that we would know what to look for.)

All I typed was: Reduce[116p + (51/2)d<1 && 108p+25d<1 && … ]

This should be able to deal with any sort of polytope issues (at least for a small number of variables).
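For those without Mathematica, the same kind of check can be done by hand in a few lines of Python. The sketch below uses only the two inequalities quoted in the Reduce call above and does not attempt to reconstruct the elided ones; reading p as varpi and d as delta is our assumption. Each constraint is stored as a pair (a, b) meaning a*varpi + b*delta < 1.

```python
from fractions import Fraction

# Only the two constraints visible in the quoted Reduce call; the elided
# ones are deliberately not filled in.
CONSTRAINTS = [
    (Fraction(116), Fraction(51, 2)),  # 116 p + (51/2) d < 1
    (Fraction(108), Fraction(25)),     # 108 p + 25 d < 1
]


def satisfies(varpi, delta, constraints=CONSTRAINTS):
    """True if (varpi, delta) obeys every constraint a*varpi + b*delta < 1."""
    return all(a * varpi + b * delta < 1 for a, b in constraints)


def sup_varpi(delta, constraints=CONSTRAINTS):
    """Supremum of admissible varpi for a given delta (not itself attained,
    since the inequalities are strict)."""
    return min((1 - b * delta) / a for a, b in constraints)


if __name__ == "__main__":
    print(sup_varpi(Fraction(0)))             # 1/116
    print(satisfies(Fraction(1, 120), Fraction(0)))
```

With more variables (for instance an extra sigma), one would need a genuine linear-programming or quantifier-elimination tool, which is what Reduce provides.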

I found some Maple code (LinearMultivariateSystem) that does something similar, and allows me to plug in the Type I, Type II, and Type III constraints directly without having to manually play off the inequalities against each other. The (varpi,delta,sigma) polytope ends up being a bit messy but it does confirm that the main constraint is when .

This comment is an attempt to summarise the many different advances we have (whether worked out in full, or only partially sketched so far) on the problem of getting as large as possible. To simplify this overview I will not make a distinction between or (so I will not distinguish between smoothness, dense divisibility, or “double dense divisibility”).

These estimates are obtained by a combination of three types of estimates: Type I estimates, Type II estimates, and Type III estimates. We have various levels of technological advance on each of these estimates, with a question mark indicating that this technology has not yet been fully fleshed out:

* By upgrading Type I-2 to Type I-3?, and using Type I-3? + Type II-1 + Type III-2, we have tentatively computed either or (this discrepancy is not yet resolved).

* A very sketchy projection of Type I-4? + Type II-1 + Type III-2 suggests that could get as large as 1/74 (or 1/88 if we only use Type III-1 instead of Type III-2), except that the Type II-1 estimates start failing at 1/96. However this can presumably be overcome by upgrading Type II-1 to something better, such as Type II-2? or Type II-3?; we have not yet bothered to do anything to the Type II sums but we should probably do so soon.

* In the comments above, I have worked out the numerology for Type I-2 + Type II-1 + Type III-3 and obtained .

Clearly there is a lot more progress to be made. Ideally we can work out the details of all the different estimates above and arrive at Type I-5 + Type II-5 + Type III-4 which should give significantly better values of than we currently have.

One can view each of these estimates as working on some polytope in parameter space, with (or some variant thereof) holding if one can find a such that simultaneously lies in a Type I-polytope, a Type II-polytope, and a Type III-polytope.

In the next day or two I’ll start on systematically working out the various levels of Type I, Type II, and Type III estimates that we currently have, which should lead to a sequence of improvements to .

First, a convenient fact: if are both -densely divisible, then is also -densely divisible. This is because at least one of must be as large as ; if say is this large then one can use the factors of , together with the factors of times , to demonstrate the dense divisibility of .
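This closure property is easy to test numerically. The sketch below assumes the definition in the form "every real R in [1, q] has a divisor of q in [R/y, R]", which is equivalent to consecutive divisors differing by a factor of at most y, and checks by brute force that the product of two y-densely divisible integers is again y-densely divisible. The ranges and the values of y are arbitrary choices for illustration.

```python
def divisors(q):
    """Return the divisors of the positive integer q in increasing order."""
    return [d for d in range(1, q + 1) if q % d == 0]


def is_densely_divisible(q, y):
    """Assumed criterion: consecutive divisors d < d' of q satisfy d' <= y * d."""
    divs = divisors(q)
    return all(b <= y * a for a, b in zip(divs, divs[1:]))


def product_counterexamples(limit, y):
    """List pairs (n, m) with n, m <= limit, both y-densely divisible,
    whose product fails to be y-densely divisible (expected: none)."""
    good = [n for n in range(1, limit + 1) if is_densely_divisible(n, y)]
    return [
        (n, m)
        for n in good
        for m in good
        if not is_densely_divisible(n * m, y)
    ]


if __name__ == "__main__":
    for y in (2, 3, 5):
        print(y, product_counterexamples(60, y))
```

The reason no counterexample appears is visible in the divisor chain: the divisors of the first factor, followed by the first factor times the divisors of the second, already form a chain with consecutive ratios at most y.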

In particular, in the context of Lemma 10 above, the quantity is -densely divisible, because are -densely divisible by construction.

Next, a Kloosterman sum bound: if

with squarefree and coprime with -densely divisible, then for any other factorisation of , we may use the k=1 van der Corput bound from this comment and bound

if , where , , and . If we choose a factorisation with in this case, we end up with

.

In the opposite case when , we basically have from completion of sums that

(strictly speaking we need to improve to to get something like this, but let’s ignore this detail) and by computing the degenerate completed Kloosterman sum we obtain a bound of here. So the net bound on is

Using this bound, we can improve the RHS in Lemma 10 of this post to

.

(one can get further improvements to the second term in the RHS by also studying the other components of and , but I don’t think we need these improvements yet.) The non-diagonal contribution to (50) is then

which needs to be

leading to the conditions

and

.

As and , this becomes

which since , , and for an infinitesimal , leads to the constraints

and

which may be rearranged as

and

.

The second constraint is clearly dominated by the first and so may be dropped. We conclude the “Level 3” bound that the Type I estimates hold for . Combining the Level 3 Type I estimates with the current best Type II estimate (which I’ve called Level 1a on the wiki) and the best Type III estimate (Level 3), Maple gives the slightly messy constraint

I’ve been looking at the Type II estimates. My plan was to mimic the “Level 2” and “Level 3” improvements for the Type I estimates to obtain analogous gains in Type II, but I found an interesting phenomenon, in that the critical numerology for the Type II estimates actually falls outside of the range in which the van der Corput type lemmas gain over the completion of sums bound, so the Level 2 and Level 3 estimates are actually worse than the Level 1 estimates. However, I was able to extract a different gain by observing that the parameter R, which currently in the Type II case is constrained to the interval

could instead be constrained to

(which, in retrospect, is closer to what Zhang did originally, though he used instead of ) and this gives slightly better numerology, improving the old constraint of to . In a bit more detail, the two constraints coming out of the Type II analysis are

and

and if one inserts the bounds , , and one obtains the constraints

and

which lead to the stated constraint .

Setting , the critical numerology for Type II is then , , , , . In Lemma 11, the modulus should be of size , while is of size , so , which is too large for a single van der Corput estimate to be useful (but potentially a combination of the A and B process might barely squeeze a slight gain in this regime).

The constraints on R arise in the following ways. Generally, one would like R to be as large as possible, as most of the bounds we need are of the form for various positive exponents , and various quantities that do not depend on R. (Ultimately, this is because R represents the portion of the modulus that we do not perform Cauchy-Schwarz on when moving from (35) to (36) in the main post.) On the other hand, due to the -densely divisible hypothesis, we can only localise R to an interval of the form , where we are free to choose . So we would generally like to be as large as possible as well.

However, there is a “diagonal” contribution coming from the case when have a common factor which needs to also be addressed; this is only briefly mentioned in this post (near (38)), but is done in more detail in the treatment of (32) in https://terrytao.wordpress.com/2013/06/12/estimation-of-the-type-i-and-type-ii-sums/ . The treatment of this case is complicated, but produces three upper bounds required on R, namely

(*)

Note that these quantities also necessarily obey the conditions and .

In the Type I case (when ) it is the first constraint that dominates, so that one sets for some small c. But in the Type II case (when ) it is the last case that dominates, so the largest can be here is .

However, it is possible that one could treat (32) a bit more efficiently. It seems unlikely that R could usefully ever be larger than N (otherwise it becomes pointless to try to Cauchy-Schwarz on the residue class ) but I don’t immediately have a good explanation as to why the condition (*) is so necessary. One might indeed be able to save a few more factors of this way.

We redo the estimation of the non-coprime contribution (32). Repeating the arguments in the previous post, we arrive at the task of bounding

by (plus a symmetric term that is treated similarly). As before, we crucially have that has no prime factors smaller than .

We now deviate from the previous post (which performed the m summation at this point) by introducing the new variable , which is comparable to x, and rearranging the above expression as

.

The sum is by crude estimates. By the divisor bound, the sum is then by crude bounds. By using the improved controlled multiplicity hypothesis (**), the sum is then

(noting that ), and then finally the sum is

and this is acceptable given the single condition

for some fixed . So we can set in both the Type I and Type II cases. Returning to the Type II analysis from my comment above, the constraints

and

should now be rearranged as

and

and using the bounds , and , we are left with the constraints

and

which rearrange to

and

.

The second condition is redundant, leaving us with a "Level 1c" Type II constraint of , improving upon the previous constraint . Since even the most optimistic projections we have so far do not foresee going below 1/74, it now looks like the Type II estimates are good enough for the near-term that they will not cause any further difficulty.

I’m glad to see such fast improvement! (my last suggestion to optimize the parameters at the end of the project was sent before I read your response!)
Anyway, perhaps at the end of the project, it may be possible to optimize similar parameters when all the necessary limitations on them will become clear from the detailed proof.

I’ve been trying to figure out how the combinatorial lemma has influenced the creation of types 0,I,II,III; and conversely, how some natural cut-offs in estimating certain sums have influenced the creation of the combinatorial lemma. Further, I wanted to know if there was some sort of combinatorial improvement that could be made. In that spirit, let me describe how I now view things. This likely won’t be earthshattering for the experts, but may be helpful to those “occasional number-theorists” like myself.

For record keeping, let be coefficient sequences at scales , and let be smooth coefficient sequences at scales . We take positive real numbers such that and . We also assume , and so . (Note: These real numbers are not fixed, but can change with . However, I’m postulating (from the answer to Avi Levy’s recent question), we can assume some sort of law of excluded middle, which I will use implicitly throughout this comment.) For simplicity, we assume and . Furthermore, we may assume from how the coefficient sequences are created; the non-smooth sequences are (basically) convolutions of even smaller smooth sequences. (This assumption isn’t essential, but often helps simplify things.)

We want to estimate quantities (with possible future modifications to “doubly dense divisibility”) of the form

.

Intuitively speaking, we want more 's than 's, since smoothness helps us. It is also my impression that we don't want too many terms in the Dirichlet convolution, or the sum is difficult to bound well. (But this is not always the case. For instance, the case is not always better than the Type I/II case where .) We often will combine some of the convolutions into a single term, but at the cost of sometimes losing smoothness. We say that the sum above is of signature .

Now come the combinatorics. I want to address two specific cases. First consider when . This is on the cusp of what is not allowed using (the current) Type I-4 bounds. I believe that this will turn out to be a natural barrier, otherwise (as we'll see shortly) one could reprove Zhang's result without any Type III analysis.

Consider what happens when is large. Indeed, if then we are in the Type 0 case. What happens if is not quite so large? In the range we can reduce to the Type II regime, by combining all coefficient sequences except into a single term. Note that we have an added bonus; one of the two convolved functions is smooth! In the range we can similarly reduce to Type I, again with the bonus that the smaller of the two coefficient sequences is smooth.

Now, let's consider the other end of the spectrum, where is small. If then for all . Thus, recombining the sequences we can reduce to signature where . This is again Type I/II (but without the added bonus of at least one sequence being smooth). Further, when , this covers all possible cases.

Finally, we will look at the case when , which is outside the current combinatorial bounds. The cases when is large are dealt with in exactly the same way as above. Similarly, if , we can reduce to the Type I/II analysis as above. So, we reduce to the case when .

Zhang's combinatorial method here seems to consist of counting how many other 's can possibly belong to this same interval, while (without loss of generality) restricting , and further thinking of any as an (i.e. forgetting about the smoothness of such sequences).

(A) Suppose is the only in the given interval. Then , and taking just enough of the 's and attaching them to , we reduce to Type I/II.

(B) Suppose are the only 's in the given interval. Attaching all of the 's to would give a sequence of size , and so there is some subcollection of the 's which when attached to gives a sequence of size between . Attaching the remainder of the 's to reduces to Type I/II.

(C) If there are more than six 's in the interval, then we reach a contradiction since in that case .

(D) If there are exactly six 's in the interval, then we have . We reduce to type II by combining , and then combining the remaining sequences.

(E) If there are exactly five 's in the interval, then we have a new situation that wasn't covered by the combinatorial lemma. (This is one of the reasons for the cutoff at .) This can happen, for instance if . There are a few ways to deal with this new situation; depending on which of the signatures is most manageable.

(F) Suppose are the only 's in the interval (and so ). If for some pair , then we reduce to Type I/II. If for every distinct pair, then . Thus, combining/convolving the sequences and all 's, we have size . As in case (B), we reduce to Type I/II.

Thus, we reduce to when . If then we are in Type III (combining with all the 's). So, we may also assume . Thus, we have

But and so . Thus . This situation is possible, for example when we have (approximately)

.

(G) Finally, consider the case when . As before, we may assume we *cannot* recombine and reduce to types I/II/III. If then we easily obtain and so . This easily reduces to Type I/II. So , and as we are not in Type III we have . As in (F), we can reduce to the case where

and so we also obtain, . This leaves . Thus, we can combine some of the 's with , and reduce to Type I/II again. So there are no extra cases here.

Observations:

Can we improve on this? There are at least two ways to proceed; whether they improve things remains to be seen. Instead of counting the number of indices in the interval , we could continue with the (possibly more natural) process of considering signatures in order. So, for example, we might consider what happens for the signature (with the two smooth sequences having fairly "large" scales), and ply this against the Type III analysis. Indeed, this is somewhat suggested by the current form that the Type III computations take. Those estimates give (where here we can take if we like, which we do). So, we want as large as possible (and hence we want ). Thus, we are naturally led to consider the case when there are two, nearly equal, large scales.

Another (possibly less helpful) option would be to consider signatures of the form . If we could do a Type III analysis without the smoothness hypothesis, this would remove the combinatorial constraint .

It’s a nice observation that the Type I/II case could stretch to cover all cases if could be raised to 1/4. This might lead to a simpler proof of Zhang’s theorem (in particular, not relying on Deligne’s version of the Weil conjectures, only Weil’s original results which nowadays can also be proven by elementary means) for sufficiently small by eliminating the Type III analysis. But for the task of getting as big as possible we presumably need an “all hands on deck” approach in which we throw all the estimates we have at the problem.

Actually, now that I think about it, might already be big enough for this purpose, as this is the meeting point between and . So the Level 3 or Level 4 Type I estimates (combined with Type II) might already give a proof of Zhang’s theorem!

In the other direction, with regards to trying to push below 1/10, I think the two new critical cases to consider will be a “Type IV” sum in which n=4 and is close to and a “Type V” sum in which n=5 and is close to (I have not figured out exactly what “close to” means though). This basically corresponds to distribution results (for smooth or densely divisible moduli) for the higher order divisor functions and . I am unsure if there are any fancy arguments in the literature (beyond the Zhang-based ones) that beat Bombieri-Vinogradov in this context; it seems that is pretty much state of the art. But this is certainly a good direction to focus attention on, though perhaps not by Polymath8; new ways to control and would be very interesting beyond just the ability to squeeze a few more improvements to , , and . (I could imagine that Polymath8 eventually gets to the point where the constraint is the dominant obstruction, and then calls it a day, leaving it to others in the field to start attacking and and get some improvements to etc. as a bonus.)

I see from your new Level 4 Type III analysis that some of my observations may have been unwarranted. However, it may still be the case that those sums of signature naturally fit “in between” the Type I/II sums (whose signature is ) and the Type III sums (whose signature is ).

Actually, with the Level 4 Type III estimates, we’ve just reached this point, so is basically stuck at 1/108 right now until we improve the Type I estimates again beyond Level 3. We do have an in-principle improvement here (“Level 4”) by using an iterated van der Corput, but it is not yet implemented. In principle, this estimate gives something like (ignoring deltas), so if we play that against that should give , though 3/200 ~ 1/67 is large enough that the Type II and Type III estimates will have to be improved again before we can actually reach this point.

If , it is easy to obtain (e.g., for , based on a rough search using a simple nonlinear optimizer calling the Maple script), if the calculation is correct. This will fall entirely into the region of Engelsma's exact bounds.

Unfortunately, Xfxie, the inequality in Tao’s comment is in “sigma”, not “delta”. The inequality above comes from the Type I inequality and still needs to be confronted with the Type III estimates to eliminate sigma and get the final inequality. There is no delta in the inequality because he just wants a rough idea of the result (and because that factor wasn’t calculated yet).
Can you share the optimizer you are using? I guess it could save some time here.

for that we can specify to accuracy , thus we may place in for some to be chosen later, and .

FKMN discard the averaging in or . For Level 4 estimates, we still discard the s averaging (there isn’t really any plausible way we know of to exploit it thus far) but keep the alpha average. We are now looking at

This may be rewritten as

where does not depend on the residues , and the are bounded and only supported on those coprime to . We can similarly restrict to those that are coprime to .
Performing completion of sums in the variables, the left-hand side can be expressed as

(*)

plus terms which do not depend on (coming from the cases when at least one of the vanish), where and . Ignoring for this discussion the coprimality problem mentioned in a previous comment, the inner sum may be rescaled as

as before, so we may gather terms and write the expression (*) as

as before. The objective is now to show

By Cauchy-Schwarz as in the previous comment it suffices to show that

(**)

We can rearrange the LHS as

.

The summand is periodic in h with period
One then performs completion of sums in , using the Chinese remainder theorem and Deligne’s estimates to estimate the completed sums, and eventually ends up with a bound on the inner sum of the form

where and ; the insertion of the m factors is the key new improvement here. So now one has to sum

The diagonal case contributes a term of

by repeating the arguments from the previous comment. Now we turn to the off-diagonal contributions . Among other things this forces to divide , which already gives a noticeable gain (and forces to be only as large as ).

Fix . The same computation as the previous comment shows that for each choice of and , the number of pairs with and is . (There are choices for , and for fixed there are choices for .)

So now one is summing

.

One can perform the summation without difficulty and end up with

,

and then performing the summation (which is now improved due to the constraint ) this becomes

.

Collecting terms, it is now

that we wish to be .
Ignoring epsilons, this gives rise to upper and lower bounds on R:

.

So we now have

.

Since has to be at least 1, we need , which also makes all the lower bounds on at most . We then win if

which rearranges to

.

These are stronger than the previous constraint which may now be discarded. Also the last constraint is weaker than and may also be dropped. As , the constraints may be rewritten as

thus

and

In terms of , this gives a constraint

which can be rearranged as

and

.

The first condition dominates, so it looks like the Level 4 Type III constraint is

.

which when combined with the latest Type I and Type II estimates (Level 3 and Level 1c respectively) seems to give the final constraint

which is basically formed by the collision of the constraint and the Level 3 Type I constraint of . Thus the Level 4 Type III estimates are so strong that they are no longer providing the dominant contribution: it is now the clash between Type I and the combinatorial constraint which is the bottleneck; Type III won’t become relevant again until hits 1/70. (This also means that the above analysis will be fairly robust to numerical errors.)

Using this Level 4 Type III analysis, I have an idea [*very* sketchy at the moment] that has the potential to remove any reference to (thus removing Type I computations), unify Type 0 and Type III into a single result, and trivialize Type II computations.

Let us begin as above, but with the new objective to obtain the bound

Proceeding as in the comment, the main term of interest in the LHS is

(*)

Setting , and proceeding as before, the changes are twofold. First, replace by . I believe that this causes no trouble, and can still be handled by the divisor bound. Second, we replace by . Here is where my idea becomes especially sketchy. We need to find a bound on

of some manageable order. To continue, supposing we get something similar to

(which is what was obtained above in the case, possibly here with involving th powers), then the rest of the post seems to follow along similar lines. We still end with something *close* to the two conditions

and

.

These equations (when precisely stated) may now rely on , but for simplicity I’ll ignore that. Fix and restrict to the case . Considering all up to a large (but fixed) size, the only combinatorial case not covered is when all scales are smaller than . If is fine enough we can cover that final case with a Type II analysis (in a significantly shorter interval).

At any rate, it might be interesting to work out what happens when , as that will partially address the current obstruction.

It is worth checking out the numerology for, say, n=4, , (even though this particular case can also be handled by Type II methods) and , to see if we can just beat Bombieri-Vinogradov for in the most favorable case by this method. I think the numerology may become less favorable though (the normalisation of , for instance, should have a factor of in front rather than to reflect the expected square root cancellation). There is also the additional issue that a certain amount of algebraic geometry (ell-adic sheaf cohomology, this sort of thing) is needed to actually justify the expected square root cancellation for sums such as , but we can cross that bridge when we come to it.

As said above, algebraic geometry is not at all a problem here; my greatest worry, as you said, is that will get huge if is large, so that an important term in Polya-Vinogradov will be the zero frequency: for these one needs a weight dropping phenomenon (more than square-root cancellation) for the complete sums, say,

and

where : these sums can probably be evaluated in an elementary way and then their size may determine whether this has a chance for .

Thanks for the pointers. It looks like this idea won’t work out, but it gives me good motivation to understand the arguments better.

If it does turn out that gives a natural barrier, there is yet another way to beat the combinatorial restriction. If we are not in type 0,I,II, then we can still reduce to the case where there exist at least three real numbers in the range . In general, we no longer have that , but we can use the poorer lower bound . Unfortunately, with the current Level 4 Type III bounds, this gives

After a nice discussion over lunch with Philippe Michel (who happens to be in LA currently), I now have a clearer idea of why the Type III argument of FKMN beats Bombieri-Vinogradov for n=3 but just barely fails to do so for n=4.

Let’s start with n=3, taking and for simplicity (so we are just at the border of Bombieri-Vinogradov). Philippe pointed out that it is convenient to write things in terms of powers of Q, thus here. We want to understand the expression

on the average for typical (smooth) (I will suppress the dependence of on here for simplicity). The trivial bound here is , so we want to control this sum with accuracy or better. Using completion of sums, we can rewrite the above expression (morally) as

which (if we discard the cases where vanishes or otherwise has a common factor with ) is basically

(the out in front is absorbed into the normalisation of ). Deligne (for the surface ) tells us that , giving us back . So a direct application of Deligne only recovers the trivial bound; but once we do even a little bit of Cauchy-Schwarz and take advantage of the averaging in q we can do better (again using Deligne, but now for a hyper-Kloosterman correlation rather than for a single hyper-Kloosterman sum). Eventually it turns out that we can get as low as (down from the current level of ), which corresponds to a 4/7 level of distribution.
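The completion-of-sums step used here can be checked numerically in a toy setting. The sketch below is only an illustration under my own assumptions: it uses a single prime modulus and the ordinary Kloosterman phase n -> e_p(1/n) rather than the hyper-Kloosterman sums of the actual argument, and the function names (`incomplete_sum`, `completed_sum`, `fourier_coefficient`, `kloosterman_phase`) are hypothetical. It verifies that an incomplete sum of a q-periodic function equals the weighted sum of its complete Fourier coefficients against short geometric sums.

```python
import cmath


def e(x):
    """Additive character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def kloosterman_phase(p):
    """Return the p-periodic function a -> e_p(a^{-1}), set to 0 when p | a.

    Uses pow(a, -1, p), which needs Python 3.8 or later.
    """
    def f(a):
        a %= p
        if a == 0:
            return 0
        return e(pow(a, -1, p) / p)
    return f


def incomplete_sum(f, q, N):
    """Direct evaluation of sum_{1 <= n <= N} f(n) for q-periodic f."""
    return sum(f(n % q) for n in range(1, N + 1))


def fourier_coefficient(f, q, h):
    """Complete sum fhat(h) = sum_{a mod q} f(a) e_q(-a h)."""
    return sum(f(a) * e(-((a * h) % q) / q) for a in range(q))


def completed_sum(f, q, N):
    """Same incomplete sum, rewritten by completion of sums:

    sum_{n <= N} f(n) = (1/q) sum_{h mod q} fhat(h) sum_{n <= N} e_q(h n).
    """
    total = 0
    for h in range(q):
        geometric = sum(e(((h * n) % q) / q) for n in range(1, N + 1))
        total += fourier_coefficient(f, q, h) * geometric
    return total / q
```

For instance, with `f = kloosterman_phase(101)`, the values `incomplete_sum(f, 101, 30)` and `completed_sum(f, 101, 30)` agree up to floating-point error, and each nonzero-frequency coefficient `fourier_coefficient(f, 101, h)` is a complete Kloosterman sum, so it obeys the Weil bound of 2 sqrt(p).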

Now we take n=4, in the model case when and . We are now staring at

(before averaging in q). The trivial bound is as before. After completion of sums we get

(note here that completion of sums did not shorten the length of incomplete summation, in contrast to the n=3 case when it reduced a sum of length to the dual length of ). With the normalisation of (which now has a in front instead of ), this morally becomes

Deligne still tells us that , but this now gives a net bound of , so we fall short by of even recovering Bombieri-Vinogradov here. We can _barely_ recover this by averaging in q as per FKMN. Namely, we are now trying to prove something like

for some bounded and some divisor-like function . Performing Cauchy-Schwarz in h, this becomes

.

The diagonal term is barely acceptable, and the off-diagonal terms are also barely acceptable using hyper-Kloosterman correlation (if we replace with ). So we can barely recover Bombieri-Vinogradov with basically no room to maneuver. So it looks like we would have to come up with a new trick in order to use this sort of argument to get something non-trivial in the n=4 case.

If we try to Cauchy-Schwarz in more variables, the off-diagonal terms become better, but the diagonal terms become worse; since both the diagonal and off-diagonal terms barely make it to Bombieri-Vinogradov, I don’t see a way to rebalance the Cauchy-Schwarz to do better. A little more precisely, we now write the Bombieri-Vinogradov type estimate to be proven as

with . If we Cauchy-Schwarz in h,r as you suggest we end up with

The diagonal contribution now has size which is now a little bit too big to even get back Bombieri-Vinogradov.
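To make the diagonal versus off-diagonal trade-off concrete, here is the generic shape of a Cauchy-Schwarz step in placeholder notation that is not taken from the comments above: an outer variable h ranging over a set of size H with coefficients |c_h| at most 1, an inner variable n ranging over a set of size N, and a phase f(h,n) of modulus at most 1. It is only a schematic of the bookkeeping, not the actual estimate.

```latex
\[
  \Bigl| \sum_{h} c_h \sum_{n} f(h,n) \Bigr|^2
  \;\le\;
  \Bigl( \sum_{h} |c_h|^2 \Bigr)
  \sum_{n,n'} \sum_{h} f(h,n)\,\overline{f(h,n')} .
\]
% Diagonal terms n = n': the inner double sum is at most H N, so the
% right-hand side is at most H \cdot H N = H^2 N, a saving of N over the
% trivial bound H^2 N^2 for the squared sum.
% Off-diagonal terms n \neq n': any gain must come from cancellation in
% the sum over h, which has length H.
```

Moving a variable from the inner sum to the outer sum (that is, applying Cauchy-Schwarz in more variables) shrinks N, and with it the diagonal saving, while lengthening the sum over h in which off-diagonal cancellation is sought.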

Having said this, though, I think this rebalancing trick may work for the Type I and Type II sums to get a further gain from what we already have (or plan to have). In both the Type I and Type II sums, one is eventually looking at a sum of the form

where is a certain explicit phase and the coefficients are essentially bounded (and never mind what k and r are, it’s not important for this discussion); see equation (39) of https://terrytao.wordpress.com/2013/06/12/estimation-of-the-type-i-and-type-ii-sums/ . In the Type I sum, Zhang performs Cauchy-Schwarz in the variables and finds that the diagonal terms are OK (as long as or equivalently for some small fixed ) and then works hard to control the off-diagonal terms . In the Type II sum, the diagonal terms coming from Cauchy-Schwarz in are no longer OK and so Zhang uses Cauchy-Schwarz in just in order to make the diagonal terms better, at the cost of making the off-diagonal terms worse (but because in Type II, and are much closer to than in Type I, one still can get reasonable estimates here).

But in light of your suggestion, I see that there are other Cauchy-Schwarz’s available: in particular, in Type I we can factor and Cauchy-Schwarz in rather than just in order to make the diagonal terms do a bit more of the work and the off-diagonal terms do a bit less of the work. Similar optimisations are available in the Type II case (indeed one could imagine that one could now treat Type I and Type II in a unified fashion with a continuum of Cauchy-Schwarz options).

I’ll add this idea to the queue of possible Type I/Type II improvements (I’ll dub it “Level 6”). There are still two earlier potential Type I/Type II improvements (which I’ve called “Level 4” and “Level 5”) that should be worked out too; I think I can get a preliminary version of Level 4 worked out soon and will post it here. (It’s sort of a strange situation to have more ideas for progress than time available to work the ideas out properly; it’s almost always the other way around :-).)

As someone who is following the discussion here with some interest, can I suggest that the constraints in current form would be easier to follow and compare when reading posts and tables (e.g. on the wiki page) if they were normalised to (i.e. clear fractions)?

In this comment I am recording a preliminary “Level 4” estimate for Type I. However I am punting on the issue of exactly how to define “double dense divisibility” (which should roughly capture the notion that a modulus can be factored as with placed more or less wherever one wants) and revert to the older notion of -smoothness. This will worsen the numerology when converting from to (basically, this forces one to take , so that vanishes and is irrelevant) but I think the gain from improving will still be greater than the loss from having to revert to a more primitive sieve. (This is presumably a temporary issue, that can be resolved once we work out what double dense divisibility should be.) Unfortunately I have bad news – due to arithmetic errors in my previous projection at https://terrytao.wordpress.com/2013/06/22/bounding-short-exponential-sums-on-smooth-moduli-via-weyl-differencing/#comment-236039 , this argument ends up being a lot worse than anticipated, and in fact is inferior to the Level 3 estimate.

with squarefree and coprime with -smooth, then for any other factorisation of , we may use the k=2 van der Corput bound from this comment and bound

if , where , , , and . As is -smooth, we may choose a factorisation with

and

(note that , so ) so that

and thus

.

Using completion of sums to handle the opposite case as in the previous comment, we conclude that

Using this bound, we can improve the RHS in Lemma 10 of this post to

.

The non-diagonal contribution to (50) is then

which needs to be

leading to the conditions

and

.

As and , this becomes

which since , , and for an infinitesimal , leads to the constraints

and

which may be rearranged as

and

.

The second constraint is clearly dominated by the first and so may be dropped. We conclude the “Level 4” bound that the Type I estimates hold for . This is significantly worse than the previous claim of coming from https://terrytao.wordpress.com/2013/06/22/bounding-short-exponential-sums-on-smooth-moduli-via-weyl-differencing/#comment-236039 ; there were two arithmetic errors in the previous analysis, namely that a factor of actually should have been , and also an expression should have been . Playing this estimate off against the combinatorial constraint gives , which is a little worse than what we currently have. The moral here seems to be that the small gain coming from the second van der Corput is not quite strong enough to compensate for the large number of additional factors of one loses when trying to use that estimate.

On the plus side, it means that we don't have to think about double dense divisibility immediately…
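For readers comparing the two notions mentioned in this comment, the following sketch tests y-smoothness and (single) y-dense divisibility for small integers. It assumes the reading of Definition 2 in which q is y-densely divisible when every R with 1 <= R <= q has a factor of q in [R/y, R]; this is equivalent to consecutive divisors of q never differing by a factor of more than y. The function names are hypothetical, and nothing here addresses "double dense divisibility", which the comment leaves undefined.

```python
def divisors(q):
    """Return the positive divisors of q in increasing order."""
    small, large = [], []
    d = 1
    while d * d <= q:
        if q % d == 0:
            small.append(d)
            if d != q // d:
                large.append(q // d)
        d += 1
    return small + large[::-1]


def is_densely_divisible(q, y):
    """Check y-dense divisibility via gaps between consecutive divisors.

    Assumed criterion: for consecutive divisors a < b of q, b <= y * a.
    """
    ds = divisors(q)
    return all(b <= y * a for a, b in zip(ds, ds[1:]))


def is_smooth(q, y):
    """Check that every prime factor of q is at most y."""
    p = 2
    while p * p <= q:
        while q % p == 0:
            if p > y:
                return False
            q //= p
        p += 1
    # Whatever remains is either 1 or a single prime factor.
    return q == 1 or q <= y
```

As a usage example, `is_densely_divisible(202, 51)` holds even though 202 = 2 * 101 is not 51-smooth, which illustrates how dense divisibility relaxes smoothness; conversely, every y-smooth integer passes `is_densely_divisible` with the same y.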

I thought we had to be a bit more careful about . Clearly, holds for small enough, but “small enough” really means “depending on “. On the other hand, in deriving Theorem 6 from Theorems 8 and 9, we make a choice of depending on . So without further comment, and depend mutually on each other and the two dependencies might be contradictory. So I thought we had to make sure that for the choice of we make, is sufficiently small relative to it so that holds. This is why I said that assuming (48) any is good, while not assuming (48) any is good. Sorry if I am missing something. I wish you a good trip!

I am beginning to suspect that the Type I/II analysis could be simplified by removing the somewhat intricate arguments Zhang uses to enforce that are coprime. (This would also remove the “controlled multiplicity” hypothesis from the congruence classes, and also remove most of the upper bound constraints on , potentially leading to some improvement in the numerology.) I have to check the details, but I think one of the main reasons Zhang needed coprimality (other than to make certain formulae look nicer) is that his exponential sum estimate on incomplete Kloosterman sums (Lemma 11 from Zhang’s paper) was a bit inefficient in the non-coprime case (his factors of in that Lemma should really be where ). Otherwise, having common factors between the q’s actually _helps_ the analysis by making the overall modulus smaller. Again, I have to check the details…

p.s. I’m leaving for Budapest today, probably will be out of action here for a day or so.

The problem comes from the new factor , which in the non-coprime case depends on the choice of and so cannot actually be dumped into the main term in the non-coprime case. However in that case one can estimate this expression crudely, using , as

and on averaging in k and using the lower bound we can show that this expression is (barely) acceptable as an error term rather than a main term. So we can’t get rid of the reduction that makes free of tiny primes. On the other hand, by reworking the argument in the above way we can avoid the need to assume controlled multiplicity altogether in Zhang’s theorem, thus strengthening it slightly.

I also now see that the condition is necessary for the dispersion method to work, otherwise the condition basically determines from and there is nothing to disperse!


I think that Theorem 8 can be improved in that (47) can be omitted. We observe first that (52) can be supplemented by

,

as follows from the explicit formula

Using also , the last term in (51) can be improved to

.

The improvement is really by a factor of , since . By going back to earlier parts of the argument, we see that we need to average this over , and then sum over with . This way we gain a factor of over the original contribution of the last term in (51). Therefore, instead of

,

we only need to ensure

,

that is,

.

In short, the last display of Section 3 can be replaced by

,

which rearranges to

.

So this would replace (47), but it follows from (45), hence can be omitted.

1. In the discussion below Proposition 12, you say that (62) with the appropriate parameters follows from (17). My calculation tells me that instead of (17) we need the slightly stronger bound . More precisely, we want

,

,

,

.

Am I missing something?

2. “values coprime to ” should be “values coprime to “.

3. In (63) and in the display before (64), should be .

[Corrected, thanks. Increasing to causes some changes in the final calculations, but fortunately only for non-dominant inequalities, and the final constraint on remains unchanged. -T.]

2. In the proof of Proposition 12 you remark that (59) implies . I think (59) only implies , so the subsequent bound for needs an extra factor of .

3. In the three displays following (71), the condition is missing, and should be .

4. In the second display below (71), a factor is missing on the right hand side.

5. In the third display below (72), the term is missing. Also, for the sake of the reader, it would be better to use the variables instead of in this definition.

6. In the fourth display below (72), the condition is missing.

7. In (74) and in the third display below (77), should be .

8. For the sake of the reader I would add more detail regarding the choice of below (77). There is a constant such that exceeds . If , then there is a divisor such that , and any such divisor is admissible by . Otherwise , hence is admissible by .
