Polymath 8b, XI: Finishing up the paper

The main focus is now on writing up the results, with a draft paper close to completion here (with the directory of source files here). Most of the sections are now written up more or less completely, with the exception of the appendix on narrow admissible tuples, which was awaiting the bounds on such tuples to stabilise. There is now also an acknowledgments section (linking to the corresponding page on the wiki, which participants should check to see if their affiliations etc. are posted correctly), and in the final remarks section there is now also some discussion about potential improvements to the bounds. I’ve also added some mention of a recent paper of Banks, Freiberg and Maynard which makes use of some of our results (in particular, that ). On the other hand, the portions of the writeup relating to potential improvements to the MPZ estimates have been commented out, as it appears that one cannot easily obtain the exponential sum estimates required to make those go through. (Perhaps, if there are significant new developments, one could incorporate them into a putative Polymath8c project, although at present I think there’s not much urgency to start over once again.)

Regarding the numerics in Section 7 of the paper, one thing that is missing at present is links to the code, in case future readers wish to verify the results; alternatively, one could include such code and data in the arXiv submission.

It’s about time to discuss possible journals to submit the paper to. Ken Ono has invited us to submit to his new journal, “Research in the Mathematical Sciences”. Another option would be to submit to the same journal, “Algebra & Number Theory”, that is currently handling our Polymath8a paper (no news on the submission there, but it is a very long paper), although I think the papers are independent enough that it is not absolutely necessary to place them in the same journal. A third natural choice is “Mathematics of Computation”, though I should note that when the Polymath4 paper was submitted there, the editors required us to use our real names instead of the D.H.J. Polymath pseudonym, as it would have messed up their metadata system otherwise. (But I can check with the editor there before submitting to see if there is some workaround now; perhaps their policies have changed.) At present I have no strong preferences regarding journal selection, and would welcome further thoughts and proposals. (It is perhaps best to avoid the journals that I am editor or associate editor of, namely Amer. J. Math, Forum of Mathematics, Analysis & PDE, and Dynamics and PDE, due to conflict of interest (and, in the latter two cases, specialisation to a different area of mathematics).)

Corrected, thanks! Aubrey, what are the simplest values of F that you have? Given a choice between getting the largest “M” value and the simplest “F” function, I’d vote for putting the latter in the paper (so long as M stays above 2, of course :-) and mentioning the best value of M we can get in a remark instead.

The simplest I’ve obtained thus far is the one I posted on May 3rd (9:18pm), but in the past few days I’ve been exploring a somewhat more versatile approach that seems on course to reduce the coefficients by a further factor of four or so. I should have this nailed down within a few days.

Here are just a few thoughts about the project that I hope may be useful to others who are considering participation in future polymath projects.

I joined the project for a number of reasons. One is that I like pushing numerical results as much as possible. The idea of getting the best possible prime gap is exciting! Another reason is that Zhang’s work is amazing, and I wanted to understand it as much as possible. This was followed by Maynard’s (and independently Tao’s) multidimensional probabilistically motivated sieve, which is also incredibly interesting to me.

My initial incursions into the project were mainly pointing out minor corrections to the polymath8a paper, and asking questions about some of the blog posts. Once Maynard’s work came out I delved more into the optimization problems.

There were a few things that surprised me about the whole experience. First was the friendliness of the other people, particularly our host Terry! I want to thank everyone for the positive experience I’ve had. A special thanks goes to James who sent me some of his original code, which really got me involved.

Second, a surprisingly large number of mathematicians I know commented in personal communications on the fact that they were “impressed” with my participation. I took this to mean that posting permanent comments to a VERY public blog can be a dangerous business! :-) And I think that this issue is something people should keep in mind before participating–but in my opinion the positives outweigh the negatives. Some of the mistakes I’ve made here would never have seen the light of day in a “normal” collaboration, but I guess my ego can take it.

The third surprise was how much time I devoted to this project. I think that comes from really getting excited about the work. This is also the point at which I want to mention something that was a (mild) concern on my part. I’m a pre-tenure professor, which means that I made the conscious choice going into this project that any time I spent was not necessarily going to be reflected in my tenure file. I knew that even if I contributed enough to the work to be an official “participant” in the acknowledgements (and, at the beginning, this was a very big “if”), it wasn’t clear whether this would count as “co-authorship” in the eyes of my tenure committee. To me this wasn’t a big deal, because I publish enough to meet the scholarship requirements at my university. Yet, at the same time, it really is unclear how participation in this venture should be treated by the mathematical community at large. There are so many levels of collaboration that it can be somewhat confusing where the line is drawn between providing comments vs. being a serious co-author vs. everything in between. Also, the polymath projects don’t seem to follow the convention, used in other sciences, of simply listing as a co-author everyone who reaches that level of contribution.

So, on the one hand I really like the idea of collaborative mathematics without an eye towards recognition! And honestly, I’m happy to give “D.H.J. Polymath” the credit. On the other hand, I would like to consider myself a co-author on at least part of the paper, and take ownership of my part of the work (and of any mistakes it may contain). If I am a co-author, I would also like to list this paper on my website. [So, to sum up, perhaps some extra information here would be helpful about how I should think about ownership!]

The fourth and final surprise was that I feel like I did contribute! Sometimes it was in intuitive ways like waking up one morning and realizing that a construction we were attempting would contradict the parity barrier in sieves. This idea then led the experts to write some really interesting formal mathematics on this issue. Sometimes it was in the time consuming busy-work of writing, running, and debugging code. And sometimes it was just in contributing my experience after thinking about the geometric pictures long enough.

So, overall, I would say that if a polymath project is interesting, then please do participate! But remember that this is not your standard mathematical collaboration.

————–

On the question of where to submit the paper, all of those options seem good.

On the issue of missing code, are we missing things on my end? I’ve already uploaded the k=50 code to dropbox. Is there anything else you would like me to upload?

I’d just like to add that this very accurately reflects my thoughts as well (and is expressed rather better than I could have done). I completely agree with both the positives and concerns.

As to where we publish: I am happy with all the suggested options, and have no strong feelings either way. If I was forced to vote, I suppose I’d lean towards ANT because of the natural links with the previous paper.

Thanks for these thoughts! These would fit very well in the retrospective paper, which would be the next (and perhaps final) component of the Polymath8 project after we’re done with this submission, if you’re happy to have them included in that paper.

Regarding the code, I guess all that’s needed is a link in the paper to the code, which can indeed be a dropbox link (there is the issue of link permanence, but it’s not clear that dropbox links are any more or less permanent than other options).

I would think that the dropbox link would be as permanent as we need, although I’m happy to put a copy on my personal website as well. (I plan to keep my website for the next 30+ years, modulo changes in technology, etc…)

Regarding the retrospective, I’d be happy to include those thoughts as long as I can update them for the new venue.

——

Finally, I want to report on another computation I performed. Over a month ago I decided to try for . I knew it wasn’t nearly as “round” as , but I still had to try for it. It didn’t work out, so we get to keep our nice round numbers, but here are the details.

I got . The choice is probably not optimal, but from numerical experiments I’ve found that small differences from the optimal epsilon only affect the values in the third or fourth digits. As before, I used signatures with only even entries. The degree of the signature was at most 28, while the total degree was at most 30.

I began the computation on April 14. The matrices were done on May 11. The linear algebra was done on May 17. The amount of RAM being used, at least when I looked, was around 11 GB.

I tip my hat to you for trying it out. It might be worth including a comment in the paper detailing this attempt, to let people who might be interested in more computations know what they’d be in for, or what they’d need to try differently.

Terry’s request got me moving, so here’s my latest on the simplest polynomials required to push M above 2 in the GEH case. All this work has been in the context of epsilon=1/4 so as to eliminate the M6 and M8 constraints. To recap, my starting-point was the discovery (noted Feb 11th, 11:39pm) that when we allow these polynomials to include monomials that feature all three variables, which Pace originally excluded, we can get away with a maximum degree set of FA=FB=FC=FG=FU=3, FT=2, FE=FH=FS=1, in contrast to FG=FU=4 required otherwise. This turns out to give rise to much simpler coefficients in the linear combinations of variables that are generated by Pace’s code as the coefficients of the eventual monomials. Imposing the products of appropriate denominators of those coefficients as individually required factors for the variables led to the solution I posted on May 3, which can in fact be further improved by about 15% with a rather more exhaustive search. However, because those denominators share a lot of factors, I wondered whether one might do better still by allowing individual coefficients to be non-integer but imposing constraints to ensure that they cancel out in the expressions for each monomial coefficient. Long story short, that has led to the following solution:

which I don’t claim is absolutely optimal (as defined by my chosen metric of minimising the largest coefficient) but I think it’s pretty close. I may have erred in my changes to Pace’s code, however (or there may be precision issues), so if someone with Maple could verify the value of M before enshrining this solution in the paper I would be grateful.

I was able to obtain improved bounds for the m=4 cases using a modified Schinzel sieve that avoids unnecessary sieving. The program sieves an interval [s,s+d] at 1 mod 2 and 0 mod p for p up to the least p_m such that the set of survivors is admissible modulo p_{m+1}. It then iteratively checks admissibility at larger primes p_n, sieving 0 mod p_n only when necessary, and finally selects a minimal-diameter subsequence of length k0 from the survivors.

The resulting sequence is determined by the values s, d, m and a list of the indices n for which the interval was sieved at 0 mod p_n.
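For readers who want to experiment, the sieving procedure described above can be sketched roughly as follows. This is a simplified, hypothetical reconstruction in Python (the function and parameter names are mine); the real program of course handles much larger intervals and tracks the p_m/p_{m+1} bookkeeping and the list of sieved indices more carefully.

```python
def primes_up_to(n):
    """Simple sieve of Eratosthenes returning all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [p for p, ok in enumerate(sieve) if ok]

def schinzel_sieve(s, d, k0):
    """Sketch of the modified Schinzel sieve described above.

    Sieve [s, s+d] at 1 mod 2; then, for each odd prime p in turn,
    sieve 0 mod p only when necessary, i.e. only when the survivors
    fail to be admissible mod p (they occupy every residue class).
    Finally select a minimal-diameter window of k0 survivors.
    """
    survivors = [n for n in range(s, s + d + 1) if n % 2 != 1]
    for p in primes_up_to(d)[1:]:          # odd primes up to the length
        if len({n % p for n in survivors}) == p:   # not admissible mod p
            survivors = [n for n in survivors if n % p != 0]
        if len(survivors) < k0:
            raise ValueError("interval too short for k0 survivors")
    # minimal-diameter subsequence of length k0 among the survivors
    diam, i = min((survivors[j + k0 - 1] - survivors[j], j)
                  for j in range(len(survivors) - k0 + 1))
    return survivors[i:i + k0]
```

Here sieving “at 1 mod 2” keeps the even numbers, which is harmless since only the diameters of the resulting tuple matter.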

While we’re in the business of maximising and simplifying, it may be worth refining the numbers for M_4,epsilon on page 15 and in section 7.3. Using the formulae on page 59 I’m finding that while 0.784 is indeed the best alpha, 0.18 is a bit high for epsilon: setting it to 0.168 gives I=0.00728…, J=0.00365… and a non-negligibly improved M=2.00558… . (If we alternatively prefer simple fractions that suffice to reach M=2, it works to set alpha to 4/5 and epsilon to 1/6 giving M=1987000/991613.)

I think the asymptotics claimed for the Goldbach/twin prime dichotomy on page 69 aren’t quite right; the W-trick isn’t sufficient to eliminate the singular series which arise when either of the relevant quantities has many moderately-sized prime factors. (I think there should be the corresponding singular-series factors on all the right-hand sides.)

I don’t think I can edit the wiki, but my information there isn’t quite correct: I’m currently based at the Université de Montréal (supported by a CRM-ISM fellowship); it’s only in September that I’ll be returning to Oxford.

Now seems like the right time to sift through the (over 1000!) posts in the project to see whether any loose ends exist that are worthy of explicit/expanded treatment in the “Additional Remarks” section, even if not further exploration in advance of finalising the paper. (If I had to find any aspect of this project that was in any way other than enormous fun, I guess I would cite the fact that promising-sounding ideas were rather too often raised and then not explicitly either explored or rejected, possibly (it seemed) as a result of sheer lack of manpower; perhaps a refinement for the future would be to maintain a separate page, analogous to the “world records” page or the timeline page, in which such ideas are held in a kind of purgatory for (new, especially) participants to explore.)

So, well, I certainly haven’t done a thorough scan, but one particular idea does seem to stand out – it was raised by Terry in https://terrytao.wordpress.com/2013/12/20/polymath8b-iv-enlarging-the-sieve-support-more-efficient-numerics-and-explicit-asymptotics/#comment-258264 and discussed optimistically over the next 24 hours, but then it faded away following Pace’s remark that “On my end, the obstacle is that even for moderately sized k the polynomial is hard to work with (especially when multiplying two such polynomials together and integrating)”. I for one am still curious whether this is a genuine showstopper, seeing as the expansion of M to M″ is really the only game in town for substantially improving on H_1=246 at present. As a participant whose relevant skills are very much on the algorithm-design side rather than the maths, I can say that this is exactly the sort of thing I’d be keen to pursue if I knew how, i.e. if the challenge were reduced to one of algorithm optimisation for a well-specified problem.

1. The (already mentioned) idea of formulating the optimization problem for M using polynomial optimization over the reduced support of the optimal F (which is known to vanish over an inner simplex) – this should increase the lower bound (with an improved convergence rate).

2. The idea of enlarging the sieve support by formulating the necessary vanishing conditions in terms of F itself (rather than via its support) – this is a kind of “free boundary support”.

For what it’s worth, my feeling is that to get a noticeable improvement in the bound on prime gaps with the same overall approach, one should throw one’s weight behind getting more flexible MPZ estimates.

The hope would then be that these could reduce the ‘true’ value of k (i.e. ignoring practical optimization difficulties) to something in the region of 20 (whereas I’d guess the ‘true’ value of k for the current optimization problem is probably not too far below the current 50). In this regime of k=20-25, it might become feasible to actually do numerical computations taking into account (some of) the relaxed support conditions/piecewise polynomials, and so it might be possible to perform calculations which are out of the question when k=50.

One aim would be to extend Zhang/polymath 8a to less smooth moduli. The ‘traditional’ Kloostermania techniques work with prime moduli (since then the residue class is fixed), and maybe utilising similar ideas one could relax the smoothness requirements. Alternatively, it seems possible that the Polymath 8a arguments might be re-done with the aim of having a less stringent factorization property (rather than the best exponent).

It is already the case that some improvements on the Polymath 8a numerology have been suggested by utilising extra averaging (although there are complications in the algebraic geometry). Philippe Michel’s post on the last thread indicated there might be further possibilities here.

Finally it might be possible to look for variants of Zhang’s method where one is only considering ‘most’ or a positive proportion of smooth moduli, or where instead of showing asymptotics for primes in such an arithmetic progression one gets lower/upper bounds which are off by a small constant factor. Using ideas connected to the Harman sieve this has been successful in extending the fixed residue class results, for example.

This is all rather speculative (I don’t know how to do any of the above, and I’m skeptical that what I describe would actually turn out to be practical to implement numerically), but this is where I would guess one should go to get a bound less than 100, say, without using some completely new idea.

Many thanks James – I for one did not appreciate that there are so many directions in which adaptations of MPZ might plausibly go. Perhaps it’s worth including this summary as an expansion of item 4 in the “Additional Remarks” list?

As for numerical computations taking into account relaxed support conditions and/or piecewise polynomials, actually the impression I got via the k=3 polynomial simplification work is that there is still quite a bit of hope. One concern I have is that (as I understand it) exploring only polynomials in which no powers of (1-P_1) are included, i.e. setting zstart=0 in the program, invalidates the assertion that avoiding any 1’s in the signatures still leaves a basis for all symmetric polynomials. I don’t have a strong feel for how much is lost as a result, but my recent exploration (https://terrytao.wordpress.com/2014/04/14/polymath8b-x-writing-the-paper-and-chasing-down-loose-ends/#comment-326155) of non-zero zstart for k=3 and 4 suggests that it could be non-trivial. My approach going forward would be to enhance the latest k=50 code to start with lowish degree and control the size of the matrices by iteratively removing variables that minimally change M, while gradually increasing the allowed degree, as a way of finding a truly optimum (or nearly so) set of signatures to use (rather than just avoiding odd-numbered powers and setting zstart=0). Perhaps this could be sped up by initially calculating the matrix entries approximately as real numbers rather than as exact rationals, and only going back to the exact computation once a promisingly good and small subset of variables has been identified. However, this is all academic at present because any such approach would need as a starting-point a k-independent description of an admissible partitioning of R”_k (or better, R”_k,epsilon) into a manageable number of components.

Correction: I did not mean to write that “no powers of (1-P_1) are included” when zstart=0, but rather that the choice of how many to include seems too restrictive to maintain coverage of all symmetric polynomials. Please correct me if I’m wrong about that.

I’ve paraphrased some of this nice discussion into the remarks section. It’s true that we don’t have dedicated efforts to try to highlight all possible loose ends, but this is indeed a manpower issue; Polymath projects are technically “massively collaborative” in nature, but in practice, even the successful ones are basically just ten or fewer people volunteering a fraction of their time on a given problem, plus maybe a dozen or more people making smaller contributions. But in principle, everything is archived in the blog discussions, so part of a possible future Polymath8c project could well begin by scouring the Polymath8a and Polymath8b posts for previously discarded ideas which may suddenly become relevant in light of some unexpected new development.

Note that we tried a similar exercise at the end of Polymath8a, collecting possible ways in which our 4680 bound might be improved, but most of these quickly became obsolete with Maynard’s paper, so perhaps it is in fact more efficient to just let things settle for now and await more substantial advances in prime gap technology.

James – you mentioned some applications of the Harman sieve tailored to a positive proportion of moduli, rather than all moduli – do you have a reference for this? The only example I know of this sort of strategy is to get around the ineffectivity of the Bombieri-Vinogradov theorem by simply ignoring the exceptional modulus when building the sieve weight, but your description suggests that you are talking about something slightly different.

I had in mind ideas similar to those in Fouvry’s 1985 Inventiones paper ‘Théorème de Brun-Titchmarsh: application au théorème de Fermat’ (or the paper of Baker and Harman, ‘The Brun-Titchmarsh theorem on average’, extending Fouvry’s work). In both of these they show, for almost all moduli of a given size and for a fixed residue class, a Brun–Titchmarsh-type bound with constants satisfying certain constraints, where the level of the moduli can be larger than the Bombieri–Vinogradov range. I’m far from familiar with the details of these papers (so I might well be saying something stupid) but I think the idea is to use similar estimates to BFI, but to settle for bounds rather than asymptotics in exchange for being able to use more moduli. One obtains bounds because one has to discard (and bound trivially) certain terms in the decompositions which are difficult to analyse. (The third BFI paper is similar, applying to a positive proportion of moduli.)

My hope was that if you could have bounds of the above type for a positive proportion of moduli, but without requiring the residue class to be fixed (incorporating Zhang-style ideas instead of BFI), then this might translate into a usable gain in the sieve (at the expense of yet more complication).

I should reiterate that this should all be thought of as fairly wild speculation on my part.

I have filled in Appendix A with details on how the H(k) bounds listed in Theorem 3.3 were obtained. The bounds in Theorem 3.3 should be updated to match those listed in Appendix A. (I didn’t make this change myself in case someone is currently editing subtheorems.tex.)

There are still two entries to be added to Table 4 for k=75845707 and k=3473955908; these depend on computations that are still in progress but which should finish over the weekend.

Let us imagine that we are entering a train station, but instead of taking the train we fly over the railway in a helicopter, viewing it from a bird’s-eye perspective. From the helicopter we can see the starting point of the railway, but not its end: our railway is endless. Sitting in the helicopter, we can change the color of the railroad stripes as we wish.

First, we shall paint with red color every second stripe.
Thereafter we shall paint with yellow color every third stripe that is not painted red.
Thereafter we shall paint with blue color every fifth stripe that is not painted red or yellow.
Thereafter we shall paint with green color every seventh stripe that is not painted red or yellow or blue.
And so on, according to the successive prime numbers.

The question about the existence of infinitely many twin primes can be illustrated as follows: a coloring stage (stage n) leaves behind an infinite number of pairs of unpainted stripes that are separated by one red stripe. Is it possible to determine whether the next coloring stage (stage n+1) will fail to “eliminate” these pairs?

If the number of twin primes is finite, then there is a coloring stage which “eliminates” the pairs of white stripes which are left by its predecessor (from a certain point on the number line). Thus, if it appears that each coloring stage – no matter how advanced it is – leaves behind an infinite number of pairs of white stripes, then we can assume the existence of an infinite number of pairs of twin primes.
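For the curious, the coloring process above is just a sieve in disguise, and it is easy to simulate on a finite stretch of railway. This is a toy illustration in Python (the function names are mine, and I interpret “painting every second/third/… stripe” as painting the multiples of 2, 3, 5, …):

```python
def unpainted(limit, stages):
    """Stripes in [2, limit] left unpainted after `stages` coloring
    stages (stage 1 paints multiples of 2 red, stage 2 paints the
    remaining multiples of 3 yellow, and so on)."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29][:stages]
    return [n for n in range(2, limit + 1)
            if all(n % p != 0 for p in primes)]

def twin_pairs(limit, stages):
    """Pairs of unpainted stripes separated by exactly one (red) stripe."""
    alive = set(unpainted(limit, stages))
    return [(n, n + 2) for n in sorted(alive) if n + 2 in alive]
```

After four stages (primes up to 7), every unpainted stripe below 11² = 121 is coprime to 2·3·5·7, so the surviving pairs below 100 are exactly the twin primes in that range.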

A small postscript to the k=49 question. I spent yesterday streamlining Pace’s k=50 code and achieved a pretty nice speedup (over an order of magnitude for degrees in the high teens), and I’ve emailed it to Pace in case he wants to experiment with it on his high-performance hardware; I also uploaded it to the dropbox. Meanwhile, however, using my own computer I’ve explored k=49 for degree up to 19 (i.e. up to 9 for the parameter “d” in Pace’s code), finding the best epsilon (among reciprocals of integers) in each case, and the results are as follows:

from which it can be seen that the rise in M with d decays extremely close to exponentially (with an exponent of about 1.444), leading to a predicted M at large degree around 3.997. (By way of confirmation, Pace’s value of 3.9885 for d=13 also falls accurately on this curve.) This is so close to the goal that I suspect the thing to try next is relaxing the restriction that signatures be composed only of even numbers. I hope to find time to look at that soon.

I took a look at Aubrey’s improvements and they are very nice. Essentially, any computation that has been done previously is stored in a table, including the simple things like the creation of binomial coefficients. By making sure we don’t repeat all of these little computations, it adds up to a big time savings!
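The caching idea described here is standard memoization. As a hypothetical miniature of the trick (not the project’s actual code), a recursive binomial coefficient becomes cheap once every previously computed value is stored in a table:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def binom(n, k):
    """Memoized binomial coefficient: each (n, k) pair is computed once
    and afterwards looked up in a table, which is what makes repeated
    small computations inside inner loops essentially free."""
    if k == 0 or k == n:
        return 1
    return binom(n - 1, k - 1) + binom(n - 1, k)
```

Without the cache this recursion takes exponentially many calls; with it, each of the O(n·k) subproblems is solved exactly once.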

I’ve actually shaved another factor of two this morning – things that seem really trivial, like set manipulation without any numerical calculation, are now significant when they’re in inner loops. Meanwhile Pace just sent me a version of his code that allows signatures to contain all numbers except 1, rather than only even ones, and in the next few days I’ll try to make the equivalent mods to that code so that we have a shot at k=49.

That said, I’m well aware that the eigenvector calculation now dominates in terms of CPU time, and maybe also in terms of memory needs, so matrices of sufficient size to get us to k=49 may still be out of reach. If anyone has ideas for other ways to compute M from M1 and M2, now seems like the time to float them.

1. Do you really need the matrices explicitly in memory?
2. Wouldn’t it suffice to only use matrix-vector-products?
3. If your matrices are symmetric what could you say about their
L.D.L^T decomposition?
4. Is it possible to factor the L matrices again?

Another idea is to use the fact that, for each given C, we have M &lt; C if and only if the matrix C M_1 − M_2 is positive definite (which is much easier to check – e.g. by testing whether it has a Cholesky factorization – and much faster than running an eigenvalue algorithm).
So by trying a sequence of C-values (generated by the bisection method), the true M-value can be approximated efficiently by the generated sequence of upper and lower bounds.
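To make this concrete, here is a small numerical sketch in Python with NumPy (my own naming), assuming, as in the surrounding discussion, that M exceeds C exactly when C·M_1 − M_2 fails to be positive definite; `np.linalg.cholesky` raises an exception for non-positive-definite input, which gives the test:

```python
import numpy as np

def is_positive_definite(A):
    """Cholesky succeeds exactly when A is (numerically) positive definite."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

def approximate_M(M1, M2, lo=0.0, hi=10.0, tol=1e-6):
    """Bisect for M = sup{ C : C*M1 - M2 is not positive definite },
    i.e. the largest generalized eigenvalue of M2 v = M * M1 v
    (M1 assumed positive definite, [lo, hi] assumed to bracket M)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_positive_definite(mid * M1 - M2):
            hi = mid      # M < mid: shrink from above
        else:
            lo = mid      # M >= mid: shrink from below
    return 0.5 * (lo + hi)
```

For instance, with M1 the identity and M2 diagonal, the routine converges to the largest diagonal entry of M2.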

This sounds very promising – but maybe I don’t fully understand – surely for our purposes we know that C=4 is the interesting value, so we’re done if we find that 4M_1-M_2 has no such factorization – no bisection required? (Actually, is this factorisation the same as Andre’s item 3?)

Aubrey, you are right (of course), but in order to optimize epsilon, a sufficiently good estimate (e.g. to 4-5 significant digits) of the M-value (for each fixed epsilon) is needed – especially because even with optimal epsilon, the asymptotic M-value for large degrees seems to be very close to 4.

Remark: since the Cholesky factorization (see the Wikipedia article) is very fast (perhaps 1000 times faster than the eigenvalue algorithm) and numerically stable, it seems ideal for testing positive definiteness of large matrices. (The required number of bisection iterations for 4-5 digits of accuracy in the M-value is about 40-50 – i.e. about 40-50 Cholesky factorization trials.)

Ah yes – fair enough. OK. It seems that this factorisation is not built into Mathematica; if someone knows differently, or has an implementation handy, please say, but otherwise I will have a go at implementing it once I have my speedups working for Pace’s unrestricted-signature code.

Eytan, is it fast to show that C M_1 − M_2 is positive definite while using rational entries (rather than approximate real entries)? Otherwise, we would need to run at least one generalized eigenvalue routine in order to get an exact arithmetic proof of what we want.

It does turn out that Mathematica has a test for positive definiteness (PositiveDefiniteMatrixQ) and a Cholesky factorization routine (CholeskyDecomposition).

Pace, since the Cholesky algorithm requires n square roots (for a matrix of size n), it can’t be applied directly in exact arithmetic (in which case it needs to be modified to the closely related LDL^T factorization, which is about twice as slow). Anyway, I think that this positive-definiteness check should be used (via bisection iteration) only to approximate the M-value (for fixed epsilon) to sufficient accuracy in order to optimize epsilon.
For each given epsilon, an exact arithmetic test for positive definiteness is possible via the (somewhat slower) LDL^T factorization (but it is recommended only when the approximated M-value is somewhat larger than 4).
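As an illustration of the exact route (a Python sketch under my own naming, using exact rationals), positive definiteness of a symmetric matrix can be decided without any square roots by computing the LDL^T factorization and checking that every pivot d_j is positive:

```python
from fractions import Fraction

def ldl_positive_definite(A):
    """Exact-arithmetic positive-definiteness test via the LDL^T
    factorization: no square roots are taken, so Fraction entries stay
    exact throughout.  A is a symmetric matrix given as a list of rows;
    returns True iff all pivots d_j are positive."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]
    d = [Fraction(0)] * n
    for j in range(n):
        # pivot: d_j = a_jj - sum_{k<j} L_jk^2 d_k
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        if d[j] <= 0:
            return False        # a non-positive pivot rules out PD
        L[j][j] = Fraction(1)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * d[k]
                                     for k in range(j))) / d[j]
    return True
```

Since division of exact rationals is exact, a `True` answer here is a rigorous certificate of positive definiteness, at the cost of coefficient blow-up for large matrices.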

This all sounds most encouraging (to a mathematics neophyte such as me, anyway!). Pace’s (and my) prior work shows that epsilon won’t need to be optimised very well at all unless we get reeeally unlucky with how close to 4 the k=49 numbers come out, so it looks as though the positive-definite test does all we need – we don’t really need the doubling of speed given by the Cholesky alternative. I’ll fold this into my streamlined code tonight or tomorrow and ship it to Pace to grind for large degrees (presuming that the restoration of odd numbers to signatures is giving a sufficient-looking improvement).

One thought I had was that if the numerics for the ratio J/I when k=49 get really, really close to 4 (e.g. 3.999) then we can take the optimal polynomial F for this problem and (following an old suggestion of Eytan) remove the portion of F in the appropriate region to reduce I(F) without affecting J(F), thus increasing the ratio a bit and maybe pushing us over the top. The trick of course is to be able to integrate F in this region. One amusing possibility is to first use Monte Carlo integration to get an estimate of how much we would be removing from I(F) by this method – specifically, to randomly sample in the simplex (this can be done, for instance, by taking k+1 independent exponential random variables, dividing by their sum, multiplying by the appropriate factor, and then discarding the final variable), rejecting all samples that lie outside the polytope being removed, and using the remainder to approximate the integral of F^2 on the polytope (one can also compute I(F) by the same method, leading to a double-check of the numerics).

Actually evaluating the integral rigorously here could be tricky (a 49-dimensional integral of some moderate-degree polynomial on a polytope cut out by about 100 equations) but perhaps not impossible (and might be easier than building the entire M_1, M_2 matrices needed to deal with removing this piece). With epsilon around 1/30, I think there is some chance that this polytope has a reasonable size relative to the full simplex (note that each t_i is expected to be about 1/k on average). If it is too small, one can try removing a larger portion of F, but then one also has to start considering the impact on the J(F) functional.
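The exponential-variables sampling trick is easy to prototype. Below is a Python sketch under my own naming; I have normalised everything to the unit simplex, since the exact rescaling factor used in the project is not spelled out above:

```python
import math
import random

def sample_unit_simplex(k, rng):
    """Uniform point in { t : t_i >= 0, sum t_i <= 1 }: take k+1 iid
    Exp(1) variables, divide by their sum, and drop the final one."""
    e = [-math.log(1.0 - rng()) for _ in range(k + 1)]
    s = sum(e)
    return [x / s for x in e[:k]]

def mc_integral(f, k, trials, inside=lambda t: True, seed=0):
    """Monte Carlo estimate of the integral of f over the portion of the
    unit simplex satisfying `inside` (rejection sampling).  The simplex
    has volume 1/k!, so each sample carries weight 1/(k! * trials)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        t = sample_unit_simplex(k, rng.random)
        if inside(t):
            total += f(t)
    return total / (math.factorial(k) * trials)
```

For instance, taking f ≡ 1 with the constraint sum t_i ≤ 1/2 for k = 2 estimates the volume 1/8 of the shrunken simplex; the same machinery would apply to integrating F² over the polytope being removed.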

In fact (as explained in a previous comment), the rigorous computation over the reduced support is simplified by using the fact that the complement is itself a simplex, so we need only subtract the integral over that simplex from the original one. The needed integral over the simplex should be quite simple to evaluate using the new integration variables, with the appropriate (constant) Jacobian.

It follows that the simplex in terms of the new variables is
simply times – so the required integration over this simplex (using the new variables) should be simple.

I’m trying to better understand the state of play in the setting of EH (as opposed to GEH). In the paper (page 3) it is noted that “the parity obstruction does not exclude the possibility that one could achieve (xii) just assuming EH rather than GEH, by some further refinement of the sieve-theoretic arguments”. However, I am curious as to how this can be reconciled with the upper bound of (k log k)/(k-1) for M_k and the need to resort to GEH in order to let the support of F extend beyond R_k. I tried finding references to this in past threads, and all I unearthed was Terry’s note that “it *may* be possible to rely just on EH using Bombieri’s asymptotic sieve, but this has not been attempted.” (the last post of thread 4). I don’t think there has been any other discussion of this; moreover I see that the paper’s references 4 and 11 seem to be about it but do not seem to be actually referred to anywhere in the main text. Might it be appropriate to expand the comment on page 3 a little? – at least mentioning those references, and maybe also briefly summarising why that sieve is a (or the only) potential way forward? – for example, whether (and, briefly, how) it removes the (k log k)/(k-1) barrier, or the R_k barrier, or both?

I added a short clarification. Basically, it is true that with our current sieving estimates (Theorems 3.5 and 3.6) one cannot go outside the simplex just on EH. However, it is conceivable that one could find an improvement of Theorem 3.6(ii) which works just on EH rather than GEH, in which case much or all of our results that currently rely on GEH would instead just need EH.

I’ve just uploaded to the dropbox a new version of the speed-optimised code. This version includes the additional optimisations I introduced yesterday, and also a binary chop to find M by checking positive-definiteness (though I left the eigenvector calculation in too, for validation). Additionally, and most importantly for k=49 purposes, it is based on the version of Pace’s code that allows any number other than 1 in signatures, rather than only even numbers; I’ve modified it so that the allowed set is another global variable, since some odd numbers are presumably more influential than others and a bit of experimentation may thus be in order. A quick test using low degree suggests that allowing all numbers takes only a few times longer to find a given M than the evens-only code, so it should be usable for high degree by Pace on his monster machine unless memory constraints arise. However, the switch to checking positive-definiteness means that the matrix-building is now very much the dominant time sink again, so depending on how Pace gets on I may look for some more sophisticated speed-ups in the coming days.
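The binary chop works because, for symmetric M_1, M_2 with M_1 positive definite, the largest generalized eigenvalue of the pair is below M exactly when M·M_1 − M_2 is positive definite, which a Cholesky attempt detects cheaply. A minimal NumPy sketch of the idea (names and interface are mine, for illustration only, not the actual Mathematica code):

```python
import numpy as np

def is_pos_def(a):
    """A symmetric matrix is positive definite iff Cholesky succeeds."""
    try:
        np.linalg.cholesky(a)
        return True
    except np.linalg.LinAlgError:
        return False

def largest_gen_eigenvalue(m2, m1, lo=0.0, hi=10.0, tol=1e-9):
    """Bisection for the largest lambda with m2 x = lambda m1 x, assuming
    m1 is symmetric positive definite and lambda lies in [lo, hi]:
    lambda_max < M  iff  M*m1 - m2 is positive definite."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_pos_def(mid * m1 - m2):
            hi = mid  # mid is above the top eigenvalue
        else:
            lo = mid  # mid is at or below it
    return 0.5 * (lo + hi)
```

In practice one only needs to test the single value M = 4, i.e. one positive-definiteness check, which is why this is so much cheaper than computing the eigenvalue itself.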

While Pace’s behemoth does its worst on k=49, I hereby somewhat opportunistically claim a more esoteric world record. A month ago Terry observed (https://terrytao.wordpress.com/2014/04/14/polymath8b-x-writing-the-paper-and-chasing-down-loose-ends/#comment-302077) that the bounds for H_m on GEH could probably be improved by using non-zero epsilon, just that epsilon cannot exceed 1/(k-1). I now find that the quite surprisingly low d=19 suffices to push M_52,1/51 over 4, thus reducing H_2 on GEH to 254. This took only minutes to demonstrate on my puny laptop, so I’m almost certain that k=51 (and maybe even 50) is also possible courtesy of serious hardware.

Update update: I’m now pretty sure that k=50, epsilon=1/49 cannot reach M=4 whatever the degree. The highest I’ve computed is d=27 (even numbers only in signatures), which gives 3.99402, but the rise in M with increasing d decays a bit faster than exponentially, leading to a predicted asymptote around 3.997. In related news, Pace and I have worked a little on k=49 with signatures unrestricted (except for the harmless exclusion of 1’s); d=30 is probably not feasible even with my speedups, but anyway it’s looking as though the asymptote (with epsilon optimised, of course) here is also around 3.997. Pace will doubtless give more precise details in due course.

I was able to implement a more efficient parallel version of the shifted greedy sieve that gives significantly better results for the m=4 cases. It uses a heuristically determined shift parameter s=-(k*log k-k)/2 that for large k is not far from the optimal choice of s in [-k log k, k log k]. For example, when k=3,500,000 this approach yields a diameter of 55,264,162, versus the 55,233,744 diameter listed on the wiki (for comparison the shifted Schinzel sieve gives a diameter of 56,789,070). The key point is that it now takes about half a minute to obtain the first result (on a 48 core machine), whereas the second result originally took several days (it was single-threaded, less efficiently implemented, and spent a lot of time optimizing s).

Unfortunately, I think that it will be too time-consuming to use greedy sieving for the m=5 cases (each of the examples above used 10-20 CPU days of computation, and the algorithm scales quasi-quadratically).

However, I was able to (provisionally) improve the H(k) bound for k=3,473,955,908 using a modified Schinzel sieve that avoids sieving at primes below the sieve bound at which the sequence is already admissible.
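For readers unfamiliar with greedy sieving, here is a toy Python sketch of the core idea only (no shift parameter, no parallelism, and all names are mine): from a block of integers, remove for each small prime the residue class containing fewest survivors, so that the surviving set omits some class modulo every sieving prime, which is the admissibility condition at that prime.

```python
def greedy_admissible(lo, hi, primes):
    """Toy greedy sieve: from the integers in [lo, hi], for each prime p
    remove (if necessary) the residue class mod p with fewest survivors,
    so the surviving set avoids at least one class mod every p given."""
    s = set(range(lo, hi + 1))
    for p in primes:
        classes = {}
        for n in s:
            classes.setdefault(n % p, []).append(n)
        if len(classes) == p:  # every class mod p occupied: remove one
            worst = min(classes.values(), key=len)  # greedy choice
            s -= set(worst)
    return sorted(s)
```

The real computations work at a vastly larger scale, use a shift of the sieving interval, and sieve only up to the bound at which the sequence becomes admissible, but the greedy step is the same in spirit.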

Sorry for this non-expert question in advance: You have proven that H_1<=6 holds under the assumption of the generalized EH conjecture. Now, is it sufficient to show H_1<=6 unconditionally with a different method in order to prove the GEH? Or is the GEH strictly stronger than H_1<=6? What other immediate consequences would a proof of the GEH have? Thank you!

No; our argument uses only a few special cases of GEH (including EH), and there may well be other ways to prove the bound that avoid even this special case. I would imagine that GEH would also improve the numerology on other sieve-theoretic results in the literature, although I don’t know offhand of an application which is as dramatic as for . One example I am fond of, though, is an old result of Bombieri which shows that while EH is not quite enough to settle the twin prime conjecture, it is enough to do things like link the number of twin primes up to some threshold x with the (appropriately weighted) number of twin semiprimes (products of exactly two primes) up to the same threshold x; roughly speaking, if there were only finitely many twin primes, then on EH this would imply that there would also be surprisingly few twin semiprimes. Ben Green and I have toyed with this in the past as a possible route around the parity barrier to the twin prime conjecture, but didn’t get very far with this idea, unfortunately.

No, twin semiprimes are (i.e. not necessarily distinct prime factors, and a gap of 2). Semiprimes are A001358 in the OEIS. See also theorem 3 in the paper by Goldston, Graham, Pintz & Yildirim in Trans. Amer. Math. Soc. 361 (2009), 5285-5330 (preprint being arXiv:math/0609615).

On page 44, in the line above the expression for , the meaning of “largest available value of ” is not sufficiently clear. Also, the statement about the value is still not justified (e.g. by generalizing lemma 6.1 and corollary 6.2 to this case).

The generalized version of corollary 6.2 should imply that is the unique(!) eigenvalue of – but this contradicts the existence of more than one eigenvalue in the first regime of !

[I believe what is going on is that the eigenfunction is non-negative only for the maximal value of (much as how only the ground state eigenfunction for, say, a Schrodinger operator, is non-negative, while the others have sign changes. -T.]

This seems to be the only way to avoid the above contradiction!
BTW, the existence of a unique eigenvalue with a non-negative eigenfunction for with general may be proved via Birkhoff’s theorem (see e.g. theorem 2.11 in arxiv.org/abs/1304.7921 by Lemmens and Nussbaum).

Here are some more details on the M_51,1/50 case, for inclusion at the end of section 7.2. The lower bound for d=22 (which should also be entered in Theorem 3.13) is 4.00156. This was computed allowing only even numbers in signatures. Pace should confirm my understanding, but I believe that setting d=22 means that deg(alpha) is at most 22 and a+deg(alpha) is at most 23, since I started from Pace’s unrestricted-signatures code.

I have another “negative result” postscript to the medium-k explorations. It seemed interesting to know whether we can improve on the strategy of restricting signatures to contain only even numbers, so I tried a bunch of experiments restricting in other ways – using different subsets of [2,d], limiting signature length, and limiting the frequency with which a given number appears in a signature. The upshot is that the even-numbers restriction is quite uncannily superior in efficiency: nothing comes close in terms of the M achieved for a given size of the matrices (though there’s also the possibility that a more esoteric choice would do better).

However, the good news is that we seem now to have enough data on the decay of the rate of rise of M with d to be able to predict the asymptotic value of M, assuming no signature restriction at all, to three decimal places. That’s enough that AFAIK we no longer have any settings of interest whose M=4 status remains uncertain pending values of d that have yet to be reached: in particular, though Pace’s machine is still cranking on k=49, it seems increasingly clear that the benefit of making signatures unrestricted when is not enough (with large d) to get us over the line.

I’ve just uploaded a new version of the medium-k code (with filename Mfinder-2014-06-01.nb) to the dropbox; it incorporates various new speedups. Concerning linking to medium-k code from the paper, I guess there is a dilemma in that the most useful code for others to use is probably my most recent upload, since it’s the fastest, but the results included in the paper were nearly all obtained using earlier versions. (For example, I replicated the headline k=50 result on my little laptop in only 90 minutes. I actually found that we don’t need d=26 – we get to M=4.00124 with d=25.) Maybe the solution is just to delete the various mentions of execution times?

Also, should the paper mention the computational superiority of iterative testing for positive definiteness versus calculating the largest eigenvector? That’s what my latest upload uses, and the other speedups mean that if the eigenvector method is used then the statement in section 7.1 that the matrix construction is the most computationally intensive step is not actually true.

“(For example, I replicated the headline k=50 result on my little laptop in only 90 minutes. I actually found that we don’t need d=26 – we get to M=4.00124 with d=25.)”

Holy cow! That’s incredible! I mean, perhaps there’s just some sort of natural threshold that makes k=49 so much harder, but it seems like _some_ sort of answer for that case should arrive quicker than the weeks it took Pace originally. For my own curiosity, when did he start cranking with your new code?

It’s not so incredible – just a lot of quite modest speedups multiplied together. My k=49 experiments have been equally fast – I checked d=27 in only four hours (the result was M=3.99402).

As for Pace’s recent experiments, actually he has only had the latest code for a day – he started with my previous version, which was much slower for large d. Also, remember that the current effort is with unrestricted signatures, which involves much larger matrices for a given d than with the evens-only restriction and thus much longer execution times. And also, I have actually encouraged Pace to look at extending the code to investigate R’_k, to which there do not seem to be any overwhelming obstacles (for someone who, unlike me, understands the mathematics well enough!), and which can be expected to reduce k by at least three or four. But I’m sure he will provide an update soon.

Yes, that’s the comparison – except that I used d=25, whereas I believe Pace was using code that only explored d even so he needed 26. Also he was using a faster machine (I don’t know how much faster).

There were a few factors that actually slowed down my computation over the weekend. First, I had a previous version of Aubrey’s code, which had slowed down the creation of the look-up set for the structure constants. (Aubrey has subsequently sped this step back up.) Second, I was looking for 7 digit accuracy, so had to run many more instances of the linear algebra step. Finally, I unfortunately made a small error (typing 99/24 instead of 99/25 at one point) which meant all of the data from Saturday on was incorrect.

But anyway, here is the data I do have in the case. (With the more recent optimizations that Aubrey made, this list could easily be extended up to d=27 or so.) The following is the list of triples where the first entry is the degree, the second is the optimal epsilon (of the form 1/integer), and the third entry is the corresponding M-value.

I feel a little guilty that I didn’t start looking for these optimisations a couple of months ago, but better late than never I guess. If Pace (or James, or anyone else) decides to extend/adapt existing code to look at larger regions (R’, R” etc) then they are welcome to start from Pace’s original code, which is probably a great deal more comprehensible than mine in terms of how it implements the maths, and I’ll be happy to implement these speedups on whatever they come up with.

Yes (for practical purposes) – both Pace and I have found, for k and d in the ranges of interest, that if epsilon is chosen from the values 1/n then the M’s for the choices either side of the optimum only ever fall short of the optimum M by less than 0.001, so that there seems to be no need to check intermediate values of epsilon unless we get to M=3.999 using only 1/n, which isn’t happening for k=49.

Right – it turns out, as a curiosity of the implementation, that there is a bit of “wobble” away from the trend such that odd d gives M a little above the trend line and even d a little below. But smoothing that out by comparing d with d-2 gives a much more decisive answer – the ratio varies very little from 0.833.
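The extrapolation being used here amounts to summing a geometric series of increments: if successive increments of M shrink by a stable ratio r (here about 0.833, comparing d with d-2), the predicted limit is the last value plus delta·r/(1−r). A throwaway sketch (function name mine, purely illustrative):

```python
def extrapolate_limit(m_values):
    """Estimate the large-degree limit of M from values at equally spaced
    degrees, assuming the increments decay geometrically:
    limit ~= M_last + delta_last * r / (1 - r),
    with r the observed ratio of the last two increments."""
    d_prev = m_values[-2] - m_values[-3]
    d_last = m_values[-1] - m_values[-2]
    r = d_last / d_prev
    return m_values[-1] + d_last * r / (1.0 - r)
```

This is exact when the increments are perfectly geometric; the confidence in the third decimal place rests on how stable the observed ratio is across several values of d.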

Since the predicted M-value limit is quite close to 4, one needs, in addition to the prediction, to have some (good) estimate of the prediction error (e.g. by estimating a kind of “standard deviation” for the error.)

Aubrey, I’m going to be busy over the next few days (unfortunately), working from home (without access to my computer). Also, I didn’t have to run your new code on the high memory computer, because those issues were fixed when we changed the linear algebra step. Given the current blindingly fast run-times, I’d say don’t wait for me, and go ahead and extend the tables. I’ll try to help out when I get back to my office, if these issues haven’t been resolved before then.

From table 2 it is still possible that (currently it is known to be “very close” to 4) – which should improve theorems 3.9(vii), 3.2(vii) and 1.4(vii) ( value under EH).
Perhaps the fast convergence of the Krylov subspace method is sufficient to show that.
There is also a simple trick to accelerate the convergence rate for this method by choosing
where is a scalar chosen to optimize the convergence rate.
The idea is that if has as a simple eigenvalue and its remaining spectrum is contained in a closed interval with , then the convergence rate of the Krylov subspace method (using powers of ) is upper bounded by , provided that.
Clearly, the optimal is given by , improving the bound on the convergence rate for the current method (with ).
The parameter may be empirically optimized (e.g. by first trying two different values for and inferring from the observed convergence rates the estimated values of and and then taking .)
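To illustrate the spectral shift idea in code: power iteration on A − cI converges faster when c sits near the midpoint of the unwanted part of the spectrum, since the shift leaves the eigenvectors unchanged but shrinks the ratio between the second-largest and largest shifted eigenvalues. A hedged NumPy sketch (an illustration of the trick, not the project’s actual code):

```python
import numpy as np

def shifted_power_method(a, shift=0.0, iters=500):
    """Power iteration on a - shift*I for a symmetric matrix a; returns
    the dominant eigenvalue of a itself via a Rayleigh quotient.
    If the non-dominant spectrum lies in [a_lo, b] with top eigenvalue
    lam > b, a shift near (a_lo + b)/2 improves the convergence rate
    from roughly b/lam to (b - shift)/(lam - shift)."""
    b = a - shift * np.eye(a.shape[0])
    v = np.ones(a.shape[0])
    for _ in range(iters):
        w = b @ v
        v = w / np.linalg.norm(w)
    return float(v @ a @ v)  # Rayleigh quotient in the original matrix
```

As noted below, this gains nothing when the full Krylov subspace is used, since A and A − cI generate the same subspace; the benefit is specific to the plain (or truncated) power method.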

A short while ago, I used the Shanks transform on , with the size of the Krylov subspace, to try and speed up the convergence of this sequence. From the third (consecutive) application of the Shanks transform onwards, the highest- results were consistently above 4. However, the results were usually around to , so it is definitely dangerous to put much faith in these data points. It does seem likely, though, that if is indeed true, it will require a considerable polynomial degree to attain.
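For concreteness, a single Shanks step – applied repeatedly, as described above – looks like this (a generic textbook sketch, not the code actually used):

```python
def shanks(seq):
    """One application of the Shanks transform, which removes the dominant
    geometric transient from a convergent sequence:
    S(a)_n = (a_{n+1} a_{n-1} - a_n^2) / (a_{n+1} + a_{n-1} - 2 a_n)."""
    out = []
    for i in range(1, len(seq) - 1):
        num = seq[i + 1] * seq[i - 1] - seq[i] ** 2
        den = seq[i + 1] + seq[i - 1] - 2 * seq[i]
        out.append(num / den)
    return out
```

Each application shortens the sequence by two terms, which is why only a few consecutive applications are possible before the data run out – and why the resulting estimates deserve the caution expressed above.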

The acceleration trick described above should definitely accelerate the power method. Unfortunately, when using the full Krylov subspace in the eigenvalue computation (which was done to get the lower bounds for ), it looks like the shift with does not yield a gain because both and should generate the same Krylov subspace…

It is possible to use a partial Krylov subspace by taking only a fixed number of the last generated basis functions (the case corresponds to the power method).
This truncated version of the Krylov method should give almost the same M-value as the original method (provided that is sufficiently large for the desired precision – depending of course on the convergence rate with optimized ) – while keeping the matrices of fixed size .
Therefore it is important to estimate the (c-optimized) convergence rate in order to estimate the value of and the maximal polynomial degree needed for the desired precision of the estimate.

For example, with convergence rate 0.833, about 50 iterations are needed to improve the accuracy by a factor of about 10^4 (since 0.833^50 ≈ 10^-4).

An additional (and probably more promising) generalization is to apply the Krylov subspace method to estimate the norm of where is the “spectral shift parameter” described above, and is a positive integer. The generated basis functions are therefore

For example, if this gives a lower bound for where the matrices are sub-matrices of these matrices for but with improved convergence rate (the square of the convergence rate for ) and improved lower bound for (the square root of the lower bound for the norm of ) – with relative error about half of the relative error for lower bound.

Yes, it’s extremely close. I ran the code overnight with d=27 and signatures having only even numbers, and I get a lower bound for M_53,0 of 3.99342. The rate of convergence is such that I’d expect high d to get to 3.997, or maybe 3.998. Then there’s the extra slice that comes from not restricting signatures, which is only 0.005 for d=16 and gradually shrinks as d rises, but which I think may be around 0.001 or 0.002 in the d=30-35 range. I’m at my computer’s limit, however – the d=27 run took all night, with each positive-definite test taking half an hour – so we’ll need Pace (or someone else with a really fast machine) to look at it, and even that may not suffice to decide this. I hope the new Krylov tricks can do better.

Eytan: I now took a closer look at the spectrum of . When the domain is limited to the Krylov subspace (the sizes were tested), I get what seems like an eigenvalue at , and a continuous spectrum in the range . I am unable to prove that the spectrum is empty between and , but I did manage to sketch a proof that is indeed in the approximate point spectrum of . I have updated the KrylovMk.pdf file to include this (and the graph of the spectra of ).

Your proof starts from the (still unproved) assumption that is an eigenvalue. I think (as mentioned in a previous comment) that this assumption may be proved by Birkhoff’s theorem.
But granting the numerical evidence that there is a “spectral gap”, the power method for , with the “spectral shift parameter” , should converge to with a convergence rate bounded by

Concerning the remark below theorem 1.3, it seems (for sufficiently large , and perhaps even for all ) that is the first for which is bounded by the explicit bound found for (since is asymptotic even to the conjectural ).

Eytan’s reminder concerning M_53,0 alerted me to the need to wrap up the details on the M_54,0 computation. As with M_50,1/25 I find that we can get over the line using one degree fewer than Pace originally reported: d=23 using evens-only signatures gives M=4.00238.

Ah – sorry, looking back to Pace’s original post (https://terrytao.wordpress.com/2013/12/20/polymath8b-iv-enlarging-the-sieve-support-more-efficient-numerics-and-explicit-asymptotics/#comment-261984) I see that he in fact reported success with d=22 for signatures (and total degree 23), contrary to the note just above the Krylov discussion in the current draft. However, I just checked d=22 and I get an M of only 3.9991. This could be worth looking closely at, because it may be real: Pace mentioned that because of the memory issues that existed back then he dropped the last three signatures, and it seems weird that dropping signatures could allow M to rise but I actually found the same thing here and there when I was trying to improve on using even numbers only. The dream scenario is that removing a few signatures will enable M_53,0 or even M_49,1/24 or M_50,1/49. I’ll explore more.

Well, an example where I’m finding this is as follows: with k=50, d=12, epsilon=1/26, I get 3.881 with just even numbers (giving 30 signatures) and adding in just the signature {11} lowers M to 3.844. I’ll go back and try it on the older code, but I actually used the older code when I got these results first time, so I’m pretty sure that’s not it.

Just to confirm, I get the same numbers using the code you emailed me on May 27th, modified only to restrict the numbers allowed in signatures to a specific set, namely {2,4,6,8,10,12} versus {2,4,6,8,10,11,12}.

Unless you changed the code in some strange way, this shouldn’t be happening. Adding new allowable signatures can only improve the M values, because in the optimization process the coefficients of the new monomials could all be set to 0.

Since all the needed information is contained in the matrices , it is important to check their construction. (e.g. to check if the removal of a signature is actually causing the removal of the associated subset of rows and columns from ).

Eytan: yes. Pace and I are working to identify the problem and I expect we’ll have a resolution very soon. The phenomenon is occurring with all versions of the code and can be demonstrated with quite small k and d, so the analysis is relatively manageable.

I’ve located the source of the error, which occurs in a modification Aubrey made to my code to limit signatures. This led to an incorrect ordering of signatures, and subsequently, a different set of polynomials on which the integrals were performed. The good news is that the M values obtained this way are still lower bounds– it’s just that they are not the lower bounds one would get using the appropriate polynomials with the stated degrees.

This problem should be easily rectifiable, and the correct numerics forthcoming.

I’ve uploaded a corrected version of the code to the dropbox. Actually it was somewhat fortuitous that the bug didn’t affect the cases of main interest, i.e. signatures unrestricted or evens-only, because the key requirement that permitted signatures be sorted by their total size in various places was never being explicitly enforced in any version of the code – but it is now, so arbitrarily exotic signature choices can now be explored reliably.

I don’t want to make a nuisance of myself, but perhaps rigorizing the approach sketched in http://mathoverflow.net/questions/132973/would-the-following-conjectures-imply-lim-inf-n-to-inftyp-nk-p-n-ok-lo could be useful in obtaining better bounds for the quantity H_m, given that the quantity r0(n) can be defined unconditionally (i.e. without assuming Goldbach’s conjecture) and that there is some evidence that r0(n)=O(log² n), hence supporting the NFPR conjecture. Indeed, under these conjectures I managed to get the desired form for the asymptotics of H_m by neglecting the divergent error terms involved in the relevant expressions of p(n+k)-p(n). I don’t know if this procedure is valid, though; could you please tell me more about it? Thank you in advance.

Actually, can be defined unconditionally as the smallest potential typical primality radius of . This notion is precisely defined in http://mathoverflow.net/questions/61842/about-goldbachs-conjecture but as it is quite long to state precisely, I gave the conditional definition (hence assuming Goldbach’s conjecture) in the question of the previous link.

For avoidance of confusion, and after some illumination from Pace, I just want to clear up one detail arising from the past few days of back-and-forth, namely the question of what maximum degrees for signatures and for overall polynomials have been in force in the various cases I’ve been running. My code is based on a version of Pace’s that predates the restriction to using only even numbers in signatures, and which, contrary to my prior assumption, does not automatically have maximum total degree exceeding maximum signature degree – though of course it does do so when the specified maximum total degree is odd. Thus, concerning a few of my reports from recent days:

– 4.00124 for M_50,1/25 is with max total degree 25, max signature degree 24 – so actually TWO degrees less than Pace’s original report of 4.0043, not one as I stated earlier. Note also that in the paper the value 4.0043 is said to have used max total degree 28, but actually it was 27.

– 4.00156 for M_51,1/50 is actually with max total degree 22, same as the max signature degree, contrary to my speculation on June 1st.

– 3.99402 for M_50,1/49 and 3.99342 for M_53,0 are with max total degree 27, max signature degree 26.

– 4.00238 for M_54,0 is with max total degree 23 (not 24 as it currently says in the paper), max signature degree 22, i.e. the same as Pace’s original report, but using all 195 signatures which he had been unable to do. Following clarification from Pace about exactly which three signatures he dropped back then, I now confirm his value of 4.001115 with those same maximum degrees when those signatures are dropped. The surprising value 3.9991 for M_54,0 that I previously obtained, and which uncovered a (happily harmless to the paper) bug in the code yesterday, occurs with max signature degree still 22 but max total degree also 22.

Now that the code runs so quickly, it might make sense to drop all of the nonsense I had to do with two different degrees (for max signature degree and max total degree), using even signatures, and so forth.

Instead, we could simply list the results for some total degree, and just using unrestricted signatures (besides not allowing 1’s).

That would certainly be a nice simplification of the exposition, but (a) I think it is actually so interesting that using only even numbers works very nearly as well as using all signatures that it would be a shame not to mention it in the paper, and (b) we would have to redo quite a few computations (and by “we” I mean you, because my machine can’t feasibly go beyond d=19 or 20 with unrestricted signatures). Also, in practice you weren’t really using two different degrees – you were just using odd-numbered maximum total degree, which imposes a maximum signature degree one less just because even numbers can only sum to even numbers. So all in all I think I’d vote to leave things as they are.

It would be great if you or Pace would modify the relevant portion of the paper (i.e. the first two sections of h1-new.tex) to reflect all the new developments (and maybe also to provide links to supporting data and code), as I’ve kind of lost track of exactly what the state of the art of the algorithms is, and what the parameters are for the results claimed in the paper.

Pace has done the edits to Section 7 (and the bounds on a couple of subtheorems in Section 3) and tells me he will upload them imminently. The only additional change that I think could be useful would be to edit the first item in the list of possible avenues for further progress, to reflect our extrapolations regarding the limit of M for large d. I suggest something like the following (replacing all but the first line of the current paragraph):

It is natural to speculate whether k could be lowered slightly, for instance to k=49, by further numerical computations. Our computations are limited by the growth of execution time with allowed degree of polynomials, but for the range of degrees within our ability to explore we have observed that for a given k and epsilon, M always rises with d at a rate that declines very close to exponentially, allowing us to extrapolate a limit for M at large degree to within 0.001 or so with high confidence. This has not allowed us to decide whether M_49,epsilon can exceed 4, but we are confident that M_48,epsilon cannot, so the best that could be achieved in this way is H_1<=240. The best hope for deciding the k=49 case may be a refinement of the Krylov subspace approach described in Section 7.2. We additionally remark that this extrapolation approach has led us to conclude that no improvements are possible to the bounds stated in Theorems 3.9 (vii) and 3.13 (xiii) using these methods.

Aubrey: since the question whether is still open (see e.g. Ignace’s comment above – in which repeated Shanks transform extrapolations indicate that is barely above 4), I’m not sure about your conclusion that it is not possible to improve the bound in theorem 3.9(vii).

Thanks for the edits! I went through the paper and it looks close to submittable now once the final remarks on improvements are updated, and links to dropbox files are included. (Perhaps the thing to do is to create a subfolder in dropbox for all code and files, and link to that subfolder, rather than give two or three links to individual file names?)

I think we should be able to submit the paper in a few weeks, if everyone signs off on it. Since there was a weak preference for submitting to ANT, I’ll ask the editors there if they would like to handle a second large paper from us (it may well be that they would prefer not to do so to allow room for other authors too). I’ll get back to everyone on the response.

Ah, damn – thank you Eytan – in my text I inadvertently switched the M_53,0 and M_49,epsilon cases. Here is what I should have written:

It is natural to speculate whether k could be lowered slightly, for instance to k=49, by further numerical computations. Our computations are limited by the growth of execution time with allowed degree of polynomials, but for the range of degrees accessible by these methods we have observed that for a given k and epsilon, M always rises with d at a rate that declines very close to exponentially, allowing us to extrapolate a limit for M at large degree to within 0.001 or so with high confidence. This has led us to conclude that it is unlikely that k can be reduced even to 49, let alone further. The same approach also strongly suggests that no improvements are possible to the bounds stated in Theorem 3.13 (xiii) using these methods. Thus, the only case of interest for which there remains substantial uncertainty is whether M_53,0 can exceed 4 with sufficiently high-degree polynomials, which would improve the bound of Theorem 3.9 (vii). However, the necessary degree is probably out of reach by these methods; the best hope for deciding the k=53 case may be a refinement of the Krylov subspace approach described in Section 7.2.

In the process of double-checking the bounds in Theorem 3.3 I noticed a significant typo. The bound for part (ix) H(41588) <= 474226 should be H(41588) <= 474266, as is listed in the records table for m=2 (without Deligne or EH), in the timeline (Jan 9 entry), and in Table 3 of Appendix A.

The same fix needs to be made in (7) on p.72 of Appendix A, which was copied from Theorem 3.3.

I have successfully verified all the (corrected) bounds in Theorem 3.3.

Assuming we are all satisfied with Theorems 3.2 and 3.3, I’m wondering if it now makes sense to remove the question marks from the corresponding k and H values in the timeline (I notice that there is no question mark on the most recently added H(k) bound for m=5).

It should be remarked that the recent breakthrough, which uses the idea of constructing a sieve in several dimensions, is in some sense similar to the ideas of Roth and Baker (who constructed auxiliary functions in several variables) in their breakthrough theorems.

I think that Eytan is referring to James Maynard’s (and independently Terry Tao’s) idea to use a multidimensional sieve. This is the basis of all improvements made in the Polymath 8b project. For more information on the multidimensional sieve I would refer you to James’ paper (accepted in Annals) and the first few posts by Terry in the polymath 8b project.

The common basic (simple – but very rewarding!) idea in all cases is to generalize previous work (which used an auxiliary function of one or two variables) by introducing a multivariate auxiliary function. In our case (Polymath8b) the univariate sieve of Polymath8a (determined by the function ) is generalized to a multivariate sieve (determined by the function ).
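To make the analogy concrete, here is a schematic comparison of the two sieve weights in the notation commonly used for the GPY and Maynard sieves (this is my own hedged summary of the standard formulation, not a quotation from the paper, and it suppresses smoothing and truncation details):

```latex
% One-dimensional (GPY / Polymath8a-style) weight, with a single cutoff function F:
\nu(n) = \Big( \sum_{d \mid \prod_{i}(n+h_i)} \mu(d)\, F\Big(\tfrac{\log d}{\log R}\Big) \Big)^{2}

% Multidimensional (Maynard / Polymath8b) weight, with F now a function of k variables:
\nu(n) = \Big( \sum_{d_i \mid n+h_i \ \forall i} \mu(d_1)\cdots\mu(d_k)\,
         F\Big(\tfrac{\log d_1}{\log R},\dots,\tfrac{\log d_k}{\log R}\Big) \Big)^{2}
```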
Roth’s and Baker’s works (as clearly described in Sections 7 and 2, respectively, of Baker’s book “Transcendental number theory”) use this basic idea:

Roth’s theorem (on rational approximation to algebraic numbers) generalizes previous work (based on the construction of auxiliary polynomials in two variables, by Thue, Siegel and Dyson) via the construction of a multivariate auxiliary polynomial.

Baker’s theorem (on linear forms in logarithms) generalizes previous work (based on the construction of an auxiliary function of a single complex variable, by Gelfond and Schneider) via the construction of an auxiliary function of several complex variables.

I see. It’s an interesting observation, although I don’t know how to push it any further – it’s not clear, for instance, that any of the other ideas in transcendence theory transfer over to sieve theory (or vice versa). In any event, given that the sieve theory version of this idea comes from Maynard’s paper rather than the Polymath paper, our current paper might not be a suitable place to make this remark.

I’ve just had a final read-through of the draft paper, which I think is in a good form. Here is a short list of (minor) thoughts I had:

Page 2, comment under Theorem 1.3 regarding effective lower bounds for :
Maybe it would be better to leave this comment out, since one can obtain effective lower bounds without too much effort. (In the to-be-published version of my paper I have a remark on page 12 indicating that, as Pintz suggested, the bound can be made effective, and this is done more explicitly in my dense clusters paper. One can obtain explicit bounds using Cauchy–Schwarz and Proposition 4.2; the idea goes back to GPY and is also carried out explicitly in the dense clusters paper.)

Page 7, footnote:
In the version of the BFI paper I’ve found online, I think this is labelled Conjecture 1 instead of Conjecture 0.

Page 15, remark under Theorem 3.13:
It is maybe better if we don’t have the comment about my joint preprint with Banks and Freiberg here, since I believe we are planning to revise that paper to be based on the stronger result which is only sketched at the end of the preprint, and this then wouldn’t be making use of the result of Theorem 3.13(i).

Page 25, first line:
I think we should have instead of here.

Page 25, lines 8 and 10:
I think the bounds should be something like and respectively, instead of with .

Page 39, lines 2 and 3:
Integration should be over not and the numerators for the differentials should be not .

Page 45, three lines after (6.5):
The version of my paper on the Annals website is slightly different from the arxiv version. Therefore, given that we are now citing the annals version, the reference to equation (8.17) from my paper should instead refer to (7.19).

Page 49, line 12:
I think we haven’t defined , maybe it’s better to write .

Page 51, line 7:
I think the numerator of the integrand should be instead of .

Page 55, line -8:
The reference to Lemma 7.3 in the older version of my paper should refer to Lemma 8.3 in the newer version.

Page 62, line 1:
I was slightly confused by the reference to the suggestion of the optimal choice of . I assume that we don’t believe that this choice is globally optimal for , but merely optimal given the restriction to our 60-piece decomposition?

Page 70, lines -12, -13:
I think there should be a singular series factor in the right hand sides here.

Yes, for fixed Krylov subspace size , the largest eigenvalue actually converges to as . Therefore, unfortunately, numerical results of this type do not give us information about the asymptotic behavior of .

Ignace: in your table (with fixed ), the largest eigenvalue seems to monotonically increase with (without convergence to ).
Obviously, increasing with is needed to keep the resulting lower bounds sufficiently close to .
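As a toy illustration of the lower-bound behaviour being discussed (a hypothetical 3×3 symmetric matrix, not the actual Polymath8b operator, and plain power iteration standing in for the full Krylov machinery): Rayleigh quotients computed along the iteration only ever give lower bounds on the largest eigenvalue, and those bounds increase toward it as the iteration is extended.

```python
def matvec(A, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def rayleigh(A, v):
    """Rayleigh quotient v^T A v / v^T v: a lower bound on the top eigenvalue."""
    Av = matvec(A, v)
    return sum(x * y for x, y in zip(v, Av)) / sum(x * x for x in v)

def power_iteration_bound(A, steps):
    """Lower bound on the largest eigenvalue after `steps` power iterations."""
    v = [1.0] * len(A)
    for _ in range(steps):
        v = matvec(A, v)
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return rayleigh(A, v)

# Toy symmetric matrix with eigenvalues 1, 2, 4 (so the true answer is 4):
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
bounds = [power_iteration_bound(A, r) for r in (1, 2, 5, 20)]
print(bounds)  # increases toward the largest eigenvalue 4
```

The real difficulty in our setting is of course that the operator is only accessible through expensive matrix-vector products, which is what limits the reachable subspace size r.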

Ignace: apologies for my limited understanding of the Krylov method, but is it now your belief that:

– we might well be able to decide the M_53 case by increasing r, but

– without using the Shanks transform, r=25 is currently the limit of your computational capacity, and

– the inaccuracy that the transform introduces is too great to let us trust the results, since they come out so close to M=4?

If so, I might be able to reimplement your Maple code in Mathematica and optimise it to reach sufficient r without the transform. Of course this is all somewhat peripheral at present since it is limited to epsilon=0, but it may have the side-benefit of providing more confidence concerning the behaviour of M for really large d, especially d exceeding k, which might not follow the exponential decay pattern we’ve seen at smaller d.
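For readers unfamiliar with it, the Shanks transformation mentioned above accelerates the convergence of a sequence at the cost of numerical cancellation in its denominator, which is presumably the source of the inaccuracy under discussion. A minimal generic sketch (using the Leibniz series for pi/4 as stand-in data, not anything from the actual computation):

```python
import math

def shanks(seq):
    """One application of the Shanks transformation S(a_n)."""
    return [(seq[i + 1] * seq[i - 1] - seq[i] ** 2)
            / (seq[i + 1] + seq[i - 1] - 2 * seq[i])
            for i in range(1, len(seq) - 1)]

# Partial sums of the slowly converging Leibniz series 1 - 1/3 + 1/5 - ... = pi/4:
partial, s = [], 0.0
for k in range(10):
    s += (-1) ** k / (2 * k + 1)
    partial.append(s)

accel = shanks(partial)
# accel[-1] is far closer to pi/4 than partial[-1], from the same ten terms
```

Note the denominator is a second difference of nearly equal numbers, so in floating point the transform trades truncation error for rounding error – exactly the trust issue raised above when the answer sits so close to the threshold M=4.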

Hi Aubrey: that is a pretty good description of the situation right now! I believe that increasing should allow us to settle this question, but I cannot estimate how large would have to be.

Judging by the scaling of the number of linearly independent symmetric polynomials as a function of the degree, I think current (parallel) hardware could realistically compute the Krylov subspace and the matrix entries for up to somewhere around 50. That leaves a lot of room for improvement upon the current .
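For a rough sense of the scaling mentioned above (under my own simplifying assumption that the relevant symmetric polynomials are indexed by partitions of the degree, which holds when there are more variables than the degree): the dimension up to degree d is then the sum of the partition numbers p(0) + p(1) + ... + p(d), which is easy to tabulate.

```python
def partition_counts(d):
    """Return [p(0), p(1), ..., p(d)] via the standard coin-change DP."""
    p = [0] * (d + 1)
    p[0] = 1
    for part in range(1, d + 1):      # allow parts of size `part`
        for n in range(part, d + 1):
            p[n] += p[n - part]
    return p

p = partition_counts(50)
print(p[50], sum(p))  # p(50) = 204226; the cumulative count grows sub-exponentially
```

So the basis size around degree 50 is in the low hundreds of thousands, which is what makes the "current (parallel) hardware" estimate above plausible rather than hopeless.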

Given the fact that you achieved such impressive speedups for Pace’s code (and the fact that I’m not a Maple expert), it seems quite likely that you could push higher than 25. Should you decide on doing this, I’d be willing to answer any questions that you might have about the Maple code.

Great. I don’t have (or know) Maple, but I had no trouble reimplementing Terry’s Maple script for validating k in the asymptotic setting in Mathematica, so I’d hope to find your code similarly straightforward. However, I don’t have your source code – there seems to be nothing in the dropbox, and your own site seems only to have compiled code. Could you please email the source code to me at aubrey@sens.org so I can have a look? (I’m very busy this week, so no promises, but I’d like to have a go at this when I can.)

(Almost) layman asking: if I understand all of this correctly, your current bound, without any conjectures, is 246. Relying on unproven conjectures, you are able to reach a bound of 6. In particular, you have been able to show that your current methods are insufficient to obtain a better result than 6, and that in order to prove the twin prime conjecture, it is more than likely that a completely new insight is needed. Is this correct?

I’ve been tinkering a bit more with choices of monomials, in the hope that some kind of explanation might emerge for why omitting all terms whose signatures contain odd numbers reduces M by such a small amount. That hasn’t happened, but one thing I found seemed interesting enough to report here (not least since it may give others an idea for said explanation): among terms with evens-only signatures, those whose power of (1-P_1) is even can be eliminated at very little cost. A second trend is that the smaller a signature’s first (and thus largest) element is, the smaller the cost of omitting terms using it, the signature {} being the cheapest of all. Taking advantage of these, it transpires that with k=50, d=25 and epsilon optimised to 37/1000 we can reach M=4.000181 using only 1061 of the 1780 evens-only-signature terms. This is achieved by deleting terms whose power of (1-P_1) is even and whose signature is {} or has a first element of at most 10. Both trends are quite approximate, so my impression is that by exploiting the relative contribution of each term more exhaustively we could probably keep M_50,epsilon above 4 with considerably fewer than 1000 terms – though probably still too many to make it reasonable to perform the Maple validation that Terry did with k=3, unfortunately!

It seems that in the definition, the argument of the expectation should also include the integral (since its integration domain is random – depending on ). Therefore, the closing “)” should be moved to also enclose the integral.

