Sieve Of Atkin

February 12, 2010

We examined the use of the Sieve of Eratosthenes for enumerating the prime numbers, a method that dates to the ancient Greeks, in two previous exercises. Just a few years ago, in 2004, A. O. L. Atkin and D. J. Bernstein developed a new method, called the Sieve of Atkin, that performs the same task more quickly. The new method first marks squarefree potential primes, then eliminates those potential primes that are not squarefree.

Atkin’s sieve begins with a boolean array (bits are fine) of length n equal to the number of items to be sieved; each element of the array is initially false.

Squarefree potential primes are marked like this: For each element k of the array with k ≡ 1 (mod 12) or k ≡ 5 (mod 12), flip the sieve element (set true to false, and false to true) once for each pair of positive integers x and y with 4x² + y² = k. For each element k with k ≡ 7 (mod 12), flip the sieve element once for each pair of positive integers x and y with 3x² + y² = k. For each element k with k ≡ 11 (mod 12), flip the sieve element once for each pair of positive integers x and y, with x > y, for which 3x² – y² = k. Note that the flip happens once per solution, not once per element: an element ends up true exactly when its equation has an odd number of solutions.
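To see why flipping by parity works, one can count the representations directly: by the Atkin–Bernstein result, a squarefree k in the appropriate residue class is prime exactly when the count is odd. Here is a brute-force Python helper (my own illustration, not part of the sieve proper) that counts solutions of the form matching k's residue class:

```python
def representations(k):
    """Count (x, y) solutions of the quadratic form matching k mod 12.

    Brute-force illustration only; the sieve proper never counts,
    it just flips a boolean once per solution found.
    """
    count = 0
    for x in range(1, k):
        for y in range(1, k):
            if k % 12 in (1, 5) and 4 * x * x + y * y == k:
                count += 1
            elif k % 12 == 7 and 3 * x * x + y * y == k:
                count += 1
            elif k % 12 == 11 and 3 * x * x - y * y == k and x > y:
                count += 1
    return count
```

For example, the prime 13 has one representation (x = 1, y = 3), while the composite 65 = 5 · 13 has two (x = 2, y = 7 and x = 4, y = 1), so its flips cancel. The composite 25 also gets an odd count, but 25 is not squarefree, which is exactly what the squaring pass described below is for.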

Once this preprocessing is complete, the actual sieving is performed by running through the sieve starting at 5. For each true element of the array, mark all multiples of the square of the element as false (regardless of their current setting). The remaining true elements, together with 2 and 3 (which the quadratic forms never touch), are prime.
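Putting the three steps together, a straightforward (and deliberately unoptimized) Python sketch of the sieve as described might look like this; the function name `atkin` is my own choice:

```python
def atkin(limit):
    """Sieve of Atkin (simplified mod-12 form), returning primes <= limit.

    An illustrative sketch of the description above, not a tuned
    implementation.
    """
    sieve = [False] * (limit + 1)
    root = int(limit ** 0.5) + 1
    # Step 1: flip each candidate once per solution of its quadratic form.
    for x in range(1, root):
        for y in range(1, root):
            k = 4 * x * x + y * y
            if k <= limit and k % 12 in (1, 5):
                sieve[k] = not sieve[k]
            k = 3 * x * x + y * y
            if k <= limit and k % 12 == 7:
                sieve[k] = not sieve[k]
            k = 3 * x * x - y * y
            if x > y and k <= limit and k % 12 == 11:
                sieve[k] = not sieve[k]
    # Step 2: eliminate the non-squarefree survivors (e.g. 25, 50, 75, ...).
    for n in range(5, root):
        if sieve[n]:
            for m in range(n * n, limit + 1, n * n):
                sieve[m] = False
    # 2 and 3 are never produced by the forms, so add them explicitly.
    return [p for p in (2, 3) if p <= limit] + \
           [n for n in range(5, limit + 1) if sieve[n]]
```

A usage example: `atkin(30)` yields the primes 2, 3, 5, 7, 11, 13, 17, 19, 23, 29.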

Your task is to write a function to sieve primes using the Sieve of Atkin. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

This is a naive implementation of the old pseudocode from the Wikipedia article (since updated with a version of pseudocode that is much truer to the Atkin algorithm). Using the simplification of modulo 12 rather than modulo 60 adds about a constant factor of 23.5% to the work, but worse, because it runs the x and y variables all the way up to the square root of the range, the performance isn’t even O(n): it spends increasing amounts of time with increasing range just calculating quadratic equations whose results are rejected by the various conditions, including the time-expensive modulo operations. Due to this huge waste of computing power, this old pseudocode algorithm will never beat even an odds-only Sieve of Eratosthenes, let alone a maximally wheel-factorized true SoE. Even then, neither will be all that fast: with the huge-sieve-buffer algorithm, memory access time becomes a bottleneck once the buffer size exceeds the CPU cache sizes (although one won’t notice this in interpreted languages such as Python, whose loop times are much greater than the memory access times).

Even the new pseudocode at Wikipedia still wastes almost half of its operations on redundant calculations that are rejected by the modulo conditions, but at least this is a constant factor, so we get back our O(n) performance. With the large-array algorithm it is quite easy to correct this by using an intermediate loop that skips almost all of the redundant, modulo-rejected computation. When this is done, the maximally factorized SoE is still slightly faster than the SoA for sieving ranges up to just over a billion; the SoA then gradually pulls ahead, by up to 20% at about 60 billion, at which point most operating systems and languages refuse to give us more memory (the 2 Gigabyte limit, even on 64-bit systems).

I’ve written versions of the large-memory-array (bit-packed) SoA and the maximally wheel-factorized (also bit-packed) SoE and have found that both run in about the same time at a range of one billion, just a few seconds each (about 30 CPU clock cycles per operation). A page-segmented version of the SoE runs in not much more than half a second (less than 5 CPU clock cycles per operation), whereas Bernstein’s page-segmented, highly tuned ‘C’ SoA runs in about 0.8 seconds, both on the same machine (the SoA averages more CPU cycles per operation due to the difficulty of implementing a page-segmented algorithm, and this difficulty increases more quickly with range than for the SoE). So no, the Sieve of Atkin isn’t “screamingly fast”, and in fact it gets rather slow at the large sieve ranges where it is supposed to excel.

In fact, Bernstein “held back” the reference implementation of the Sieve of Eratosthenes against which he compared: there are signs in his code that he at least considered applying a wheel pattern pre-culled of small primes to the page-segment buffers, which would have made it faster than the SoA over all the ranges compared. He limited the SoE to the same wheel factorization as that of the SoA; the factorization of the SoE can be increased well beyond that, whereas the SoA’s can’t, as its wheel factorization is fixed by the algorithm.