Pritchard’s Wheel Sieve

January 6, 2012

We have seen several different sieves, due to Eratosthenes, Atkin, Euler and Sundaram, that enumerate the prime numbers not greater than n. In the 1980s, Paul Pritchard, an Australian mathematician, developed a family of sieve algorithms based on wheels, eventually finding an algorithm with O(n / log log n) time complexity and O(√n) space complexity. We examine a simple version of Pritchard’s wheel sieve in today’s exercise.

We begin with some definitions. Mk is the product of the first k primes; for instance, M7 = 2×3×5×7×11×13×17 = 510510. The totatives of Mk are those numbers from 1 to Mk that are coprime to Mk (that is, they share no factor with Mk other than 1). It is easy to determine the totatives of Mk by sieving: make a list of the integers from 1 to Mk, then for each of the primes that form the product Mk, strike out from the list the prime and all of its multiples; for instance, with M3 = 2×3×5 = 30, sieving with 2 strikes out 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28 and 30, sieving with 3 strikes out 3, 6, 9, 12, 15, 18, 21, 24, 27 and 30, and sieving with 5 strikes out 5, 10, 15, 20, 25 and 30, leaving the totatives 1, 7, 11, 13, 17, 19, 23 and 29. A factoring wheel Wk contains the gaps between successive totatives, wrapping around at the end; for instance, W3 consists of the gaps 6, 4, 2, 4, 2, 4, 6 and 2, corresponding to the gaps 7−1, 11−7, and so on, ending with 29−23 and 31−29 when the wheel wraps around at the end.
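The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names `totatives` and `wheel` are made up for this example, and the sieving is done in the straightforward way described in the paragraph rather than in any optimized fashion.

```python
def totatives(primes):
    """Totatives of M = product(primes), found by sieving 1..M."""
    m = 1
    for p in primes:
        m *= p
    keep = [True] * (m + 1)
    for p in primes:
        # Strike the prime itself and all of its multiples up to M.
        for multiple in range(p, m + 1, p):
            keep[multiple] = False
    return [i for i in range(1, m + 1) if keep[i]]

def wheel(primes):
    """Gaps between successive totatives, wrapping around to M + 1."""
    ts = totatives(primes)
    m = 1
    for p in primes:
        m *= p
    # The first totative is always 1, so the wrap-around target is M + 1.
    return [b - a for a, b in zip(ts, ts[1:] + [m + 1])]

print(totatives([2, 3, 5]))  # [1, 7, 11, 13, 17, 19, 23, 29]
print(wheel([2, 3, 5]))      # [6, 4, 2, 4, 2, 4, 6, 2]
```

The two printed lines reproduce the totatives of M3 and the gaps of W3 given in the text.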

Pritchard’s wheel sieve uses wheels repeatedly to strike out composite numbers from the sieve. It operates in two phases: a setup phase and a processing loop.

The setup phase first computes a parameter k such that Mk < n / loge n < Mk+1, then computes the wheel Wk and forms the set S from 1 to n by rolling the Wk wheel. The setup phase also computes the list of m primes not greater than the square root of n.

We compute the primes not greater than 100 as an example. We compute k=2, since 100/log(100) is 21.714724 which is between M2=6 and M3=30. The W2 wheel is {4 2} and the set S is {1 5 7 11 13 17 19 23 25 29 31 35 37 41 43 47 49 53 55 59 61 65 67 71 73 77 79 83 85 89 91 95 97}; although there is only one set S, we will refer to this set as S2, since it is the result of rolling the W2 wheel. Finally, the square root of 100 is 10, and the m=4 primes less than 10 are {2 3 5 7}.
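A minimal sketch of the setup phase for this example follows. All names here (`SMALL_PRIMES`, `choose_k`, `wheel_gaps`, `roll`, `primes_up_to_sqrt`) are hypothetical, the totatives are found with `gcd` as a shortcut instead of sieving, and the fixed table of small primes is an assumption that limits how large n may grow.

```python
from math import gcd, isqrt, log

# Hypothetical table of small primes; sufficient as long as n / ln n
# stays below 2*3*5*7*11*13*17*19*23.
SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23]

def choose_k(n):
    """Return (k, M_k) for the largest k with M_k < n / ln n (n >= 2)."""
    limit = n / log(n)
    k, m_k = 0, 1
    while k < len(SMALL_PRIMES) and m_k * SMALL_PRIMES[k] < limit:
        m_k *= SMALL_PRIMES[k]
        k += 1
    return k, m_k

def wheel_gaps(m_k):
    """Gaps between successive totatives of M_k, wrapping to M_k + 1."""
    ts = [t for t in range(1, m_k + 1) if gcd(t, m_k) == 1]
    return [b - a for a, b in zip(ts, ts[1:] + [m_k + 1])]

def roll(gaps, n):
    """Roll the wheel from 1, collecting every value not greater than n."""
    s, x, i = [], 1, 0
    while x <= n:
        s.append(x)
        x += gaps[i]
        i = (i + 1) % len(gaps)
    return s

def primes_up_to_sqrt(n):
    """Primes not greater than sqrt(n), by plain trial division."""
    return [p for p in range(2, isqrt(n) + 1)
            if all(p % d for d in range(2, isqrt(p) + 1))]

k, m_k = choose_k(100)
gaps = wheel_gaps(m_k)
s = roll(gaps, 100)
print(k, gaps)                 # 2 [4, 2]
print(s[:6], s[-1])            # [1, 5, 7, 11, 13, 17] 97
print(primes_up_to_sqrt(100))  # [2, 3, 5, 7]
```

Trial division is used for the primes up to the square root only because that list is tiny compared with S; any other method would do.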

The processing loop iterates from k+1 to m. Each pass strikes some of the elements of S, reducing S from Sk to Sk+1. Each time through the loop we first identify p, the smallest member of S that is greater than 1, and strike it from the set S. We also strike from S the successive multiples ps not greater than n, where s ranges over the members of S as it stood at the start of the pass; each multiple is reached from the previous one by adding p(s'−s), where s'−s are the successive gaps in S. Finally, we increment k by 1 and repeat the loop as long as k≤m.

Once k>m and the final S has been computed, the list of primes is returned, consisting of the primes not greater than the square root of n followed by the elements of S excluding 1. Thus the primes not greater than 100 are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89 and 97.
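Putting the two phases together, the whole procedure might look something like the sketch below. It is not the suggested solution, and it makes several assumptions for the sake of brevity: the name `pritchard_primes` and the `SMALL_PRIMES` table are invented here, the loop stops when p² exceeds n instead of counting up to a precomputed m (which should visit the same primes), and S is rebuilt as a fresh list on every pass, so this version is easy to follow but does not achieve linear running time.

```python
from math import gcd, log

# Hypothetical table of small primes used to build the wheel.
SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23]

def pritchard_primes(n):
    """Primes not greater than n, following the wheel-sieve outline."""
    if n < 2:
        return []
    # Setup: largest k with M_k < n / ln n.
    limit = n / log(n)
    k, m_k = 0, 1
    while k < len(SMALL_PRIMES) and m_k * SMALL_PRIMES[k] < limit:
        m_k *= SMALL_PRIMES[k]
        k += 1
    wheel_primes = SMALL_PRIMES[:k]
    # Roll the wheel: repeat the totatives of M_k in each block of size M_k.
    totatives = [t for t in range(1, m_k + 1) if gcd(t, m_k) == 1]
    s = [base + t for base in range(0, n, m_k)
         for t in totatives if base + t <= n]
    # Processing loop: p is the smallest member of S greater than 1.
    sieving_primes = []
    while len(s) > 1 and s[1] * s[1] <= n:
        p = s[1]
        struck = set()
        for x in s:            # x runs over S as it stood before this pass
            if p * x > n:
                break
            struck.add(p * x)  # x = 1 strikes p itself
        s = [x for x in s if x not in struck]
        sieving_primes.append(p)
    # s[0] is always 1, which is excluded from the result.
    return wheel_primes + sieving_primes + s[1:]

print(pritchard_primes(100))
```

For n=100 the loop runs twice, with p=5 and p=7, and the printed list matches the 25 primes given above. A linear-time version would replace the list rebuild with a doubly-linked structure over S so that each strike costs constant time.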

This version of Pritchard’s sieve has time complexity and space complexity both equal to O(n), whereas the standard sieve of Eratosthenes has time complexity O(n log log n). The improvement comes from the fact that Pritchard’s sieve strikes each composite only once, whereas Eratosthenes’ sieve strikes each composite once for each of its distinct prime factors; for instance, Eratosthenes’ sieve strikes 15 twice, once for the factor of 3 and once for the factor of 5. But despite the improvement in the asymptotic complexity, Eratosthenes’ sieve is fast because its inner loop consists only of addition, while Pritchard’s sieve is slower because its inner loop consists of a subtraction to compute the gap in the wheel, a multiplication to extend that gap by the current sieving prime, and an addition to add the gap to the previously-stricken element. Thus, in practice, Eratosthenes’ sieve is faster than Pritchard’s.
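The repeated striking in Eratosthenes’ sieve is easy to observe by counting. The sketch below assumes the unoptimized textbook form, in which each prime p strikes 2p, 3p, and so on; the common optimization of starting at p² would skip some of these strikes. The function name `eratosthenes_strike_counts` is made up for this illustration.

```python
def eratosthenes_strike_counts(n):
    """Count how often each number up to n is struck by the basic sieve."""
    strikes = [0] * (n + 1)
    for p in range(2, n + 1):
        if strikes[p] == 0:  # never struck, so p is prime
            for multiple in range(2 * p, n + 1, p):
                strikes[multiple] += 1
    return strikes

counts = eratosthenes_strike_counts(100)
print(counts[15])  # 2: struck once by 3 and once by 5
print(counts[30])  # 3: struck by 2, 3 and 5
print(counts[49])  # 1: struck only by 7
```

In Pritchard’s sieve each of these composites would be struck exactly once, by its smallest prime factor.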

Your task is to write a program to compute the primes not greater than n using Pritchard’s wheel sieve. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

@Giorgio Valoti, the difference is that one of the wheel sets has been rolled one position from the other, since wheels are usually used starting from their next wheel position: W0 starts at 2, W1 starts at 3, W2 starts at 5, and so on, so the wheel excluding 7 (W4 by my definition) starts at 11.

The algorithm is interesting, but really it is just a wheel-factorized Sieve of Eratosthenes, with smaller wheels used for smaller sieve ranges and larger ones used for bigger sieve ranges in order to keep the O(n) performance. One may as well use a constant maximum practical wheel size such as 2/3/5/7/11/13/17/19 (W8 by my definition). That gives even better performance for small ranges and the same performance where this algorithm would call for this size of wheel anyway (at about 200 million to quite a few billion), and one accepts that performance above that point will be a little slower, given that one doesn’t really want to deal with a huge gap table.

As to “worse performance than the Sieve of Eratosthenes”, that isn’t necessarily true, or at least not by much. I suspect that your implementation is slow because you persist in using lists to store the gaps/wheel primes and the set of candidates to be culled of composites. Lists are very handy but the bane of performance, and they consume memory: as well as the actual data per element, they also have to store a pointer link to the next element, of 4 or 8 bytes on 32- or 64-bit systems, respectively. Of the three operations you mention: 1) you don’t have to calculate gaps per culling operation if you store the wheel as gaps (preferably in an array rather than a list, for faster access); 2) integer multiplies aren’t expensive for a modern CPU and can be streamed so that they don’t take much more time than the other operations performed in parallel; and 3) integer additions are done in any case and take almost zero average time on a modern CPU.

The most practical sieves, such as (C) primesieve, fill the array with a precull using a pattern of the large wheel as above and then do further culling using only the W4 wheel of 48 “hit” positions out of the wheel range of 210; this seems to give the best trade-off between memory use (including good use of the CPU caches) and execution speed. In order to keep the inner loop simple and avoid having to jump by gaps, the “hit” modulos are often separated out so that the innermost loops can do simple addition offsets for their culling (jumping by even numbers of wheels), with separate innermost loops running for each modulo for each base prime. This can be done in any language, including Scheme.

For instance, see the following Python 2 code, which in this case uses only a 2/3/5 wheel and no preculling; preculling would be fairly easy to add, as would increasing the size of the wheel. Here the wheel is given as constant literals, but computing the wheel isn’t hard in any language: