Sieve of Eratosthenes

February 19, 2009

Over two millennia ago, Eratosthenes created a systematic method to enumerate the prime numbers that is still in use today. The same Eratosthenes calculated the circumference of the Earth, the distance to the Sun, and the tilt of the Earth’s axis, developed a system of latitude and longitude, and invented the leap day. He was born in Cyrene (present-day Libya), lived from 276 B.C. to 194 B.C., and spent most of his life in Alexandria, Egypt, where he was the second Chief Librarian of the Great Library, succeeding Apollonius of Rhodes; he was a good friend of Archimedes.

The Sieve of Eratosthenes starts by making a list of all the numbers up to a desired maximum; we’ll illustrate the method by calculating the prime numbers through thirty:

First, 2 is prime, so every second number after it is crossed off. (Although it may not be obvious in the illustration, the number 4 is crossed off the list; in some fonts, the cross-bar of the 4 coincides with the strike-through bar.) Next, take the next number on the list that isn’t crossed off, 3, and cross off every third number; some of them have already been crossed off:

And so on, each time crossing off all multiples of the next un-crossed number on the list. The prime numbers are all those that haven’t been crossed off:

2 3 5 7 11 13 17 19 23 29

This method is called a sieve because it sweeps through a range of numbers, with each prime number, as it is discovered, blocking all its multiples from falling through as prime numbers. The sieve admits several optimizations. First, only odd numbers are considered, since the initial sifting crosses off all the even numbers except 2, which is handled separately. Second, crossing off starts at the square of the number being sifted, since all smaller primes have already been crossed off by previous steps of the sieve; for instance, sifting by 3 starts at 9, since 6 was already crossed off when sifting by 2. Third, sifting stops at the square root of the maximum number in the sieve, since any non-primes larger than the square root must have already been crossed off at previous levels of the sieve; thus, in the above example there is no need to sieve on the prime number 7, or any larger prime number, since the square of 7 is greater than 30, which is the largest number in the list.

Write a function that takes a single argument n and returns a list of prime numbers less than or equal to n using the optimized sieving algorithm described above. Apply the function to the argument 15485863 and count the number of primes returned.
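The optimized algorithm is short enough to sketch directly. Here is a minimal Python version (an illustration, not a reference solution) that stores only the odd numbers, starts crossing off at each prime’s square, and stops sifting at the square root of n:

```python
import math

def primes(n):
    """Primes <= n by the sieve of Eratosthenes, with the three
    optimizations described above: only odd numbers are stored,
    sifting by each prime p starts at p*p, and sifting stops at
    the square root of n."""
    if n < 2:
        return []
    # sieve[i] represents the odd number 2*i + 3
    size = (n - 1) // 2
    sieve = [True] * size
    # largest i worth sifting has 2*i + 3 <= isqrt(n)
    for i in range((math.isqrt(n) - 1) // 2):
        if sieve[i]:
            p = 2 * i + 3
            # p*p sits at index (p*p - 3) // 2
            for j in range((p * p - 3) // 2, size, p):
                sieve[j] = False
    return [2] + [2 * i + 3 for i in range(size) if sieve[i]]
```

For example, primes(30) returns the ten primes listed above. As noted in the comments below, 15485863 is the millionth prime, so the call primes(15485863) should return a list of exactly 1,000,000 numbers.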

Suppose you have already sieved 2, 3, and 5 and are now beginning to sieve 7. Start by adding 7 to the list of primes. Then 14, but that has already been sifted out by 2; likewise, 21 has been sifted out by 3, 28 has been sifted out by 2, 35 has been sifted out by 5, and 42 has been sifted out by 2 (and also 3, but that doesn’t matter). The first multiple of 7 that gets sifted out is 49, which is 7 times 7. In general, when sifting by n, sifting starts at n-squared because all the previous multiples of n have already been sieved.
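The argument can be checked directly: every multiple of 7 below 49 has a prime factor smaller than 7, so 49 is the first number actually removed when sifting by 7. A quick Python check (illustrative only):

```python
def smallest_factor(m):
    # smallest prime factor of m, found by trial division
    f = 2
    while f * f <= m:
        if m % f == 0:
            return f
        f += 1
    return m

# every multiple of 7 below 7*7 was already crossed off by a smaller prime
for m in range(14, 49, 7):
    assert smallest_factor(m) < 7
# 49 is the first multiple whose smallest prime factor is 7 itself
assert smallest_factor(49) == 7
```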

The rest of the expression has to do with the cross-reference between numbers and sieve indexes. There’s a 2 in the expression because we eliminated all the even numbers before we ever started. There’s a 3 in the expression because Scheme vectors are zero-based, and the numbers 0, 1 and 2 aren’t part of the sieve. I think the 6 is actually a combination of the 2 and the 3, but it’s been a while since I looked at the code, so I’ll leave it to you to figure out.

When you want to begin sieving 7, the first number you sift out is 7 × 7 = 49. The number 7 is at index i = 2 in the vector, and the number 49 is at index startj = (+ (* 2 i i) (* 6 i) 3) = 23 in the vector.
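The cross-reference can be verified mechanically. If slot i of the vector holds the odd number 2i + 3, then the square (2i + 3)² = 4i² + 12i + 9 lives at slot ((4i² + 12i + 9) − 3) / 2 = 2i² + 6i + 3, which is exactly the Scheme expression above, and also accounts for the 2, the 3, and the 6. A small Python sketch of the mapping:

```python
def number_at(i):
    # the sieve stores only odd numbers from 3 up: slot i holds 2*i + 3
    return 2 * i + 3

def start_index(i):
    # slot of (2i+3)^2: ((2i+3)**2 - 3) // 2 simplifies to 2i^2 + 6i + 3,
    # matching the Scheme expression (+ (* 2 i i) (* 6 i) 3)
    return 2 * i * i + 6 * i + 3

assert number_at(2) == 7
assert start_index(2) == 23       # and slot 23 holds 49
for i in range(100):
    assert number_at(start_index(i)) == number_at(i) ** 2
```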

Here is the straightforward implementation in C, storing all the odd numbers in an array and then zeroing all the non-primes. Not being much of a programmer, I’ll admit I’m amazed my laptop can compute 1,000,000 primes in under half a second… And I suppose the code could even be made to run twice as fast on my Core 2 Duo by having two threads sifting through the array.

…turns out the multi-threaded version does not work, at least the way I’ve implemented it. I’ve effectively split seek_primes into two threads, one sifting with for (i = 0; i < n; i += 2), and the other with for (i = 1; i < n; i += 2).

It appears my first thread finds non-zero elements in the numbers[] array in seek_primes, but there is no guarantee these same elements are not going to be zeroed by the second thread later.

In fact, up to 100 I find 3 primes too many, and up to 1000 about 38 too many (and of course the actual number found might even differ from one run to the next).
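For what it’s worth, one standard way to avoid this race (a sketch, not the commenter’s code) is to compute the sifting primes up to √n sequentially first, then split the range into disjoint segments: each worker writes only to its own segment, so no thread can report a number prime that another thread later zeroes. Shown here single-threaded in Python for clarity:

```python
import math

def small_primes(limit):
    # plain sieve for the sifting primes up to sqrt(n)
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            for m in range(p * p, limit + 1, p):
                sieve[m] = False
    return [p for p, ok in enumerate(sieve) if ok]

def sieve_segment(lo, hi, primes):
    # cross off multiples of the precomputed primes inside [lo, hi);
    # each segment touches only its own array, so segments could be
    # handed to separate threads or processes safely
    seg = [True] * (hi - lo)
    for p in primes:
        start = max(p * p, ((lo + p - 1) // p) * p)
        for m in range(start, hi, p):
            seg[m - lo] = False
    return [lo + i for i, ok in enumerate(seg) if ok]

def count_primes(n, segment_size=10000):
    if n < 2:
        return 0
    primes = small_primes(math.isqrt(n))
    total = len(primes)
    lo = math.isqrt(n) + 1
    while lo <= n:
        hi = min(lo + segment_size, n + 1)
        total += len(sieve_segment(lo, hi, primes))
        lo = hi
    return total
```

Because sieve_segment never reads or writes outside its own segment, the loop in count_primes parallelizes without changing the results.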

My python solution is at http://pastebin.com/Qup0TuFu.
I generated all the primes from the integer list 0–100000. Since the last prime in that list is 99991, using the square-of-the-primes method it is relatively easy and fast to derive the primality of most numbers. I think I began having problems around the integer 9990000000.
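The pastebin code isn’t reproduced here, but the square-of-the-primes idea (test m by dividing only by precomputed primes p with p² ≤ m) can be sketched like this in Python. With the prime list stopping at 99991, the test is valid only for m below 99991² (just under 10¹⁰), which lines up with the trouble reported around 9990000000:

```python
import math

def sieve_list(n):
    # plain sieve; returns all primes <= n
    flags = [True] * (n + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(n) + 1):
        if flags[p]:
            flags[p * p :: p] = [False] * len(flags[p * p :: p])
    return [p for p, ok in enumerate(flags) if ok]

PRIMES = sieve_list(100000)   # last entry is 99991, as the comment says

def is_prime(m):
    # trial division by the precomputed primes, stopping once p*p > m;
    # only valid for m < 99991**2, just under 10**10
    if m < 2:
        return False
    for p in PRIMES:
        if p * p > m:
            return True
        if m % p == 0:
            return False
    return True
```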

This was achieved by replacing filter with the following in-place filter! routine (and staying in Chez Scheme, where set-cdr! is available).

A more general filter!, implemented as syntax so it could consistently alter the list in place rather than depend on returning a list, would be interesting, but isn’t needed for this. This implementation returns the result of modifying the list in place, but the original value passed in is only modified from the first value not matching pred? on…

I got curious as to how big a difference the addition-vs-modulo distinction makes on modern hardware (historically, it has been very large, as you point out).

I used the following code to run 10^8 integer additions, modulo operations, and modulo operations with a power-of-two modulus (another traditional distinction in performance).

I may well be missing something big, but the output shows about a 25% difference in speed between additions and modulo operations. This obviously adds up, but looking at the above code, it seems small compared to other differences (particularly the various operations that control how many of each kind are carried out, and that reduce the infrastructural cost of the algorithm).

(In the code below, I use three vectors in an attempt to minimize the effect of cache read-ahead on subsequent vector-maps during earlier testing with a smaller n.)
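The benchmark code referred to above isn’t shown, but the shape of such a measurement can be sketched in Python with timeit. The names and iteration count here are illustrative, and there is a caveat: CPython’s interpreter overhead hides most of the hardware-level add-vs-divide difference, so a compiled language is needed to see the effect faithfully. (Compilers also rewrite a power-of-two modulus like x % 8 as a bitwise AND on unsigned values, which is the traditional distinction mentioned above.)

```python
import timeit

N = 10**6  # scaled down from the comment's 10^8; raise as desired

add_t  = timeit.timeit('x + 7', setup='x = 123456789', number=N)
mod_t  = timeit.timeit('x % 7', setup='x = 123456789', number=N)
pow2_t = timeit.timeit('x % 8', setup='x = 123456789', number=N)
and_t  = timeit.timeit('x & 7', setup='x = 123456789', number=N)  # AND form of x % 8

for name, t in [('add', add_t), ('mod', mod_t),
                ('mod 2^k', pow2_t), ('and', and_t)]:
    print('%-8s %.3fs' % (name, t))
```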

This was my first Factor program, which I wrote a few months ago to learn the language. I’ve updated it with the optimization to start sieving at the square of the prime being sieved. (Didn’t know about that trick before :)

Python was a bit slow for me, and I’m trying to learn Common Lisp, so here’s
another version. I was only able to figure out how to get 2/3 of the optimizations
working, but it’s still absurdly fast compared to my last submission.

function primes=sieve(maxnum)
% Sieve of Eratosthenes for finding primes less than or equal to the
% input argument
mask = ones(floor((maxnum-1)/2), 1);   % mask(k) represents the odd number 2*k+1
for i = 3:2:floor(sqrt(maxnum))
    if mask((i-1)/2)
        mask((i^2-1)/2:i:end) = 0;
    end
end
primes = [2; 2*find(mask)+1];

It’s a tiny bit faster than MATLAB’s nearly identical built-in implementation, called primes.

I tried another solution in C++ using two threads, one running upwards through the numbers, the other running downwards. They stop when they reach each other.
The solution is correct, but the performance decreases severely.

But I remember that in Java a BitSet is really cheap, only 1 bit per element (a plain boolean[] takes a whole byte per entry). And if I flip the meaning so that a false entry means “not crossed off or not yet examined” while true means “composite”, the freshly allocated array needs no initialization pass, which will probably save some space and array-creation time.
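The one-bit-per-candidate idea is language-independent. Here is an illustrative Python version (an assumption-laden sketch, not the commenter’s Java) that packs the odd candidates eight to a byte in a bytearray, with a 0 bit meaning “not crossed off” so the freshly allocated array needs no initialization:

```python
import math

def bit_sieve(n):
    # one bit per odd candidate >= 3; a 0 bit means "not crossed off",
    # so the freshly zeroed bytearray requires no initialization pass
    if n < 2:
        return []
    size = (n - 1) // 2                  # bit i stands for the odd number 2*i + 3
    bits = bytearray((size + 7) // 8)    # size/8 bytes instead of size booleans

    def crossed(i):
        return bits[i >> 3] & (1 << (i & 7))

    for i in range((math.isqrt(n) - 1) // 2):
        if not crossed(i):
            p = 2 * i + 3
            for j in range((p * p - 3) // 2, size, p):
                bits[j >> 3] |= 1 << (j & 7)

    return [2] + [2 * i + 3 for i in range(size) if not crossed(i)]
```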

My solution in C using all the optimizations. It prints primes as it finds them, and then once it passes the square-root mark it prints any remaining untouched entries in the array. Chews through N=15485863 in about 3.3 seconds.

function sieve(n, s)
    for i = 2, math.ceil(math.sqrt(n)) do
        if (i == 2) or (i % 2 == 1) then
            for j = i * i, n, (i % 2 + 1) * i do s[j] = false end
        end
    end
    return s
end

s = sieve(15485863, {false})  -- only s[1] preset; presetting s[2] would wrongly suppress the prime 2
for i = 1, 15485863 do if s[i] == nil then print(i) end end

Just stumbled across this page. I have written a multi-threaded nth-prime algorithm in C# (it finds the millionth prime in 7 milliseconds using 16 threads on a 4-core, 8-hyper-threaded i7-3770S running at 3.1 GHz, while using only 405 KB for the sieve) and was looking for information on how to convert it to Racket. I’m new to Racket and still trying to wrap my head around the functional approach to the sieve.

The various examples in functional languages should give me what I was looking for. Thanks to everyone for sharing.

If anyone is interested in the multi-threaded algorithm, let me know (I will be notified of follow-up comments). Alternatively, as it appears that this page is all about coding challenges, how about a multi-threaded sieve challenge? I also want to convert my algo to C and compile it with clang to see what kind of boost clang’s optimizers might provide vs. compiling with gcc (and vs. my Visual Studio-compiled C#).

Note that my algorithm is an nth-prime algorithm – the inverse of what was requested on this page. In other words, given the input 1000000, my algo outputs 15485863.
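The commenter’s C# code isn’t shown, but the inverse problem has a simple single-threaded sketch: overestimate the nth prime with Rosser’s bound p_n < n(ln n + ln ln n) (valid for n ≥ 6), sieve up to that limit, and count. Illustrative Python, not the commenter’s algorithm:

```python
import math

def nth_prime(n):
    # Overestimate p_n with Rosser's bound p_n < n*(ln n + ln ln n) for n >= 6,
    # then sieve up to that limit and count off n primes.
    if n < 6:
        return [2, 3, 5, 7, 11][n - 1]
    limit = int(n * (math.log(n) + math.log(math.log(n)))) + 1
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # zero out multiples of p in one slice assignment
            sieve[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    count = 0
    for p, flag in enumerate(sieve):
        if flag:
            count += 1
            if count == n:
                return p
```

As the comment says, nth_prime(1000000) returns 15485863.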

There are three reasons I want to convert it to Racket:

1. To use Racket’s native big integers (the C# version begins to return clipped answers somewhere in the neighborhood of the 240 millionth prime).
2. To see what kind of performance penalty Racket’s native big integers incur (a simple tail-recursive factorial is wicked fast in Racket, faster than any big-integer factorial I can find in C#).
3. It’s fun to learn new languages.

This is my PHP version; it calculates half a million before the terminal crashes. I don’t have a local copy of PHP to test with at the minute, so I’m using a remote low-powered box, and I had to chunk the number sequences as a result of its low performance.