Finding primes with Python

One of the Euler problems requires finding all the primes below a certain number x. Without thinking about this very much, it’s possible to take a very easy approach. Let’s say, look at all the numbers below x and check whether each of them is prime. In pseudocode, this would go as follows:for each y less than x
if y is prime, add it to the list
return the list

The only requirement for this to work is a test to determine whether y is prime. Well, that’s easy enough. This would do the trick:

Now the only problem with this approach is that, as the numbers get larger, the prime test takes longer. For n = 10,007 (which is prime) the test requires 100 division operations. Even removing even numbers from the list we’re checking, this mounts up to hundreds of thousands of operations getting through the full list. If we wanted to test up to, say, 1,000,000, the number of operations required at that point is up in the hundreds of millions. Clearly, this will get prohibitively expensive after this point – so it’s time to start thinking about a smarter way to do this.

One way which does not require nearly as many operations is to use a prime sieve. This was an idea that occurred to Eratosthenes a couple of thousand years ago – it still impresses me to consider its effectiveness and its simplicity. Here’s a quick rundown of the algorithm from its Wikipedia entry:

Create a contiguous list of numbers from two to some highest number n.

Strike out from the list all multiples of two (4, 6, 8 etc.).

The list’s next number that has not been struck out is a prime number.

Strike out from the list all multiples of the number you identified in the previous step.

Repeat steps 3 and 4 until you reach a number that is greater than the square root of n (the highest number in the list).

All the remaining numbers in the list are prime.

There is a trade-off here: this approach is far faster than testing all numbers (and the bigger the number, the more noticeable this gain is), but it requires the space in memory to hold the array. Quite often this is the expense of improvement: reduce one cost and another increases.
This is how the algorithm looks in Python: