Finding Primes

One of the most efficient ways to find smaller primes is to use a sieve. There are several different sieves but one of the easiest to code is the sieve of Eratosthenes.

Before coming to the sieve itself, here is a simple trial-division example:

Python

low_primes = [2, 3, 5, 7]
low_primes += [x for x in range(9, 100, 2) if all(x % p for p in low_primes)]

This starts with primes less than 10 and then finds primes between 10 and 100 by finding odd numbers that are not multiples of 2, 3, 5 or 7.

In order to test whether a number is prime, we only need to test whether it is a multiple of any prime up to the square root of the number since, if it has any integer factors, at least one of them must be at or below this limit. We could hence extend the above approach by using each new list to find primes in a larger interval – primes up to 10 allow the list to be extended to 100, primes up to 100 provide for extension up to 10,000 and so on.
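That doubling scheme can be sketched with a small helper (the name `extend_primes` and its interface are my own, not from the original):

```python
def extend_primes(primes, limit):
    # Hypothetical helper: extends a prime list by trial division,
    # assuming it already holds every prime up to at least sqrt(limit).
    for x in range(primes[-1] + 2, limit + 1, 2):
        if all(x % p for p in primes if p * p <= x):
            primes.append(x)
    return primes

primes = extend_primes([2, 3, 5, 7], 100)  # primes up to 10 give primes up to 100
primes = extend_primes(primes, 10000)      # primes up to 100 give primes up to 10,000
```

Each pass squares the reach of the list, so a handful of calls covers a large range.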

But we really need a more general solution, which the following subroutine provides:

Python

def prime_list(n):
    sieve = [False, False] + [True] * (n - 1)
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            m = n // i - i
            sieve[i * i:n + 1:i] = [False] * (m + 1)
    return [i for i in range(n + 1) if sieve[i]]
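It is worth quantifying the cost of the sieve list itself: a Python list stores one pointer per element, so the boolean list alone for a million candidates runs to several megabytes (a rough check, assuming a 64-bit CPython build; exact figures vary by version):

```python
import sys

n = 1000000
sieve = [True] * (n + 1)
# Each list slot holds a pointer to the shared True/False singletons:
# about 8 bytes per candidate on a 64-bit build, before any other overhead.
print(sys.getsizeof(sieve) / 1e6, "MB")
```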

This is quite fast for an interpreted language such as Python, but it uses a lot of memory when large primes are needed. An alternative that consumes less memory uses a Python generator:

Python

def prime_gen(maxp=-1):
    d = {}
    q = 2
    while maxp < 0 or q < maxp:
        if q not in d:
            yield q
            d[q * q] = [q]
        else:
            for p in d[q]:
                d.setdefault(p + q, []).append(p)
            del d[q]
        q += 1
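Because the generator is lazy, it need not even be given an upper bound; for instance, itertools.islice can pull just the first few primes on demand (prime_gen is repeated here so the snippet runs on its own):

```python
from itertools import islice

def prime_gen(maxp=-1):
    # Incremental sieve, as above: d maps each upcoming composite to the
    # primes that divide it; numbers absent from d are prime.
    d = {}
    q = 2
    while maxp < 0 or q < maxp:
        if q not in d:
            yield q
            d[q * q] = [q]
        else:
            for p in d[q]:
                d.setdefault(p + q, []).append(p)
            del d[q]
        q += 1

# No upper bound is fixed in advance: islice stops after ten primes.
print(list(islice(prime_gen(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```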

We can time these two approaches using the following example which counts the number of primes below 1,000,000:

Python

from time import perf_counter  # time.clock was removed in Python 3.8

class Timer():
    def __enter__(self):
        self.start = perf_counter()
    def __exit__(self, *args):
        print("Time", perf_counter() - self.start)

with Timer():
    x = prime_list(1000000)
    print(len(x), x[-1])

with Timer():
    x = [p for p in prime_gen(1000000)]
    print(len(x), x[-1])

This gives the output:

78498 999983
Time 0.637893911573
78498 999983
Time 7.35350640485

which shows that the prime list function is over ten times faster than the generator solution.

Nevertheless, the generator uses less memory and offers a lot of flexibility as can be seen in its use with a filter function such as:

Python

def select(f, s):
    for x in s:
        if f(list(map(int, str(x)))):
            yield x

Here a list of the digits of each input number is created and the function f is applied to this list. We can now define a property we want for the digits of a prime using the function f (which can be a lambda function, that is, an ordinary function written in a compact form).

For example, suppose we want to list the primes below 1000 with repeated digits. This does the job in a single line of code:

Python

print([p for p in select(lambda x: len(x) != len(set(x)), prime_gen(1000))])

[11, 101, 113, 131, 151, 181, 191, 199, 211, 223, 227, 229,
233, 277, 311, 313, 331, 337, 353, 373, 383, 433, 443, 449,
499, 557, 577, 599, 661, 677, 727, 733, 757, 773, 787, 797,
811, 877, 881, 883, 887, 911, 919, 929, 977, 991, 997]

In a similar way we can easily find primes whose digits sum to 32:

Python

print([p for p in select(lambda x: sum(x) == 32, prime_gen(10000))])

[6899, 8699, 8969, 9689, 9887]

So far we have considered only primes where it is feasible to hold lists of primes in memory. But when numbers are very large, with hundreds or thousands of digits, it becomes impractical to find primes in this way. Although there are ways of determining whether a very large number is prime, in practice they are very slow. Fortunately, however, there are tests that can determine whether a given number is almost certain to be prime. Such tests are often known as probable prime tests but they would be more accurately called composite tests because they can determine with certainty that a number is composite (i.e. it is not a prime).
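It may help to see the idea behind compositeness tests in its simplest form first, the Fermat test (an illustrative sketch, not the test used in this article; the function name fermat_composite is my own):

```python
from random import randrange

def fermat_composite(n, trials=10):
    # Fermat's little theorem: if n is prime then a**(n-1) % n == 1 for any
    # base a with 2 <= a <= n - 2, so a single failing base proves n composite.
    # Some composites (Carmichael numbers) fool every coprime base, which is
    # why the stronger Miller-Rabin test is preferred in practice.
    for _ in range(trials):
        a = randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:
            return True   # certainly composite
    return False          # probably prime

print(pow(2, 220, 221) != 1)   # True: base 2 proves 221 = 13 * 17 composite
print(fermat_composite(997))   # False: 997 is prime, so no base can fail
```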

One such test is known as the Miller-Rabin test, which is as follows in Python:

Python

from random import randrange

def miller_rabin(n, r=10):
    if n < 10:
        return n in (2, 3, 5, 7)
    t = n - 1
    s = 0
    while not t % 2:
        t >>= 1
        s += 1
    for i in range(r):
        a = randrange(2, n - 1)
        x = pow(a, t, n)
        if x != 1 and x != n - 1:
            for j in range(s - 1):
                x = (x * x) % n
                if x == 1:
                    return False
                if x == n - 1:
                    break
            else:
                return False
    return True

The first parameter is the number to test; the second is the number of times the test is to be repeated (with a default of 10). Each time the test is run, the value returned will be False if the number is composite and True if the number is possibly prime.

For the Miller-Rabin test it is known that the probability of a test falsely indicating a prime is not higher than 1/4. Moreover, tests are independent of each other when run with different internal random numbers, which means that running N tests reduces the chance of finding a false prime to 1 in 4^N. So, if 10 tests all indicate that a value is prime, the probability that it is actually composite will be lower than 1 in 1,000,000. So, for example, if we want to find the largest primes below the powers of 10 between 10 and 30, we can use:

Python

for n in range(10, 31):
    t = 10 ** n - 1
    while not miller_rabin(t):
        t -= 2
    print(n, t)

10 9999999967
11 99999999977
12 999999999989
13 9999999999971
14 99999999999973
15 999999999999989
16 9999999999999937
17 99999999999999997
18 999999999999999989
19 9999999999999999961
20 99999999999999999989
21 999999999999999999899
22 9999999999999999999973
23 99999999999999999999977
24 999999999999999999999743
25 9999999999999999999999877
26 99999999999999999999999859
27 999999999999999999999999901
28 9999999999999999999999999791
29 99999999999999999999999999973
30 999999999999999999999999999989

In practice this test is almost always far better than its theoretical probability suggests and running it 20 times will make the probability of falsely reporting a composite as a prime far lower than the probability that a random machine memory error will cause such a false report.
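For scale, even the theoretical bound for 20 rounds works out to under one in a trillion (a quick arithmetic check, not part of the original code):

```python
# 20 independent rounds bound the false-prime probability by (1/4)**20 = 2**-40,
# which is exactly representable as a float.
print(0.25 ** 20)   # about 9.09e-13, i.e. under one in a trillion
```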