Counting Primes Using Legendre’s Formula

July 22, 2011

The prime-counting function π(n) computes the number of primes not greater than n, and has been a matter of fascination to mathematicians for centuries. The Prime Number Theorem tells us that π(n) is approximately n / log(n); Carl Fredrich Gauss proposed the prime number theorem in 1792 when he was only fifteen years old, and Jacques Hadamard and Charles-Jean de la Valle Poussin both proved it, independently, in 1896. The first person to make a serious attempt at the calculation, except for trivial attempts that simply enumerated the primes and counted them, was the French mathematician Adrien-Marie Legendre at the end of the eighteenth century; his formula was correct, but he erroneously reported that π(106) = 78526, whereas the correct value is 78498. We will recreate Legendre’s calculation in today’s exercise.

Legendre’s formula works in two parts. First, he defined a function φ(x, a) that counts all the numbers from 1 to x inclusive that are not stricken by sieving with the first a primes. For instance φ(50, 3) = 14, because the numbers 1, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47 and 49 less than or equal to 50 are not divisible by the first three primes 2, 3, or 5. The φ function can be computed by the recurrence equation φ(x, a) = φ(x, a−1) − φ(x/pa, a−1), where φ(x, 1) is the number of odd integers less than or equal to x and pa is the ath prime number. The second part of Legendre’s formula uses the φ function to compute the value of π recursively: π(x) = φ(x, a) + a − 1, where a = π(⌊√x⌋).

Your task is to write φ and π functions and recreate Legendre’s calculation of π(1000000). When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Like this:

LikeLoading...

Related

16 Responses to “Counting Primes Using Legendre’s Formula”

Even with memoization, my Python solution didn’t finish quickly enough for me. Interestingly, (or not, as the case may be)
the following naive Haskell implementation finishes much more quickly (especially when compiled via GHC).

The reference solution finds pi(10^6) in about 0.3 seconds and pi(10^7) in about 1.6 (under Petite Chez Scheme). Running the naive Haskell above with GHC’s runhaskell command (so, intepreted) takes almost 9.6 seconds for pi(10^6), while compiled with gch --make -O2 take about 0.5 seconds. My Haskell interpreted chokes when going for pi(10^7), but manages to finish in about 11 seconds when compiled. Thanks for the O’Neill paper; I think I’d come across it before, but hadn’t checked it out in depth.

I suspect the python shell uses a few stack frames, but probably not enough to push pi(1e7) over the limit.

I believe the culprit is the lru cache decorator for the phi() function. Looking at the source code in functools.py, when there is a cache miss, the wrapped function calls the original code for phi(). So I think that on a cache miss, two frames get pushed on the stack (one for the call to the wrapped function and one for the call to the original function). The original function may the recurse and call the wrapped function, etc.

I have not tested my hypothesis. However, if I pre-populate the cache by calling pi() in a loop with gradually increasing arguments, I can calculate pi() for larger arguments without hitting the stack limit.

I optimized the code a bit.
–If x < prime[a], the second term in the recurrence equation is zero, and the first term in the recurrence equation is basically doing a linear search through the primes to find the largest prime that divides x. The bisect() call replaces the linear search with a binary search to find the largest prime less than x.
–As stated in the problem, the recurrence terminates when a == 1. The recurrence can also be terminated when x == 1, where phi(1,a) is +1 or -1 depending on the sign the phi term would have (the variable s in the code).

Clojure also runs out of stack space, and just using a stack and a loop is not obvious how to memoize the intermediate results. So I took advantage of Mike’s optimization above (I admit I never would have figured out what the recurrence actually does on my own.) Uses O’Neil sieve algorithm to create enough primes to calculate pi to 1e9.

There is probably only a marginal effect if you run the scheme code with the cache empty or with previously cached results. The reason why it is so fast is that there are so many redundant calculations that don’t need to be performed when the results are cached. Without a cache, calculating phi for 1e9 will take (I don’t know, days?) Calculating phi for pi(1e6) naively without caching takes hours (and I’m not sure how long because I gave up after it was chugging for a few hours.)

Mike’s optimization removes some computational dead ends, and he also figured out what the recurrence actually does, so replacing a big chunk of the recurrence with the call to binary search the prime array saves a lot of time, but probably still leaves a lot of redundant computation in place. It’s time is still much improved over trying to compute phi naively without a cache.

Clojure is based on the JVM, and Java was not designed to be a functional programming language, so there’s not a whole lot of stack. I’m not sure why Python has so much problems. I worked on the Python code in the early 90’s and at that time stack frames were allocated on the heap. Because Python now has generators, there is even less reason to use the system stack for procedure calls, so it doesn’t make sense that there would be a stack limit…

Timing on my machine for 10**9 is 6.2s for Legendre, 0.12s for Lehmer. I tried Lehmer counts 10**12 in 24.9s, 10**13 in 186.2s. Pretty fantastic for Perl. I am using a Perl/XS library to get the prime list and do the small prime_counts in the Lehmer loop. In theory those could all be precalculated (MPU’s prime count is doing bit counts on the pre-sieved area to get the result).

primesieve is helped a lot by using 12 threads but it’s seriously fast even with only one thread (4-10x slower than multi-threaded). Both it and Math::Prime::Util are sieving the entire range (in segments) and doing fast bit counts, and both use very low memory since they run segmented. “Python-Legendre” is Mike’s code on my machine. The Lehmer results are great (IMO) for a first pass. Lots of optimization could be done on the “small” prime_count calls rather than its current method of making a big sieve and then counting all the bits up to the target, for each call.

Memoizing the phi function is the way to go. Ruby (which is slower language in execution than Clojure, generally) can compute pi(1e9) in 70 seconds, vs. 235 in Clojure. This is because there are no stack space issues here…

David, I don’t have your LazySieve, but just threw in a standard sieve to the nth_upper_limit of rt_n (and verified the sieving is a trivial amount of time and the prime counts are identical). I get “stack level too deep” errors in phi when trying to run 1e8 using Ruby 1.8.7. I had to set ‘ulimit -s 32768’ to get it to run 1e9 successfully (hmm, looks like ruby for Windows presets the stack to a giant size so that’s why it runs fine for you). Sadly I’m getting times that are *much* slower than yours — perhaps a 1.8.7 vs. 1.9.x issue. It’s 3.1s for 1e7, 58s for 1e8, and over 10 minutes for 1e9.

I believe ‘Memoizing the phi function is the way to go’ only applies to very small n values. My Legendre version in Perl above is over 10x faster than your Ruby numbers, and Perl isn’t _that_ much faster (I have a fast computer, but my memoized versions in Perl are slower than your Ruby numbers). When I did memoization it was not only slow for large n, but used gawdawful amounts of memory. The ruby code is using over 700MB of memory computing pi(10**9).