Pollard’s P-1 Factorization Algorithm, Revisited

September 16, 2011

We have studied John Pollard’s p−1 algorithm for integer factorization on two previousoccasions, giving first the basic single-stage algorithm and later adding a second stage. In today’s exercise we look at a somewhat different version of the second stage, known as the improved standard continuation, that greatly improves the speed of the algorithm. The trick is to remove the modular exponentiations in the second stage, replacing them with modular multiplications, which are obviously much faster. Let’s begin with an example, computing the factorization of 16309 with B1=10 and B2=50:

For the first stage we will need the primes less than B1, which are 2, 3, 5 and 7. We will also need to know the integer logarithms of B1 with respect to each of those primes, which are 3, 2, 1, 1; that is, 23=8 is the largest power of 2 less than 10, 32=9 is the largest power of 3 less than 10, 51=5 is the largest power of 5 less than 10, and 71=7 is the largest power of 7 less than 10. Q is initially 2. For each prime we calculate q = qpa where p is the prime and a is the integer logarithm. Thus, q = 28 mod 16309 = 256 after prime 2, q = 2569 mod 16309 = 7011 after prime 3, q = 70115 mod 16309 = 4239 after prime 5, and q = 42397 mod 16309 = 9884 after prime 7. We calculated the greatest common divisor of q−1 and n at each step, and in each case the gcd was 1 (for instance, at the last step gcd(9883, 16309) = 1) and the first stage failed to split a factor.

For the second stage we will need the next prime larger than B1, which is 11, and the differences between the primes from there to B2, which are 2, 4, 2, 4, 6, 2, 6, 4, 2 and 4, representing the primes 13, 17, 19, 23, 29, 31, 37, 41, 43 and 47. We also need the ending q from the first stage and the modular exponentiation qd mod n for each of the differences 2, 4 and 6, which are 98842 mod 16309 = 2546, 98844 mod 16309 = 7443 and 98846 mod 16309 = 15129, respectively.

With this setup, the second stage is very fast. Q is initialized with the next prime, 11, so q = 988411 mod 16309 = 13427, and p = 1. Now we iterate over the differences. The first difference is 2, representing the prime 13, and the modular multiplication is q = 13427 × 2546 mod 16309 = 1478 with p = 1 × 147 mod 16309 = 1477. The second difference is 4, representing the prime 17, and q = 1478 × 7443 mod 16309 = 8488 with p = 1477 × 8487 mod 16309 = 9987. The third difference is 2, representing the prime 19, and q = 8488 × 2546 mod 16309 = 1023 with p = 9987 × 1022 mod 16309 = 13589. The fourth difference is 4, representing the prime 23, and q = 1023 × 7443 mod 16309 = 14195 with p = 13589 × 14194 mod 16309 = 12032. We’ve been taking the gcd(p, n) at each step, and they have all been 1, but here gcd(12032, 16309) = 47, which is a factor of 16309. The complete factorization is 16309 = 47 × 347. We can summarize the calculations like this:

You may recall from the earlier exercises that Pollard’s p−1 method is based on Fermat’s little theorem ap ≡ a (mod p), which can equivalently be stated ap−1 ≡ 1 (mod p). As a consequence, if p−1 divides M, then p divides gcd(2M−1, n). Pollard’s p−1 method computes M as the least common multiple of a bound B; thus, if all the factors of p−1 are less than B, then gcd(2lcm[1..B] − 1, n) is a factor of n.

In the two-stage version of Pollard’s p−1 method, also called the large-prime version, there are two bounds, B1 and B2. The first stage calculates the gcd as described above. The second stage then computes gcd(2k · lcm[1..B1] − 1, n) for each prime k such that B1 < k < B2. This is useful because most numbers factor as the product of several small factors and a large factor; numbers that factor as the product of two large primes are relatively rare.

We saw in the example how the least common multiple of the integers less than a bound B can be computed economically using powers of the prime numbers less than the bound. The large-prime variant can be computed economically by a trick. At the end of the first stage we have computed q = 2lcm[1..B1] mod n; that was 9884 in the example above. Then we store a table of the differences of the primes from B1 to B2, and for each difference d compute qd; since these differences are small and few in number, the table is small and quick to compute. Then we can compute each 2k · lcm[1..B1] by successively multiplying each q by the successive differences in the primes, thus converting a costly modular exponentiation to a much simpler modular multiplication. Finally, at each step we subtract 1 and compute the gcd, stopping when we find a factor.

In addition to the modular expontiations, the other large time cost of the algorithm is the computation of the greatest common divisor, and there is another trick that reduces that cost. Instead of taking the greatest common divisor after each step in either stage, take it only periodically, say after every hundred steps. If the gcd at that point is between 1 and n, then you have your factor and you are finished. If the gcd at that point is 1, then all of the intermediate gcds would also have been 1, but none of them needed to be calculated. Finally, if the gcd at that point is n, there are two factors since the last gcd calculation, so backtrack to the prior gcd and recompute the steps, taking the gcd at each step.

Your task is to write a function that finds factors using the p−1 algorithm, using both tricks described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.