Primality Testing in Polynomial Time (Ⅰ)

August 6, 2012

1. Introduction

Exactly ten years
ago, Agrawal,
Kayal,
and Saxena
published “PRIMES
is in P”, which described an algorithm that could provably
determine whether a given number was prime or composite in polynomial
time.

The AKS algorithm is quite short, but understanding how it works
via the proofs in the paper requires some mathematical sophistication.
Also, some results in the last decade have simplified both the
algorithm and its accompanying proofs. In this article I will explain
in detail the main result of the AKS paper, and in a follow-up article
I will strengthen the main result, use it to get a polynomial-time
primality testing algorithm, and implement that algorithm in
Javascript. If you've
understood my introduction to
primality testing, you should be able to follow along.

Let's get started! The basis for the AKS primality test is the
following generalization
of Fermat's
little theorem to polynomials:

(Fermat's little theorem for polynomials,
strong version.) If \(n \ge 2\) and \(a\) is relatively prime
to \(n\), then \(n\) is prime if and only if
\[
(X + a)^n \equiv X^n + a \pmod{n}\text{.}
\]

The form of the equation above may be unfamiliar. The polynomials
in question
are formal
polynomials. That is, we care only about the coefficients of
the polynomial and not how it behaves as a function. In this case, we
restrict ourselves to polynomials with integer coefficients. Then we
can meaningfully compare two polynomials modulo \(n\): we consider two
polynomials congruent modulo \(n\) if their respective coefficients
are all congruent modulo \(n\). (Equivalently, two polynomials
\(f(X)\) and \(g(X)\) are congruent modulo \(n\) if \(f(X) - g(X) = n
\cdot h(X)\) for some polynomial \(h(X)\).) This definition is
consistent with how they behave as functions; if two polynomials
\(f(X)\) and \(g(X)\) are congruent modulo \(n\), then treating them
as functions, \(f(x)\ \equiv g(x) \pmod{n}\) for any integer
\(x\).[1]

Unfortunately, this test by itself cannot give a polynomial-time
algorithm as testing even one value of \(a\) may require looking at
\(n\) coefficients of the left-hand side. (Remember that we're
interested in algorithms with time polynomial not in the input \(n\),
but in its bit length \(\lg n\). Such an algorithm is described as
having time polylog in \(n\).) However, we can reduce the
number of coefficients we have to look at by taking the powers of
\(X\) modulo some number \(r\). This is equivalent to taking the
modulo of the polynomials themselves by \(X^r - 1\); you can see this
for yourself by picking some polynomial and some value for \(r\) and
doing long division by \(X^r - 1\) to find the remainder. (It may
seem weird to talk about taking the modulo of one polynomial with
another, but it's entirely analogous to integers.) This gives us a
weaker version of the theorem above:

(Fermat's little theorem for polynomials,
weak version.) If \(n\) is prime and \(a\) is not a multiple
of \(n\), then for any \(r \ge 2\)
\[
(X + a)^n \equiv X^n + a \pmod{X^r - 1, n}\text{.}
\]

The “double mod” notation above may be unfamiliar, but
in this case its meaning is simple. We consider two polynomials
congruent modulo \(X^r - 1, n\) when they are congruent modulo \(n\)
after you reduce the powers of \(X\) modulo \(r\) and combine like
terms. More generally, two polynomials \(f(X)\) and \(g(X)\) are
congruent modulo \(n(X), n\) if \(f(X) - g(X) \equiv n(X) \cdot h(X)
\pmod{n}\) for some polynomial \(h(X)\).

With this theorem, we only have to compare \(r\) coefficients, but
we introduce the possibility of the condition above being met even
when \(n\) is composite. But can we impose conditions on \(r\) and
\(a\) such that if the condition holds for a polynomial number of
pairs of \(r\) and \(a\), we can be sure that \(n\) is prime? The
answer is “yes”; in particular, we can find a single \(r\)
and an upper bound \(M\) polylog in \(n\) such that if the condition
holds for \(r\) and \(0 \le a \lt M\), then \(n\) is prime.

In the remainder of this article, we'll work backwards. That is,
we'll first assume we have some \(n \ge 2\), \(r \ge 2\), and \(M \ge
1\) such that for all \(0 \le a \lt M\)
\[
(X + a)^n \equiv X^n + a \pmod{X^r - 1, n}\text{.}
\]
Then we'll assume that \(n\) is not a power of one of its prime
divisors \(p\) and try to deduce the conditions that imposes on \(n\),
\(r\), \(M\), and \(p\). Then we can take the contrapositive to find
the inverse conditions on \(n\), \(r\), \(M\), and \(p\) that would
then force \(n\) to be a power of \(p\). Since we can easily test
whether \(n\) is
a perfect
power, if it's not one, we can immediately conclude that \(n =
p^1\) and thus prime. (Of course, if it does turn out to be a perfect
power, then it is trivially composite.)

To understand the conditions that we will derive, we must first
talk about introspective numbers.

2. Introspective numbers

Given a base \(b\), a polynomial \(g(X)\) and a number \(q\), we
call \(q\) introspective[2] for \(g(X)\) modulo \(b\) if
\[
g(X)^q = g(X^q) \pmod{b}\text{.}
\]

We also say that \(g(X)\) is introspective under \(q\)
modulo \(b\).

A basic property of introspective numbers and polynomials is that
they are closed under multiplication. That is, if \(q_1\) and \(q_2\)
are introspective for \(g(X)\) modulo \(b\), then \(q_1 \cdot q_2\) is
also introspective for \(g(X)\) modulo \(b\), and if \(g_1(X)\) and
\(g_2(X)\) are introspective under \(q\) modulo \(b\), then \(g_1(X)
\cdot g_2(X)\) is also introspective under \(q\) modulo \(b\).

In particular, given our assumptions above, we can easily see that
\(1\), \(p\), and \(n\) are introspective for \(X + a\) modulo \(p\)
for any \(0 \le a \lt M\). We can also show that \(n/p\) is also
introspective for \(X + a\) modulo \(p\). Using closure under
multiplication, we can talk about the set of numbers generated by
\(p\) and \(n/p\), which are all introspective for \(X + a\) modulo
\(p\). Call this set \(I\):

We can also take the closure of all \(X + a\) to get a set of
polynomials which are all introspective under \(p\), \(n/p\), or any
number in \(I\). Call this set \(P\):
\[
P = \left\{ 0 \right\} \cup
\left\{ X^{e_0} \cdot (X + 1)^{e_1} \cdot \ldots \cdot (X + M -
1)^{e_{M - 1}} \mid e_0, e_1, \ldots, e_{M - 1} \ge 0 \right\}\text{.}
\]
To summarize, \(I\) is a set of numbers and \(P\) is a set of
polynomials such that for any \(i \in I\) and \(g(X) \in P\), \(i\) is
introspective for \(g(X)\) modulo \(p\). Of course, it's still not
clear what these two sets have to do with whether \(n\) is prime or
not. But we will examine certain finite sets related to \(I\) and
\(P\) and their sizes, and we will see that we can deduce their
properties depending on the relation of \(n\) to \(p\).

3. Bounds on finite sets related to \(I\) and \(P\)

Now we're ready to work towards finding our restrictions on \(n\),
\(r\), \(M\), and \(p\). We'll slowly build them up such that when
the last one falls into place, we know that \(n\) is a perfect power
of \(p\). Here's what we're starting with:

We can calculate \(|P_d|\) exactly,[4] but
we only need a lower bound for when \(d \le M\). Consider \(P_d^{\{0,
1\}}\), the subset of \(P_d\) where each \(X + a\) is present at most
once. Since each \(X + a\) is either present or not present, but not
all of them can be present at the same time, there are \(2^d - 1\)
distinct polynomials in \(P_d^{\{0, 1\}}\). Adding back the zero
polynomial yields \(|P_d^{\{0, 1\}}| = 2^d\). Since \(P_d^{\{0,
1\}}\) is a subset of \(P_d\), \(|P_d| \ge |P_d^{\{0, 1\}}| = 2^d\).
Therefore, if \(d \le M\), then[5]
\[ |P_d| \ge 2^d\text{.} \]
This will be useful later (for a particular value of \(d\)), so let's
add the restriction to \(M\):

Let us restrict \(I\) in a different way, by reducing modulo \(r\):
\[
J = \left\{ x \bmod r \mid x \in I \right\}
\]
and let \(t = |J|\). (This size will play an important role
later.)

Our final set that we're interested in needs some background to
define. We want to find a subset of \(P\) that lies in some field
\(F\) because fields have some convenient properties that we will use
later.[6]

Consider \(\mathbb{Z}/p\mathbb{Z}\), the ring
of integers
modulo \(p\). Since \(p\) is prime, it is also a field. In
particular, it is
the finite
field \(\mathbb{F}_p\) of order \(p\). Then consider
\(\mathbb{F}_p[X]\),
its polynomial
ring, which is the set of polynomials with coefficients in
\(\mathbb{F}_p\). Given some polynomial \(q(X) \in \mathbb{F}_p[X]\),
we can further reduce modulo \(q(X)\) to get \(\mathbb{F}_p[X] /
q(X)\). Finally, if \(q(X)\) is
irreducible
over \(\mathbb{F}_p\), then \(\mathbb{F}_p[X] / q(X)\) is also a
field.

So we just need to find a polynomial that's irreducible over
\(\mathbb{F}_p\). We know that \(X^r - 1\) has \(\Phi_r(X)\), the
\(r\)th cyclotomic
polynomial, as a factor. \(\Phi_r(X)\) is irreducible over
\(\mathbb{Z}\), but not necessarily over \(\mathbb{F}_p\). But if
\(r\) is relatively prime to \(p\), then \(\Phi_r(X)\) factors into
irreducible polynomials all of degree \(o_r(p)\)
(the multiplicative
order of \(p\) modulo \(r\)) over \(\mathbb{F}_p\).[8] Then we can
just require that \(r\) be relatively prime to \(p\). If we do so,
then we can let \(h(X)\) be one of the factors of \(\Phi_r(X)\) over
\(\mathbb{F}_p\) and we have our field \(F = \mathbb{F}_p[X] /
h(X)\).

We can map elements of \(P\) into \(Q\) via reduction modulo
\((h(X), p)\). But we're interested in only the elements of \(P\)
that map to distinct elements of \(Q\), since that will let us find a
lower bound for \(|Q|\). A simple example would be the set of \(X +
a\) for \(0 \le a \lt M\); if the degree of \(h(X)\) is greater than
\(1\) and \(p \ge M\), then each \(X + a\) is distinct in \(Q\).

Another interesting set is \(X^k\) for \(1 \le k \le r\). Since
\(h(X) \equiv 0 \pmod{h(X}, p)\), we can say that \(X\) is a root of
the polynomial function \(h(y)\) over the field \(F\). But since
\(h(y)\) is a factor of \(\Phi_r(y)\), \(X\) is then a primitive
\(r\)th root of unity in \(Q\).[9] But the powers of a primitive \(r\)th
root of unity (from \(1\) to \(r\)) are all distinct. Therefore all
\(X^k\) for \(1 \le k \le r\) are distinct in \(Q\).

Most importantly, we can show that distinct elements in \(P_d\) map
to distinct elements in \(Q\) if \(d \le t\). Let \(f(X)\) and
\(g(X)\) be two different elements of \(P_d\). Assume that \(f(X)
\equiv g(X) \pmod{h(x}, p)\). Then, for \(m \in I\):
\[
f(X^m) \equiv f(X)^m \pmod{X^r - 1, p}
\]
and
\[
g(X^m) \equiv g(X)^m \pmod{X^r - 1, p}
\]
by introspection modulo \(p\), and therefore
\[
f(X^m) \equiv g(X^m) \pmod{X^r - 1, p}
\]
which immediately leads to
\[
f(X^m) \equiv g(X^m) \pmod{h(X}, p)\text{.}
\]
Therefore, all \(X^m\) for \(m \in I\) are roots of the polynomial
function \(u(y) = f(y) - g(y)\) over the field \(F\), and in
particular all \(X^m\) for \(m \in J\). But all \(t\) such \(X^m\)
are distinct in \(Q\) by the argument above. Therefore, \(u(y)\) must
have degree at least \(t\) since a polynomial over a field cannot have
more roots than its degree. But the degree of \(u(y)\) is less than
\(d\) since both \(f(y)\) and \(g(y)\) have degree less than \(d\).
Since \(d \le t\), this is a contradiction, so therefore \(f(X)
\neq g(X) \pmod{h(x}, p)\). But since \(f(X)\) and \(g(X)\)
were arbitrary, that implies that distinct elements of \(P_d\) map to
distinct elements of \(Q\) for \(d \le t\).

Given the above, we can conclude that as long as we require that
\(d \le t\), \(p \ge M\), and \(o_r(p) = \deg(h(X)) \gt 1\), then
\[
|Q| \ge |P_d| \ge 2^d\text{.}
\]

4. The AKS theorem (weak version)

We're finally ready to put it all together. Again assume \(n\) is
not a power of \(p\), and recall that \(|J| = t\). Let \(s \gt
\sqrt{t}\). Then \(|I_s| = s^2 \gt t\). By
the pigeonhole
principle, there must be two elements \(m_1, m_2 \in I_s\) that
map to the same element in \(J\); that is, there must be \(m_1, m_2
\in I_s\) such that \(m_1 \equiv m_2 \pmod{r}\). Now pick some
\(g(X)\) from \(P\). Then
\[
g(X)^{m_1} \equiv g(X^{m_1}) \pmod{X^r - 1, p}
\]
and
\[
g(X)^{m_2} \equiv g(X^{m_2}) \pmod{X^r - 1, p}
\]
by introspection modulo \(p\). But \(X^{m_1} \equiv X^{m_2} \pmod{X^r - 1}\) since \(m_1 \equiv m_2 \pmod{r}\), so
\[
g(X^{m_1}) \equiv g(X^{m_2}) \pmod{X^r - 1, p}\text{.}
\]
Chaining all these congruences together lets us deduce that
\[
g(X)^{m_1} \equiv g(X)^{m_2} \pmod{X^r - 1, p}\text{,}
\]
which immediately leads to
\[
g(X)^{m_1} \equiv g(X)^{m_2} \pmod{h(X}, p)\text{.}
\]

That means that \(g(X) \bmod (h(X), p) \in Q\) is a root of the
polynomial function \(u(y) = y^{m_1} - y^{m_2}\) over the field \(F\).
But \(g(X)\) was picked arbitrarily from \(P\), so \(u(y)\) has at
least \(|Q|\) roots. \(\deg(u(y)) = \max(m_1, m_2) \le p^{s-1} \cdot
(n/p)^{s-1} = n^{s-1}\), and \(u(y)\), being a polynomial over a
field, cannot have more roots than its degree, so if \(n\) is not a
power of \(p\), then \(|Q| \le n^{s-1}\). Equivalently, if \(|Q| \gt
n^{s-1}\), then \(n\) must be a power of \(p\).[10] But
we've shown above that \(|Q| \ge 2^d\) for \(d \le t\), so if we can
pick \(d\) and \(s\) such that \(2^d \gt n^{s-1}\), then we can force
\(n\) to be a power of \(p\). Taking logs, we see that this is
equivalent to picking \(d\) and \(s\) such that \(d \gt (s - 1) \lg
n\). Since \(d \le t\), this imposes \(t \gt (s - 1) \lg n\) in order
for there to be room to pick \(d\). Rearranging, we get \(s \lt
\frac{t}{\lg n} + 1\). But \(s \gt \sqrt{t}\), so this imposes
\(\sqrt{t} \lt \frac{t}{\lg n} + 1\) in order for there to be room to
pick \(s\). Rearranging again, we get \(\frac{t}{\sqrt{t} - 1} \gt
\lg n\). Since \(\frac{t}{\sqrt{t} - 1} \gt \sqrt{t}\), it suffices
to require that \(t \gt \lg^2 n\) in order for there to be room to
pick \(d\) and \(s\). Furthermore, since \(s\) has to be an integer,
then \(s \ge \lfloor \sqrt{t} \rfloor + 1\), and therefore \(d \gt
\lfloor \sqrt{t} \rfloor \lg n\). Let's update our assumptions:

So to summarize, if we make the above assumptions, we can pick
\(d\) and \(s\) such that \(|Q| \ge 2^d \gt n^{s - 1}\), which implies
that \(n\) must be a power of \(p\), which was our goal. Now we just
have to express all assumptions in terms of \(n\), \(r\), and \(M\),
strengthening them if necessary. \(J\) is generated by \(p\) and
\(n/p\), so its order (i.e., \(t\)) is at least \(o_r(p)\), which is
in turn at least \(o_r(n)\), since \(p\) is a prime factor of \(n\)
(this brings along the assumption that \(r\) and \(n\) are relatively
prime). Therefore, we can replace the assumptions \(t \gt \lg^2 n\)
and \(o_r(p) \gt 1\) with \(o_r(n) \gt \lg^2 n\). We can remove the
reference to \(d\) by finding the maximum value of \(t\). Since \(r\)
is relatively prime to \(n\), \(J\) is a subgroup of \(Z_r\), and
therefore its order divides (and therefore is at most) \(\varphi(r)\).
So we can replace \(M \ge d \gt \lfloor \sqrt{t} \rfloor \lg n\) with
\(M \gt \lfloor \sqrt{\varphi(r)} \rfloor \lg n\). Finally, we can
remove the reference to \(p\) by mandating that \(n\) has no prime
factor less than \(M\). Here are our final assumptions:

And that's it for now! In the follow-up article we will strengthen
this theorem to further show that \(n\) is equal to \(p\), and
therefore prime. Then we will use this result to get a
primality-testing algorithm that we will prove to be polynomial
time.

[1] We use uppercase letters for variables when we treat
polynomials as formal polynomials and lowercase letters when we
treat them as functions. ↩

[2] The term “introspection”, which comes
from the original AKS paper, was probably chosen to invoke the idea
that the exponent \(q\) can be pushed into and pulled out of \(g(X)\).
Here we generalize it a bit. ↩

[3] This condition is too weak to be useful by itself,
but we will parlay it into something we can use later.
↩

[4] Using the ideas
on this page,
we can show that \(|P_d| = {M + d \choose d - 1} + 1\) by
considering each \(X + a\) a labeled urn (plus a
“dummy” urn) and each unit of power an unlabeled
ball. (This was used in the AKS paper.)
↩

[9] It's a bit weird to talk about a polynomial being
the root of other polynomials, but recall that we can form a
polynomial ring over any field, even a field of polynomials. We
keep track of which polynomials belong to which domains by using
different variables.
↩