A Foray into Number Theory with Haskell

July 6, 2007

Find a perfect square \(s\) such that \(1597s + 1\) is also
perfect square.

After reading the discussion about implementing a brute-force
algorithm to solve the problem and spending a futile half-hour or so
trying my hand at find a better way, someone noticed that the problem
was an instance
of Pell's
equation which is known to have an elegant and fast solution;
indeed, he posted
a one-liner
in Mathematica solving the given problem. However, I wanted to try
coding up the solution myself as the Mathematica solution, while
succinct, isn't very enlightening since the heavy lifting is already
done by a built-in function and an arbitrary constant was used for this
particular instance of Pell's equation.

Pell's equation is simply the
Diophantine
equation \(x^2 - dy^2 = 1\) for a given
\(d\)[1]; being Diophantine means
that all variables involved take on only integer values. (In our
original problem, \(d\) is 1597 and we are asked for \(y^2\).) The
solution involves finding the continued fraction expansion of
\(\sqrt{d}\), finding the first convergent of the expansion
that satisfies Pell's equation, and then generating all other
solutions from that
fundamental solution. We rule out the trivial solution \(x =
1\), \(y = 0\) which also implies that if \(d\) is a perfect square then
there is no solution.

A continued fraction is an expression of the form: \[ x = a_0 +
\frac{1}{a_1 + \frac{1}{a_2 + \frac{1}{a_3 +
\frac{1}{\ddots\,}}}}\] where all \(a_i\) are integers and all but the
first one are positive. The standard math notation for continued
fractions is quite unwieldy so from now on we'll use \(\left \langle
a_0; a_1, a_2, \ldots \right \rangle\) instead of the above.

The theory of continued fractions is a rich and beautiful one but
for now we'll just state a few facts:

The continued fraction expansion of a number is (mostly) unique.

The continued fraction expansion of a rational number is
finite.

The continued fraction expansion of a irrational number is
infinite.

A quadratic
surd is a number of the form \(\frac{a + \sqrt{b}}{c}\)
where
\(a\), \(b\), and \(c\) are integers. Except
maybe for the first term, the continued fraction expansion of a
quadratic surd is periodic; that is, it repeats forever after a
certain number of terms. This applies in particular to the square root
of an integer.

Truncating an infinite continued fraction to get a finite
continued fraction gives (in some sense) an optimal rational
approximation to the irrational number represented by the infinite
continued fraction.

Given a quadratic surd it is fairly easy to manipulate it into the
form \(a + \frac{1}{q}\) where \(q\) is another quadratic surd. This fact
can be used to come up with an algorithm to find the continued
fraction expansion of a square
root. Wikipedia explains
it pretty well so I won't go over it, but here is my Haskell
implementation:

(Note that we're assuming that we won't be called with a perfect
square. Also, do you notice anything interesting about the periodic
portion of the continued fractions, particularly of \(\sqrt{103}\)?)

For those who are unfamiliar with Haskell, here's a quick list of key facts:

The first line takes a list of triplets and forms a list of all
third elements, which is what we're interested in. (The other two
elements of the triplet are auxiliary variables used by the
algorithm.)

iterate is a function which takes in another
function f, an initial variable x, and
returns the infinite list [ x, f(x), f(f(x)), f(f(f(x))),
... ].

Note that Haskell
uses lazy
evaluation and so this function does not take an infinite amount
of time to run; all its elements are evaluated (and memoized) only
when needed.

The rest of the function is a straightforward representation of
the meat of the algorithm described in the above Wikipedia entry.

It may not be clear what \(\sqrt{d}\) and its continued fraction
expansion has to do with solving Pell's equation. However, notice that
if \(x\) and \(y\) solve Pell's equation then manipulating Pell's equation
to get \(\sqrt{d}\) on one side reveals that \(\frac{x}{y}\) is a good
approximation of \(\sqrt{n}\). In fact, it is so good that you can prove
that \(\frac{x}{y}\) must come from truncating the continued
fraction expansion of \(\sqrt{d}\).

This leads us to the following: if you have an infinite continued
fraction \(\left \langle a_0; a_1, a_2, \ldots \right \rangle\) you can
truncate it into a finite continued fraction \(\left \langle a_0; a_1,
a_2, \ldots, a_i \right \rangle\) and simplify it into the rational
number \(\frac{p_i}{q_i}\). The sequence \(\frac{p_0}{q_0},
\frac{p_1}{q_1}, \frac{p_2}{q_2}, \ldots\) forms the
convergents
of \(\left \langle a_0; a_1, a_2, \ldots \right \rangle\) and converges to
its represented irrational number.

It turns out you can calculate \(p_{i+1}\) and \(q_{i+1}\)
efficiently from \(p_i\), \(q_i\), \(p_{i-1}\), \(q_{i-1}\), and \(a_{i+1}\)
using
the fundamental
recurrence formulas (which can be proved by induction). Here
is my Haskell implementation:

Here are a few more quick facts to help those unfamiliar with
Haskell:

The expression a : as forms a new list from the
element a and the existing list as
(equivalent to cons in Lisp).

zipWith3 is a function that takes in a
function f, three lists a, b,
and c of the same (possibly infinite)
length n, and forms the new list
[ f(a[0], b[0], c[0]), f(a[1], b[1], c[1]), ..., f(a[n], b[n],
c[n]) ].

Note that the result of zipWith3 is part of the
variable pqs which itself appears (twice!) in the
arguments to zipWith3. This is a Haskell idiom and
reflects the fact that the recurrence formulas define a convergent
in terms of its two previous convergents. A simpler example (using
the Fibonacci sequence) can be found in the
Wikipedia
entry for lazy evaluation.

Haskell has built-in data types for integers of arbitrary size
which is necessary as the numerators and denominators of the
convergents get large quickly. In fact, Haskell has built-in
data types for rational numbers (represented as fractions) but it
doesn't help us much here.

Since we are guaranteed that some convergent eventually satisfies
Pell's equation, we can write a simple function to generate all
convergents, test each one to see if it satisfies Pell's equation,
and return the first one we see. Here is the Haskell implementation:

Note the use of the
Haskell's list
comprehension syntax, similar to Python, which expresses what I
just described in a matter reminiscent of set notation.
Here is the full Haskell program designed so its output may be
conveniently piped
to bc
for verification:

The solution to the original problem is therefore:5054112910466227478111803017176109047976100000000.

Now that we've found a method to get a solution, the
question remains as to whether it's the only one. In fact it is not,
but it is the minimal one, and all other solutions (of which there are
an infinite number) can be generated from this fundamental one with a
simple recurrence relation as described on
the Wikipedia
article. My program above can be easily extended to generate all
solutions instead of just the fundamental one (I'll leave it to the
reader as an exercise).

One remaining question is the efficiency of this algorithm. For
simplicity, let's neglect the cost of the arbitrary-precision
arithmetic involved and assume that the incremental cost of generating
each term of the continued fraction expansion and the convergents is
constant. Then the main cost is just how many convergents we have to
generate before we find one that satisfies Pell's equation. In fact,
it turns out that this depends on the length of the period of the
continued fraction expansion of \(\sqrt{d}\), which has a rough upper
bound of \(O(\ln(d \sqrt{d}))\). Therefore, the cost of solving Pell's
equation (in terms of how many convergents to generate) for a given
\(n\)-digit number is \(O(n 2^{n/2})\). This is pretty expensive already,
although it's still much better than brute-force search (which is on
the order of exponentiating the above expression). Can we do better?
Well, sort of; it turns out the length of the answer is of the same
order as the expression above, so any algorithm that explicitly
outputs a solution necessarily takes that long. However, if you can
somehow factor \(d\) into \(s d'\), where \(s\) is a perfect square and \(d'\)
is squarefree
(i.e., not divisible by any perfect square), then you can solve Pell's
equation for the smaller number \(d'\) and output the solution for \(d'\)
as the smaller fundamental solution and an expression raised to a
certain power involving it. Note that in general this involves
factoring \(d\), another hard problem, but for which there exists tons
of prior work. An interested reader can peruse the papers
by Lenstra
and Vardi
for more details.

As a final note, one of the things I really like about number
theory is that investigating such a simple program can lead you down
surprising avenues of mathematics and computational theory. In fact,
I've had to omit a lot of things I had planned to say to avoid growing
this entry to be longer than it already is. Hopefully, this entry
helps someone else learn more about this interesting corner of number
theory.

[1] As a rule we'll avoid considering trivial cases and
re-stating obvious assumptions (like \(d\) having to be a positive
integer). ↩