Computing the Integer Square Root

December 9, 2014

1. The algorithm

Today I’m going to talk about a fast algorithm to compute
the integer
square root of a non-negative integer \(n\),
\(\mathrm{isqrt}(n) = \lfloor \sqrt{n} \rfloor\), or in words,
the greatest integer whose square is less than or equal to \(n\).[1] Most
sources that describe the algorithm take it for granted that it is
correct and fast. This is far from obvious! So I will prove both
correctness and speed below.

One simple fact is that \(\mathrm{isqrt}(n) \le n/2\) for \(n \gt
1\), so a straightforward algorithm is just to test every non-negative
integer up to \(n/2\). This takes \(O(n)\) arithmetic operations, which is bad
since it’s exponential in the size of the input. That
is, letting \(\mathrm{Bits}(n)\) be the number of bits required
to store \(n\) and letting \(\lg n\) be the base-\(2\) logarithm of
\(n\), \(\mathrm{Bits}(n) = O(\lg n)\), and thus this algorithm
takes \(O(2^{\mathrm{Bits}(n)})\) arithmetic operations.
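
A sketch of that brute-force scan (in JavaScript with BigInt so it works at arbitrary precision; it stops as soon as the next square exceeds \(n\)):

```javascript
// Try candidates r = 0, 1, 2, ... up to n/2, keeping the largest r
// with r^2 <= n. Exponentially slow in Bits(n).
function isqrtLinear(n) {
  if (n < 2n) return n; // isqrt(n) <= n/2 only holds for n > 1
  let r = 0n;
  while (r + 1n <= n / 2n && (r + 1n) * (r + 1n) <= n) r += 1n;
  return r;
}
```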

We can do better by doing binary search; start with the range \([0,
n/2]\) and adjust it based on comparing the square of an integer in
the middle of the range to \(n\). This takes \(O(\lg n) =
O(\mathrm{Bits}(n))\) arithmetic operations.
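
That binary search might look like the following sketch (`mid` is rounded up so that `lo` always advances, avoiding an infinite loop):

```javascript
// Binary search for isqrt(n) over the range [0, n/2], comparing the
// square of the midpoint to n. O(lg n) iterations.
function isqrtBinary(n) {
  if (n < 2n) return n; // the range [0, n/2] only works for n > 1
  let lo = 0n;
  let hi = n / 2n;
  while (lo < hi) {
    const mid = (lo + hi + 1n) >> 1n; // round up so lo always advances
    if (mid * mid <= n) lo = mid;
    else hi = mid - 1n;
  }
  return lo;
}
```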

We can do even better with an iterative scheme based on Newton’s
method: start with an initial guess \(x_0 = 2^{\lceil
\mathrm{Bits}(n)/2 \rceil}\), which is an overestimate of
\(\sqrt{n}\), repeatedly apply the iteration rule given in the next
section, and return \(x_i\) as soon as \(x_{i+1} \ge x_i\). Call this
algorithm \(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\). It’s not
obvious, but this algorithm returns \(\mathrm{isqrt}(n)\) using only
\(O(\lg \lg n) = O(\lg(\mathrm{Bits}(n)))\) arithmetic operations, as
we will prove below. But first, here’s an implementation of the
algorithm in JavaScript:[3]
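
(A minimal sketch using BigInt; the initial guess \(2^{\lceil \mathrm{Bits}(n)/2 \rceil}\), an overestimate of \(\sqrt{n}\), is one valid choice, per the discussion of initial guesses below.)

```javascript
// NEWTON-ISQRT: returns floor(sqrt(n)) for a non-negative BigInt n.
function isqrt(n) {
  if (n < 0n) throw new RangeError("isqrt of a negative number");
  if (n < 2n) return n; // step 1: 0 and 1 are their own integer square roots
  // Initial guess x_0 = 2^ceil(Bits(n)/2), which is at least sqrt(n).
  const bits = n.toString(2).length;
  let x = 1n << BigInt(Math.ceil(bits / 2));
  // Iterate x_{i+1} = floor((x_i + floor(n/x_i)) / 2); BigInt division
  // floors, so (x + n / x) >> 1n computes exactly that.
  while (true) {
    const next = (x + n / x) >> 1n;
    if (next >= x) return x; // step 3.2: stop once the iterates stop decreasing
    x = next;
  }
}
```

For example, `isqrt(99n)` walks through the iterates \(16, 11, 10, 9\) and returns `9n`.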

2. Correctness

The core of the algorithm is the iteration rule:
\[
x_{i+1} = \left\lfloor \frac{x_i + \lfloor \frac{n}{x_i}
\rfloor}{2} \right\rfloor
\]
where
the floor functions are there only because we’re using integer
division. Let \(f(x)\) be the integer-valued function given by the
right-hand side. Using basic properties of the floor function, you can
show that the inner floor can be removed:
\[
f(x) = \left\lfloor \frac{1}{2} (x + n/x) \right\rfloor
\]
which makes it a bit easier to analyze. Also, the properties of
\(f(x)\) are closely related to its equivalent real-valued function:
\[
g(x) = \frac{1}{2} (x + n/x)\text{.}
\]

For starters, again using basic properties of the floor function,
you can show that \(f(x) \le g(x)\), and for any integer \(m\), \(m
\le f(x)\) if and only if \(m \le g(x)\).
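
These floor properties are easy to spot-check numerically. In the sketch below, BigInt division floors automatically, so \((x + n/x)/2\) in BigInt arithmetic is exactly \(f(x)\), and \(\lfloor (x^2 + n)/(2x) \rfloor\) equals the single-floor form \(\left\lfloor \frac{1}{2}(x + n/x) \right\rfloor\) exactly:

```javascript
// f(x) as written, with both floors: BigInt division already floors.
const f = (n, x) => (x + n / x) / 2n;
// Single-floor form: floor((x + n/x) / 2) = floor((x*x + n) / (2x)),
// computed exactly with one BigInt division.
const fSingleFloor = (n, x) => (x * x + n) / (2n * x);

// Spot-check that removing the inner floor changes nothing.
for (let n = 1n; n <= 200n; n++) {
  for (let x = 1n; x <= 50n; x++) {
    if (f(n, x) !== fSingleFloor(n, x)) throw new Error("identity fails");
  }
}
```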

Intuitively, \(f(x)\) and \(g(x)\) “average out”
however far away their input \(x\) is from \(\sqrt{n}\). Letting \(s =
\mathrm{isqrt}(n)\), this “average” is never an underestimate:

(Lemma 1.) For
\(x \gt 0\), \(f(x) \ge s\).

Proof. By the basic properties of
\(f(x)\) and \(g(x)\) above, it suffices to show that \(g(x) \ge
s\). \(g'(x) = (1 - n/x^2)/2\) and \(g''(x) = n/x^3\). Therefore,
\(g(x)\) is concave-up for \(x \gt 0\); in particular, its single
positive extremum at \(x = \sqrt{n}\) is a minimum. But \(g(\sqrt{n})
= \sqrt{n} \ge s\). ∎

(You can also prove Lemma 1 without calculus; show that \(g(x) \ge
s\) if and only if \(x^2 - 2sx + n \ge 0\), which is true when \(s^2
\le n\), which is true by definition.)
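
Completing the square makes that last step explicit: for \(x \gt 0\),
\[
g(x) \ge s
\iff x + n/x \ge 2s
\iff x^2 - 2sx + n \ge 0
\iff (x - s)^2 + (n - s^2) \ge 0\text{,}
\]
and the last inequality holds because \(s^2 \le n\) by the definition of \(s\).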

(Note that any number at least \(s\), say \(n\) or \(\lceil n/2
\rceil\), can be chosen for our initial guess without affecting
correctness. However, the expression above is necessary to guarantee
performance. Another possibility is \(2^{\lceil \lceil \lg n \rceil /
2 \rceil}\), which has the advantage that if \(n\) is an even power of
\(2\), then \(x_0\) is immediately set to \(\sqrt{n}\). However, this
is usually not worth the cost of checking that \(n\) is a power of
\(2\), as is required to compute \(\lceil \lg n \rceil\).)

An easy consequence of Lemma 1, combined with the fact that the
initial guess satisfies \(x_0 \ge s\), is that the invariant \(x_i
\ge s\) holds. That lets us prove partial correctness of
\(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\):

(Theorem 1.) If \(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\)
terminates, then it returns \(s\).

Proof. Assume it terminates. If it
terminates in step \(1\), then we are done. Otherwise, it can only
terminate in step \(3.2\) where it returns \(x_i\) such that \(x_{i+1}
= f(x_i) \ge x_i\). This implies that \(g(x_i) = (x_i + n/x_i) / 2 \ge
x_i\). Rearranging yields \(n \ge x_i^2\) and combining with our
invariant we get \(\sqrt{n} \ge x_i \ge s\). But \(s + 1 \gt
\sqrt{n}\), so that forces \(x_i\) to be \(s\), and thus
\(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\) returns \(s\) if it
terminates. ∎

For total correctness we also need to show that
\(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\) terminates. But this
is easy:

(Theorem 2.) \(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\)
terminates.

Proof. Assume it doesn’t
terminate. Then we have a strictly decreasing infinite sequence of
integers \(\{ x_0, x_1, \ldots \}\). But this sequence is bounded below
by \(s\), so it cannot decrease indefinitely. This is a contradiction,
so \(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\) must
terminate. ∎

We are done proving correctness, but you might wonder if the check
\(x_{i+1} \ge x_i\) in step \(3.2\) is necessary. That is, can it be
weakened to the check \(x_{i+1} = x_i\)? The answer is
“no”; to see that, let \(k = n - s^2\). Since \(n \lt
(s+1)^2\), \(k \lt 2s + 1\). On the other hand, consider the
inequality \(f(x_i) \gt x_i\). That inequality would cause the
algorithm to terminate and return \(x_i\), which by partial
correctness implies that \(x_i =
s\). Therefore, that inequality is equivalent to \(f(s) \gt s\), which
is equivalent to \(f(s) \ge s + 1\), which is equivalent to \(g(s) =
(s + n/s) / 2 \ge s + 1\). Rearranging yields \(n \ge s^2 +
2s\). Substituting in \(n = s^2 + k\), we get \(s^2 + k \ge s^2 +
2s\), which is equivalent to \(k \ge 2s\). But since \(k \lt 2s + 1\),
that forces \(k\) to equal \(2s\), i.e., \(n = s^2 + 2s = (s + 1)^2 -
1\). In other words, \(n\) must be one less than a perfect square. Indeed,
for such numbers, weakening the check would cause the algorithm to
oscillate between \(s\) and \(s + 1\). For example, \(n = 99\) would
yield the sequence \(\{ 16, 11, 10, 9, 10, 9, \ldots \}\).
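
A quick way to see the oscillation (a sketch; BigInt division floors, so `(x + n / x) >> 1n` computes the iteration rule):

```javascript
// Iterate the rule x_{i+1} = floor((x_i + floor(n/x_i)) / 2) a fixed
// number of times, without any termination check.
function iterates(n, x, steps) {
  const seq = [x];
  for (let i = 0; i < steps; i++) {
    x = (x + n / x) >> 1n; // BigInt division floors
    seq.push(x);
  }
  return seq;
}
```

For \(n = 99\) and \(x_0 = 16\), `iterates(99n, 16n, 6)` yields \(16, 11, 10, 9, 10, 9, 10\): consecutive iterates are never equal, so the weakened equality check never fires.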

3. Run-time

We will show that \(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\)
takes \(O(\lg \lg n)\) arithmetic operations; call this claim Theorem
3. Since each loop
iteration does only a fixed number of arithmetic operations (with the
division of \(n\) by \(x\) being the most expensive), it suffices to
show that our algorithm performs \(O(\lg \lg n)\) loop iterations.

It is well known that Newton’s
method converges
quadratically sufficiently close to a simple root. We can’t
actually use this result directly, since it’s not clear that the
convergence properties of Newton’s method are preserved when
using integer operations, but we can do something similar.

Define \(\mathrm{Err}(x) = x^2/n - 1\) and let \(\epsilon_i =
\mathrm{Err}(x_i)\). Intuitively, \(\mathrm{Err}(x)\) is a
conveniently-scaled measure of the error of \(x\): it is less than
\(1\) for most of the values we care about, and it is bounded below for
integers greater than our target \(s\). Also, we will show that the
\(\epsilon_i\) shrink quadratically. These facts will then let us show
our bound for the iteration count.

First, let’s prove our lower bound for \(\epsilon_i\):

(Lemma 3.) \(x_i
\ge s + 1\) if and only if \(\epsilon_i \ge 1/n\).

Proof. \(n \lt (s + 1)^2\), so \(n + 1
\le (s + 1)^2\), and therefore \((s + 1)^2/n - 1 \ge 1/n\). But the
expression on the left side is just \(\mathrm{Err}(s +
1)\). \(x_i \ge s + 1\) if and only if \(\epsilon_i \ge
\mathrm{Err}(s + 1)\), so the result immediately
follows. ∎

Then we can use that to show that the \(\epsilon_i\) shrink
quadratically:

(Lemma 4.) If \(x_i \ge s + 1\), then \(\epsilon_{i+1} \lt
(\epsilon_i/2)^2\).

Proof. \(\epsilon_{i+1}\) is just
\(\mathrm{Err}(f(x_i)) \le \mathrm{Err}(g(x_i))\), so it
suffices to show that \(\mathrm{Err}(g(x_i)) \lt
(\epsilon_i/2)^2\). Inverting \(\mathrm{Err}(x)\), we get that
\(x_i = \sqrt{(\epsilon_i + 1) \cdot n}\). Expressing \(g(x_i)\) in
terms of \(\epsilon_i\) we get
\[ g(x_i) = \frac{\sqrt{n}}{2} \left( \frac{\epsilon_i +
2}{\sqrt{\epsilon_i + 1}} \right) \]
and
\[
\mathrm{Err}(g(x_i)) = \frac{(\epsilon_i/2)^2}{\epsilon_i+1}\text{.}
\]
Therefore, it suffices to show that the denominator is greater than
\(1\). But \(x_i \ge s + 1\) implies \(\epsilon_i \gt 0\) by Lemma 3,
so that follows immediately and the result is proved. ∎
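
As an illustrative numeric check (the values \(n = 99999\), with \(s = 316\) and \(x_0 = 512 = 2^{\lceil \mathrm{Bits}(n)/2 \rceil}\), are an arbitrary example; plain numbers suffice since they are small):

```javascript
// Run the iteration for n = 99999 (so s = 316) from x_0 = 512 and
// record each iterate.
const n = 99999;
const xs = [512];
while (true) {
  const x = xs[xs.length - 1];
  const next = Math.floor((x + Math.floor(n / x)) / 2);
  if (next >= x) break;
  xs.push(next);
}
// xs is now [512, 353, 318, 316].
const eps = xs.map((x) => (x * x) / n - 1);
// Check eps_{i+1} < (eps_i / 2)^2 along the trace.
for (let i = 0; i + 1 < eps.length; i++) {
  if (!(eps[i + 1] < (eps[i] / 2) ** 2)) throw new Error("bound violated");
}
```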

Note that in general, an arithmetic operation is not constant-time,
and in fact has run-time \(\Omega(\lg n)\). Since the most expensive
arithmetic operation we do is division, we can say that
\(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\) has run-time that is
both \(\Omega(\lg n)\) and \(O(D(n) \cdot \lg \lg n)\), where \(D(n)\)
is the run-time of dividing \(n\) by some number \(\le n\).[5]

4. The Initial Guess

It’s also useful to show that if the initial guess \(x_0\) is
bad, then the run-time degrades to \(\Theta(\lg n)\). We’ll do
this by defining the function \(\mathrm{N{\small EWTON}\text{-}I{\small
SQRT}'}\) to be like \(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\)
except that it takes a function \(\mathrm{I{\small
NITIAL}\text{-}G{\small UESS}}\) that is called with \(n\) and assigned to
\(x_0\) in step 1. Then, we can treat \(\epsilon_0\) as a function of
\(n\) and analyze how long \(\epsilon_i\) stays above \(1\) to show
that \(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}'}\) takes
\(\Theta(\lg \epsilon_0(n)) + O(\lg \lg n)\) arithmetic
operations.

(Theorem 4.) \(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}'}\) takes
\(\Theta(\lg \epsilon_0(n)) + O(\lg \lg n)\) arithmetic operations.

Proof. First, if \(\epsilon_i \gt 1\),
then \(\epsilon_i \ge 1/n\) and so \(x_i \ge s + 1\), which implies
that the main loop doesn’t terminate with \(x_i\), and thus
\(x_{i+1}\) and \(\epsilon_{i+1}\) exist.

Also, as long as \(\epsilon_i \gt 1\), we have \(\epsilon_{i+1} \le
(\epsilon_i/2)^2/(\epsilon_i + 1) \lt \epsilon_i/4\), so
\(\epsilon_i\) shrinks by a constant factor per
iteration. Therefore, letting \(j\) be the first index such that
\(\epsilon_j \le 1\), \(j = \Theta(\lg \epsilon_0(n))\). Then Theorem
3 can be adapted to show that only \(O(\lg \lg n)\) more iterations of
the loop are taken. ∎

Since \(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\) uses an
initial guess such that \(\epsilon_0(n) = \Theta(1)\), then Theorem 4
reduces to Theorem 3 in that case. However, if \(x_0\) is chosen to be
\(\Theta(n)\), e.g. the initial guess is just \(n\) or \(n/k\) for
some \(k\), then \(\epsilon_0(n)\) will also be \(\Theta(n)\), and so
the run-time will degrade to \(\Theta(\lg n)\). So having a good
initial guess is important for the performance of
\(\mathrm{N{\small EWTON}\text{-}I{\small SQRT}}\)!
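
To see the degradation concretely, one can compare iteration counts for a good and a bad initial guess; a sketch (the value \(n = 10^{12}\) is just an illustrative choice):

```javascript
// Count loop iterations until the termination check x_{i+1} >= x_i fires.
function countIterations(n, x) {
  let count = 0;
  while (true) {
    const next = (x + n / x) >> 1n; // BigInt division floors
    count++;
    if (next >= x) return count;
    x = next;
  }
}

const n = 10n ** 12n; // s = 10^6
const good = countIterations(n, 1n << 20n); // x_0 = 2^ceil(Bits(n)/2)
const bad = countIterations(n, n); // x_0 = n: roughly halves x each step
// good is 3 here, while bad is on the order of lg n iterations (~20+).
```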