To Buzzard: I wanted to check if there is any smarter way to do it than Euclid. Is there any reason to think that nothing better is possible?
–
Idoneal, Oct 4 '10 at 9:54


@Idoneal: The fast Euclidean algorithm takes time that is quasi-linear in the input size $n=\log q$ (whereas the standard Euclidean algorithm has complexity that is quadratic in $n$). Since linear time is required just to read the input, up to polylogarithmic factors, the fast Euclidean algorithm is optimal.
–
AVS, Oct 4 '10 at 10:13

What do you mean by "fastest" algorithm? Are you interested in the asymptotically fastest algorithm (i.e., for the bit length of $q$ going to infinity)? Or are you interested in concretely implementing an inversion for numbers that are not too large (say, a few thousand bits)?
–
j.p., Oct 4 '10 at 11:16

4 Answers

The fast Euclidean algorithm runs in time $O(M(n)\log n)$, where $n=\log q,$ and $M(n)$ is the time to multiply two $n$-bit integers. This yields a bit-complexity of
$$
O(n\log^2 n\log\log n)
$$
for the time to compute the inverse of an integer modulo $q$, using the standard Schönhage-Strassen algorithm for multiplying integers (a slightly better asymptotic result can be obtained using Fürer's multiplication algorithm).

To understand the basic idea behind the fast Euclidean algorithm, recall that the standard Euclidean algorithm computes the (extended) gcd of two integers $r_0 > r_1$ by successively computing $m_i=\lfloor r_{i-1}/r_i\rfloor$ and setting
$$
r_{i+1} = r_{i-1} - m_ir_i,
$$
until it obtains $r_{k+1} = 0$, at which point $r_k = \gcd (r_0, r_1)$. This can be expressed in matrix form as
$$
R_1 = \begin{bmatrix}
r_0\newline
r_1
\end{bmatrix};\qquad
R_{i+1} = \begin{bmatrix}
r_i\newline
r_{i+1}\end{bmatrix} = M_iR_i;\qquad
M_i=\begin{bmatrix}
0&1\newline
1&-m_i
\end{bmatrix},
$$
and if we compute the matrix $S_k=M_kM_{k-1}\ldots M_1$, we have $R_{k+1}=S_kR_1$ which expresses the entry $r_k=\gcd(r_0,r_1)$ as a linear combination of $r_0$ and $r_1$. Assuming this gcd is 1, we can then read off the inverse of $r_1$ modulo $r_0$ (and vice versa) from the top row of $S_k$.
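As a concrete illustration, here is a Python sketch of the matrix form above (this is the plain quadratic-time version, not the fast variant; the function name is mine):

```python
def mod_inverse(r1, r0):
    """Return r1^{-1} mod r0 via the matrix form of the extended
    Euclidean algorithm (plain quadratic version, for illustration)."""
    # S accumulates the product M_i * ... * M_1, starting from the identity.
    s00, s01, s10, s11 = 1, 0, 0, 1
    a, b = r0, r1                  # invariant: (a, b)^T = S * (r0, r1)^T
    while b != 0:
        m = a // b
        a, b = b, a - m * b
        # Multiply S on the left by M_i = [[0, 1], [1, -m_i]].
        s00, s01, s10, s11 = s10, s11, s00 - m * s10, s01 - m * s11
    if a != 1:
        raise ValueError("r1 is not invertible modulo r0")
    # The top row of S expresses gcd = s00*r0 + s01*r1 = 1,
    # so s01 is the inverse of r1 modulo r0.
    return s01 % r0
```

For example, `mod_inverse(3, 11)` returns 4, since $3\cdot 4 = 12 \equiv 1 \pmod{11}$.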

As described above, this involves $O(k)$ arithmetic operations on integers of size $O(n)$, and one can show that $k=O(n)$, leading to a running time that is roughly quadratic in $n$. The fast Euclidean algorithm achieves a quasi-linear running time by only computing $O(\log n)$ of the matrices $S_i$. Roughly speaking (and ignoring many important details), this is done by directly computing $S_{k/2}$ using what is known as a "half-gcd" algorithm, computing $R_{k/2+1}=S_{k/2}R_1$, and then proceeding recursively. The half-gcd algorithm, in turn, works by recursively calling itself. The depth of the recursion is $O(\log n)$ and this yields an $O(M(n)\log n)$ complexity bound.

This algorithm also works over polynomial rings and is often described in this setting. Further details can be found in the (incomplete) list of references below:

Well, I tried reading the reference in Google Books, but it seems somewhat complicated, and moreover it is for polynomials. Is it possible to explain the main idea behind the fast Euclidean algorithm in a few words, or with a simple example?
–
Idoneal, Oct 4 '10 at 10:56

Thanks. The way I understand it, the main idea (due to Lehmer) is to carry out the usual Euclidean algorithm for computing the GCD with a small modification based on the following observation: when computing the partial quotients, we can forget the tails and look only at the leading digits. So to compute the GCD of 1234 and 102, we first need to find $q$ and $r$ such that $1234 = 102q + r$ with $0 \le r < 102$. Now, to find $q$, we might as well divide $10^3$ by $10^2$, which takes less time.
–
Idoneal, Oct 4 '10 at 14:44

@Idoneal: Actually, while Lehmer's idea does improve the constant and logarithmic factors in the standard Euclidean algorithm, it still runs in quadratic time. I believe the first subquadratic algorithms are due to Knuth (1970) and Schönhage (1971), who introduced the recursive divide-and-conquer approach that led to the half-gcd algorithm noted above. The Möller reference (to which I have added a link) gives a good summary of the historical development and the current state of the art.
–
AVS, Oct 4 '10 at 15:21

I see. I haven't understood the half-GCD thing yet. It seems some 2-adic computation is going on.
–
Idoneal, Oct 5 '10 at 6:18

Other than the Euclidean algorithm described in the other answers, you can get time $O(M(n))$ if the modulus is of the form $p^k$, by working successively mod $p$, $p^2$, etc. See section 2.5 of the link for details.
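A Python sketch of this lifting approach (my own illustration, assuming $\gcd(a,p)=1$; the Newton/Hensel iteration $x \mapsto x(2 - ax)$ doubles the number of correct digits each round):

```python
def invert_mod_prime_power(a, p, k):
    """Invert a modulo p**k by Hensel lifting: start from the inverse
    mod p and apply the Newton iteration x <- x*(2 - a*x), which
    doubles the precision at each step.  Assumes gcd(a, p) == 1."""
    x = pow(a % p, -1, p)      # inverse mod p (Python 3.8+ modular pow)
    mod = p
    target = p ** k
    while mod < target:
        mod = min(mod * mod, target)
        x = x * (2 - a * x) % mod
    return x
```

Only $O(\log k)$ iterations are needed, each dominated by multiplication at the current precision, which is where the $O(M(n))$ bound comes from.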

If you need to invert several numbers, say $x$ and $y$, it is often faster to invert $xy$ and then calculate $x^{-1} = (xy)^{-1}y$, $y^{-1} = (xy)^{-1}x$.
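This generalizes to any number of elements (often called Montgomery's batch-inversion trick): multiply everything together, invert the single product, and unwind. A Python sketch, with names of my own choosing:

```python
def batch_invert(xs, q):
    """Invert every element of xs modulo q using a single modular
    inversion plus O(len(xs)) multiplications (Montgomery's trick).
    Assumes every element is invertible mod q."""
    n = len(xs)
    prefix = [1] * (n + 1)               # prefix[i] = xs[0]*...*xs[i-1] mod q
    for i, x in enumerate(xs):
        prefix[i + 1] = prefix[i] * x % q
    inv = pow(prefix[n], -1, q)          # the single modular inversion
    out = [0] * n
    for i in range(n - 1, -1, -1):
        out[i] = prefix[i] * inv % q     # inverse of xs[i]
        inv = inv * xs[i] % q            # drop xs[i] from the running inverse
    return out
```

For example, `batch_invert([3, 5, 7], 11)` returns `[4, 9, 8]`.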

Instead of going all the way to the GCD with the Euclidean algorithm and working backwards to find a multiplicative inverse, you can go straight to the multiplicative inverse with the Euclidean algorithm.

If $p$ is a prime and $a$ is not divisible by $p$, perform the Euclidean algorithm on $p^2$ and $ap+1$. The first remainder less than $p$ that appears is a multiplicative inverse of $a$ mod $p$.

I don't know how fast this algorithm is compared to other methods. The number of steps to reach the inverse is the same as the number of steps to reach $\gcd(a,p)=1$ using the Euclidean algorithm on $a$ and $p$. But the algorithm proposed above also requires comparing the remainder with $p$ at each step.
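The procedure above can be sketched in Python as follows; the loop stops at the first remainder below $p$, which is the comparison mentioned in the last paragraph:

```python
def inverse_via_p_squared(a, p):
    """Return a^{-1} mod p (for p prime, p not dividing a) by running
    the Euclidean algorithm on p**2 and a*p + 1 and stopping at the
    first remainder smaller than p."""
    x, y = p * p, a * p + 1
    while y >= p:                 # the extra comparison with p
        x, y = y, x % y           # one Euclidean division step
    return y
```

For example, with $p=7$ and $a=3$ the very first division gives $49 = 2\cdot 22 + 5$, and indeed $3\cdot 5 = 15 \equiv 1 \pmod 7$.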