Introduction

Modular multiplicative inversion is useful in elliptic curve cryptography (ECC)
for a variety of reasons. Given 0 <= x < p where *p*
is prime, one common way of calculating the modular multiplicative inverse
x−1 (mod p) makes use of Fermat's Little
Theorem, in particular the equality
x−1 = xp − 2 (mod p).
As [Donald Knuth] explained in Section 4.6.3 of
[The Art of Computer Programming, Volume 2: Seminumerical Algorithms (3rd Edition)],
given an addition chain for an exponent *e*, we can calculate
xe (mod p) by using squaring
a2 (mod p) as the doubling operation and
multiplication a × b (mod p) as the addition
operation. The correctness & efficiency of such a computation depends (only) on
the correctness & efficiency of the implementations of modular squaring &
modular multiplication and the correctness & efficiency of the addition chain
used for the exponentiation.
Assuming we've already verified & optimized the modular multiplication &
squaring implementations, we'll focus here on finding the most efficient
addition chains useful for inversion in commonly-used elliptic curves. The
curves we are interested in here are Curve25519,
P-256 (secp256r1),
P-384 (secp384r1), and secp256k1 (the curve used in
Bitcoin). The notable characteristic of this use of addition chains is that
the exponent is a well-known constant, so we can construct the addition chain
offline using as much effort as we are willing to expend. In contrast, most
other practical uses of addition chains require that the addition chain be
constructed online for a previously-unknown value and so the cost of
constructing the chain has to be added to the cost of evaluating the chain;
that complication makes that problem quite different from what's discussed
below.
We have no proofs that the addition chains presented here are optimal. In
fact it is almost certainly the case that at least a couple of the ones
presented here are not optimal. Your help is requested in finding
better ones. Please email me at
brian@briansmith.org with (links to)
better addition chains and/or links to tools and algorithms for finding better
addition chains for these very large values. I will publish updates here as
better chains are found. Similarly, any help in finding a better accounting
of the history of the search for optimal addition chains for these curves
would be greatly appreciated. Let me know if there are other curves that are
ubiquitous that should have inversion addition chains documented here.
All of the addition chains documented here are written in the notation
developed in [Addition Chains as Polymorphic Higher-order Functions].
In particular, this notation is executable Haskell code that forms the
module `ECCInversionAdditionChains` in the file
[ECCInversionAdditionChains.lhs](ECCInversionAdditionChains.lhs). With this,
alongside the module `AdditionChainComputation` in
[AdditionChainComputation.lhs](AdditionChainComputation.lhs), we can
automatically verify the correctness of these addition chains and measure
the efficiency of each. This code only uses library functions from the base
package and it does not enable any language extensions. It was tested in
GHCi 8.0.1.
We use some naming conventions to make the code easier to read. A variable
named with a leading underscore has the value represented by interpreting
the part of the name after the underscore as a binary number. For example,
`_101` represents the number 5 since 1012 = 5. We will deal with
numbers that consist of a long series of binary 1 digits, and we'll name
these xn where n is the number of ones.
For example, `x3` represents 1112 which is 7 in decimal notation;
as another example, `x16` represents 11111111111111112 which is
65535 (FFFF16). With that in mind, let's consider an example step
``_110111 = (nth double 4 `andThen` add _111) _11``. This says that 55
(`_110111`) is constructed from 3 (`_11`) by doubling it four times
(`nth double 4`) and then (`andThen`) adding 7 (`_111`); that is,
55 = 3×2×2×2×2 + 7.
> module ECCInversionAdditionChains(
> curve25519FieldInverseExponent, curve25519FieldInverseExponentValue,
>
> p256FieldInverseSquaredExponent, p256FieldInverseSquaredExponentValue,
> p384FieldInverseSquaredExponent, p384FieldInverseSquaredExponentValue,
> inverseCubedFromInverseSquared, inverseFromInverseSquared,
>
> secp256k1ScalarInverseExponent, secp256k1ScalarInverseExponentValue,
> p256ScalarInverseExponent, p256ScalarInverseExponentValue,
> p384ScalarInverseExponent, p384ScalarInverseExponentValue,
> curve25519ScalarInverseExponent, curve25519ScalarInverseExponentValue,
>
> invalid,
> ) where
> import AdditionChainComputation
> import Numeric.Natural

Curve25519 Field Element Inversion

Scalar multiplication on the Curve25519 curve require inverting a Z coordinate
when converting a non-affine point representation to an affine point
representation. This involves calculating
z−1 = zq − 2 (mod q),
where q is Curve25519's field characteristic,
2255 − 19. [Daniel. J. Bernstein] described
an addition chain for that in
[Curve25519: new Diffie-Hellman speed records] as “a straightforward
sequence of 254 squarings and 11 multiplications.” The addition chain shown
here is the one described, reverse-engineered from the function
`curve25519_athlon_recip` in the code bundle
[curve25519-20050915.tar.gz](https://cr.yp.to/ecdh/curve25519-20050915.tar.gz)
he published alongside the paper. Both the code and the paper are available
from [the Curve25519 page on his website](https://cr.yp.to/ecdh.html).

P-256 (secp256r1) Field Element Inversion

P-256 implementations usually use Jacobian coordinates
and then convert the Jacobian coordinates to affine coordinates at the end of
the computation. This conversion requires calculating
z−2 (mod q) for the X coordinate. When
the affine value of the Y coordinate is needed (which is usually, but not
always, the case) then z−3 (mod q) is also
needed.
Shay Gueron and Vlad Krasnov published an efficient addition chain in the
OpenSSL patch that accompanied the paper [Fast Prime Field Elliptic Curve
Cryptography with 256 Bit Primes] that is quite similar in overall
structure to the one for Curve25519 above. They noted that when the Y
coordinate isn't needed, we can compute
z−2 = zq − 3 (mod q) in one
fewer multiplication than is required in their addition chain for
z−1 (mod q). The last step of the
addition chain for z−1 (mod q) is a
multiplication by z, which means that before that multiplication, we must
have had z−2 (mod q) since
z−2 × z = z−1 (mod q).
> inverseFromInverseSquared add one inverseSquared =
> add one inverseSquared
They didn't actually implement this optimization in their OpenSSL patch.
Actually the optimization of directly computing
z−2 (mod q) using the addition chain for
q − 3 is useful even when
z−3 (mod q) is also needed, as we can
easily compute
z−4 = (z−2)2 (mod q)
and then compute
z−3 = z−4 × z (mod q):
> inverseCubedFromInverseSquared double add one inverseSquared =
> (double `andThen` add one) inverseSquared
Also, I found an addition chain for the earlier part of
p − 3 that further reduces the number of adds by one.
The resulting addition chains save one double (squaring) and two
adds (multiplications) when the Y coordinate is not needed, and it saves two
adds (multiplications) when the Y coordinate is needed, relative to the one
implemented by Gueron & Krasnov for OpenSSL.

P-384 (secp384r1) Field Element Inversion

I could not find any addition chains for field element inversion for the
P-384 curve, so I made my own using the same ideas as
the one used for P-256. This addition chain calculates
the square of the inverse; the inverse and the cube of the inverse can be
recovered using the same logic presented for P-256
above.

secp256k1 (Bitcoin's Curve) Field Element Inversion

Peter Dettman [contributed an optimized addition chain to the libsecp256k1
project](https://github.com/bitcoin-core/secp256k1/commit/f8ccd9befdb22824ef9a845a90e3db57c1307c11)
back in January of 2014 for secp256k1 field element inversion (and modular
square root). His addition chain directly calculates the inverse
z−1 = zq − 2 (mod q). The
addition chain here is that one, modified only by omitting the final
multiplication by z (mod q) so that it calculates
z−2 = zq − 3 (mod q), following
the same logic as the addition chains for P-256 and
P-384 field element inversion presented above.

secp256k1 (Bitcoin's Curve) Scalar Inversion

ECDSA signing for the secp256k1 curve requires scalar inversion, i.e.
computing
k−1 = kn − 2 (mod n) where n is
the curve's group order. Pieter Wuille implemented an addition chain for
scalar inversion for this curve based (AFAICT) on the same idea as Peter
Dettman's addition chain for field element inversion above, using sliding
windows 1, 112, and 11112. Recently in 2017
Peter
Dettman improved it by also making use of the windows 1012 and
1112. When I stumbled across that change I suspected that 3-bit
windows were unlikely to be optimal. I then
improved
it by making use of all the previously-unused four-bit windows
10012, 10112, and 11012, resulting in the
addition chain here. That improvement relied on constructing all odd numbers
between 3 and 13 by constructing 1 + 1 = 2 and then 1 + 2 = 3, and then
repeatedly adding 2, which is a well-known technique that is often overlooked.
Because so little effort has been expended on optimizing this so far, I think
it would probably be relatively easy to further improve it.

P-256 (secp256r1) Scalar Inversion

ECDSA signing for the P-256 curve also requires scalar inversion, which
again is the
computation of
k−1 = kn − 2 (mod n) where n is
the curve's group order. [Vlad Krasnov published C code in April
2015](https://github.com/vkrasnov/openssl/tree/vlad/inv_ord)
that implements an addition chain that used the standard bit duplication
technique for the easy part at the beginning and then fixed windows (where
each window is one 4-bit hex digit) for the hard part at the end. In 2015 I
improved his addition chain by switching from fixed windows to sliding windows
and then further improved it by looking for longer windows that were a net
benefit to construct. In 2016 and again in 2017 I improved it further. Recently
after improving the addition chain for secp256k1's n − 2,
I realized that the technique of constructing small odd multiples of k by
repeatedly adding 2 is also useful, to a limited extent, for this curve's n − 2.

P-384 (secp384r1) Scalar Inversion

ECDSA signing for the P-384 curve also requires scalar inversion, i.e.
computing
k−1 = kn − 2 (mod n) where n is
the curve's group order. I could not find any previous attempts to construct an
optimal addition chain for this number, so I made one based on the same ideas as
the one for scalar inversion in P-256. Although I have improved it iteratively
over time, I have not spent a significant effort optimizing this. Consequently,
of all the addition chains described here, this would probably the easiest one
to improve and it is probably the one that can be improved the most.

Curve25519 Scalar Inversion

[Back-Maxwell Rangeproofs](https://blockstream.com/bitcoin17-final41.pdf) require
scalar inversion, and some people are interested in implementing them using
Curve25519. Curve25519 isn't a prime order curve, but it is sufficient to
restrict the domain and range of the inversion to the largest (prime order)
subgroup so that Fermat's Little Theorem can be used. All the implementations
of this I've seen up to this point have used a simple one-bit
square-and-multiply ladder that requires 251 squarings and 72 multiplications.
The addition chain here requires only 250 squarings and 34 multiplications.
Again, very little time has been spent optimizing this addition chain, and I
expect improvements should be easy to find.