Factoring large numbers is hard. Unfortunately for algorithm designers, it is getting easier. Even worse, it is getting easier faster than mathematicians expected. In 1976 Richard Guy wrote: "I shall be surprised if anyone regularly factors numbers of size 10^80 without special form during the present century." In 1977 Ron Rivest said that factoring a 125-digit number would take 40 quadrillion years. In 1994 a 129-digit number was factored. If there is any lesson in all this, it is that making predictions is foolish.

Table 1 shows factoring records over the past dozen years. The fastest factoring algorithm during the time was the quadratic sieve.

These numbers are pretty frightening. Today it is not uncommon to see 512-bit numbers used in operational systems. Factoring them, and thereby completely compromising their security, is well in the range of possibility: A weekend-long worm on the Internet could do it.

Computing power is generally measured in mips-years: a one- million-instruction-per-second computer running for one year, or about 3*10^13 instructions. By convention, a 1 mips machine is equivalent to the DEC VAX 11/780. Hence, a mips-year is a VAX 11/780 running for a year, or the equivalent.

The 1983 factorization of a 71-digit number required 0.1 mips- years; the 1994 factorization of a 129-digit number required 5000. This dramatic increase in computing power resulted largely from the introduction of distributed computing, using the idle time on a network of workstations. The 1983 factorization used 9.5 CPU hours on a single Cray X-MP; the 1994 factorization used the idle time on 1600 computers around the world for about 8 months. Modern factoring methods lend themselves to this kind of distributed implementation.

The picture gets even worse. A new factoring algorithm has taken over from the quadratic sieve: the general number field sieve. In 1989 mathematicians would have told you that the general number field sieve would never be practical. In 1992 they would have told you that it was practical, but only faster than the quadratic sieve for numbers greater than 130-150 digits or so. Today it is known to be faster than the quadratic sieve for numbers well below 116 digits. The general number field sieve can factor a 512-bit number over 10 times faster than the quadratic sieve. The algorithm would require less than a year to run on an 1800-node Intel Paragon. Table 2 gives the number of mips-years required to factor numbers of different sizes, given current implementations of the general number field sieve.

And the general number field sieve is still getting faster. Mathematicians keep coming up with new tricks, new optimizations, new techniques. There's no reason to think this trend won't continue. A related algorithm, the special number field sieve, can already factor numbers of a certain specialized form--numbers not generally used for cryptography--must faster than the general number field sieve can factor general numbers of the same size. It is not unreasonable to assume that the general number field sieve can be optimized to run this fast; it is possible that the NSA already knows how to do this. Table 3 gives the number of mips-years required for the special number field sieve to factor numbers of different lengths.

At a European Institute for System Security workshop in 1992, the participants agreed that a 1024-bit modulus should be sufficient for long-term secrets through 2002. However, they warned: "Although the participants of this workshop feel best qualified in their respective areas, this statement [with respect to lasting security] should be taken with caution." This is good advice.

The wise cryptographer is ultra-conservative when choosing public-key key lengths. To determine how long a key you need requires you to look at both the intended security and lifetime of the key, and the current state-of-the-art of factoring. Today you need a 1024-bit number to get the level of security you got from a 512-bit number in the early 1980s. If you want your keys to remain secure for 20 years, 1024 bits is likely too short.

Even if your particular secrets aren't worth the effort required to factor your modulus, you may be at risk. Imagine an automatic banking system that uses RSA for security. Mallory can stand up in court and say: "Did you read in the newspaper in 1994 that RSA-129 was broken, and that 512-bit numbers can be factored by any organization willing to spend a few million dollars and wait a few months? My bank uses 512-bit numbers for security, and by the way I didn't make these seven withdrawals." Even if Mallory is lying, the judge will probably put the onus on the bank to prove it.

Earlier I called making predictions foolish. Now I am about to make some. Table 4 gives my recommendations for public-key lengths, depending on how long you require the key to be secure. There are three key lengths for each year, one secure against an individual, one secure against a major corporation, and the third secure against a major government.

Here are some assumptions from the mathematicians who factored RSA-129:

We believe that we could acquire 100 thousand machines without superhuman or unethical efforts. That is, we would not set free an Internet worm or virus to find resources for us. Many organizations have several thousand machines each on the net. Making use of their facilities would require skillful diplomacy, but should not be impossible. Assuming the 5 mips average power, and one year elapsed time, it is not too unreasonable to embark on a project which would require half a million mips years.

The project to factor the 129-digit number harnesses an estimated 0.03% of the total computing power of the Internet, and they didn't even try very hard. It isn't unreasonable to assume that a well-publicized project can harness 0.1% of the world's computing power for a year.

Assume a dedicated cryptanalyst can get his hands on 10,000 mips- years, a large corporation can get 10^7 mips-years, and that a large government can get 10^9 mips-years. Also assume that computing power will increase by a factor of ten every five years. And finally, assume that advances in factoring mathematics allows us to factor general numbers at the speeds of the special number field sieve. Table 4 recommends different key lengths for security during different years.

Remember to take the value of the key into account. Public keys are often used to secure things of great value for a long time: the bank's master key for a digital cash system, the key the government uses to certify its passports, a notary public's digital signature key. It probably isn't worth the effort to spend months of computing time to break an individual's private key, but if you can print your own money with a broken key the idea becomes more attractive. A 1024-bit key is long enough to sign something that will be verified within the week, or month, or even a few years. But you don't want to stand up in court twenty years from now with a digitally signed document, and have the opposition demonstrate how to forge documents with the same signature.

Making predictions beyond the near future is even more foolish. Who knows what kind of advances in computing, networking, and mathematics are going to happen by 2020? However, if you look at the broad picture, in every decade we can factor numbers twice as long as in the previous decade. This leads to Table 5.

Not everyone will agree with my recommendations. The NSA has mandated 512-bit to 1024-bit keys for their Digital Signature Standard--far less than I recommend for long-term security. PGP has a maximum RSA key length of 1280 bits. Lenstra, the world's most successful factorer, refuses to make predictions past ten years. And Table 6 gives Ron Rivest's key-length recommendations, originally made in 1990, which I consider much too optimistic. While his analysis looks fine on paper, recent history illustrates that surprises regularly happen. It makes sense to choose your keys to be resilient against future surprises.

Low estimates assume a budget of $25,000, the quadratic sieve algorithm, and a technology advance of 20% per year. Average estimates assume a budget of $25 million, the general number field sieve algorithm, and a technology advance of 33% per year. High estimates assume a budget of $25 billion, a general quadratic sieve algorithm running at the speed of the special number field sieve, and a technology advance of 45% per year.

There is always the possibility that an advance in factoring will surprise me as well, but I think that unlikely. But why trust me? I just proved my own foolishness by making predictions.

> A new factoring algorithm has taken over from the quadratic
> sieve: the general number field sieve. In 1989
> mathematicians would have told you that the general number
> field sieve would never be practical.

> In 1992 they would have told you that it was practical, but
> only faster than the quadratic sieve for numbers greater
> than 130-150 digits or so. Today it is known to be faster
> than the quadratic sieve for numbers well below 116 digits.

> The general number field sieve can factor a 512-bit number
> over 10 times faster than the quadratic sieve.

The GNFS situation is a little bit more complicated than that. Today's factoring algorithms work by finding distinct square roots of the same quadratic residue modulo the number to be factored. Each such discovery yields an approximately 50% chance of factoring the number using Euclid's GCD algorithm.

Since searching directly for a congruence of squares would be grossly inefficient, one sieves for relations involving arbitrary powers of numbers from a set called the "factor base", and after collecting an overdetermined number of such relations, finds their null space modulo two in huge matrix operation. This yields a relation whose powers are all even, from which a congruence of squares may be constructed in an obvious manner.

Most popular factoring methods including NFS, GNFS, and numerous flavors of QS utilize this general scheme, and differ only in the numbers to which they are applicable and in the methods that they use to fish for relations in more densely populated mathematical waters.

GNFS uses a particularly cute trick, which is to express the number being factored as a polynomial with small coefficients, evaluated at a small argument. On can then construct a homomorphism from a ring of algebraic integers into Z/nZ. This permits sieving to be conducted in a particularly efficient fashion.

Finding such a polynomial, unfortunately, is a far from straightforward task. Current state of the art is to start with a guess, and flog it to death on a workstation for several days, attempting some sort of stepwise refinement. Although the problem is mathematically rich, no systematic method currently exists to pick the "best" polynomial, and the problem of doing so may be of a difficulty comparable to factoring.

The speed with which GNFS runs and the degree to which it outperforms QS is extremely sensitive to the polynomial chosen, so the blanket statement that GNFS outperforms QS by a factor of 10 on 512 bit numbers is in my opinion, a bit of an oversimplification.

GNFS is one of the most complicated computer algorithms to be constructed, sieving and factoring simultaneously in both a ring of algebraic integers and in Z/nZ. The algorithm has been known to experience "cycle explosions" in which unexpectly large amounts of raw data are produced from relatively small numbers. It is certainly not something that can be regularly run in "production mode" and it requires a skilled operator (currently its creator) to help it coast smoothly through its various stages.

I don't think GNFS is going to be available in shrink-wrapped form for quite some time. :)

> A related algorithm, the special number field sieve, can
> already factor numbers of a certain specialized
> form--numbers not generally used for cryptography--must
> faster than the general number field sieve can factor
> general numbers of the same size.

NFS and GNFS are essentially the same algorithm. NFS is simply a special case where a particularly simple polynomial is known, Z[a] is a unique factorization domain, and some other nice algebraic properties are present. In the case of a general integer, and a more complex polynomial, some things get messier.

> It is not unreasonable to assume that the general number
> field sieve can be optimized to run this fast; it is
> possible that the NSA already knows how to do this.

I think this is unlikely. The difference in speed is due to the fact that NFS only factors specially chosen simple numbers, and GNFS factors anything. That is not something that is likely to be optimized away.

Also, I think we make far to much of the magical ability of the NSA to do things. At the present point in time, most of the cryptomathematical expertise in the world is external to the NSA. The NSA didn't invent GNFS, or for that matter, public key cryptography.

> Making predictions beyond the near future is even more
> foolish. Who knows what kind of advances in computing,
> networking, and mathematics are going to happen by 2020?
> However, if you look at the broad picture, in every decade
> we can factor numbers twice as long as in the previous
> decade.

GNFS probably represents the final step in the evolution of the "combination of congruences" factoring methods. Further refinements would probably be such complicated algorithms as to be inpractical to program.

Additional improvements in our ability to break RSA will probably come via some new factoring scheme that we are presently unaware of, or via a method of computing the inverse of the encryption permutation used by RSA which does not require explicit formation of the factors of the modulus.