Since I couldn’t find a simple explanation of what happened, I figured I would write one up.

public-key encryption

Public-key encryption is fascinating. You generate a keypair composed of a public and a private key. You post the public key on your web site, and anyone can use it to encrypt data destined for you. Decryption requires the private key, and you keep it to yourself. Anyone can encrypt; only you can decrypt. This is how your web browser secures the communication channel with your bank: the bank publishes its public key, and your browser encrypts data against that public key before sending it to the bank. An eavesdropper knows the bank’s public key, but because they don’t have the private key, they can’t see the data you’re sending.

how do I get me one of them keypairs?

Cryptographic keys are just numbers with special properties. In a public-key encryption system, you typically pick one of these special big numbers randomly, you make that your private key, and from it you compute the public key. It’s easy to compute the public key from the private key. However, given the public key, it’s really, really hard to recover the private key. We don’t know how to do it without spending millions of years of computation for your average public key. That’s right, a single user’s public key, attacked with vast computational power, will not yield its corresponding private key. But if you have the private key, you can get the public key in a few milliseconds. That’s the magic, except it’s not magic, it’s math.

what do you mean by “randomly”?

If you barricade your front door, a thief will probably come in via the window. And so it is with public-key encryption. Attacking a public key directly in order to somehow extract its private counterpart is really, really hard. But maybe it’s not so hard to guess how that private key was selected in the first place. Remember, you have to pick the private key randomly. And, as it turns out, computers are really bad at picking numbers at random (humans are only marginally better). So, if you’re not careful about how you picked that private key, an attacker might simply reconstruct how you picked it.
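To make "reconstruct how you picked it" concrete, here is a minimal Python sketch. It assumes a hypothetical victim who seeded an ordinary (non-cryptographic) generator with a timestamp that the attacker can narrow down to a small window. The seed values and function names are made up for illustration; nothing here is taken from the study.

```python
import random

def weak_private_value(seed: int, bits: int = 64) -> int:
    """Pick a 'random' number from a generator seeded with a guessable value."""
    rng = random.Random(seed)  # deterministic: same seed, same output
    return rng.getrandbits(bits)

# Hypothetical scenario: the victim seeded with a timestamp somewhere
# inside a window that the attacker can guess.
WINDOW_START = 1_329_350_400
victim_seed = WINDOW_START + 1234
victim_value = weak_private_value(victim_seed)

def recover_seed(target: int, start: int, window: int):
    """Try every seed in the window until one reproduces the target value."""
    for guess in range(start, start + window):
        if weak_private_value(guess) == target:
            return guess
    return None

recovered = recover_seed(victim_value, WINDOW_START, 5000)
print(recovered == victim_seed)
```

The attacker never touches the hard math problem. They just replay the selection process a few thousand times.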

lots of people picking lots of keys

Cryptography is everywhere now, so there are millions of public keys made available on the Web. Just go to https://amazon.com, and Amazon will tell you its public key. If a bunch of folks use a not-so-random way to pick their private key, you might expect funny coincidences to happen. Alice in San Francisco and Bob in New York might independently end up with the same private key simply because they both used a similar process for selecting this private key from all possible values. If that happens, they would also have the same public key, and you would be able to easily discover this: just compare their public keys! The researchers found that this happens every now and then: they found a couple dozen public keys that were identical to at least one other public key. In and of itself, that’s kind of fascinating. But it’s not really shocking, right? Clearly, if people have the exact same public key, then they picked their private key poorly.

the funny thing about RSA

The funny thing about RSA, the most common approach to public-key encryption, is that its private key is composed of two numbers, both prime (which means they are divisible only by themselves and 1, for example: 11, 17, 41,…). The public key is then the product of those two primes. As it turns out, it’s really easy for computers to multiply numbers, even really big ones. But if you’re only given the product of two primes, it’s incredibly hard to recover those two factors. For example, take two primes each 200 digits long, multiply them together to get a 400-digit long number, and give that to a friend. Given all the computing power in the world for many lifetimes, your friend will not be able to recover those two prime numbers you initially picked.
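Here is a toy Python sketch of that asymmetry. The primes are tiny compared to real RSA primes, and the brute-force factoring routine is only an illustration, not how anyone attacks real keys. The point is that multiplying is a single operation, while naive factoring needs a loop whose length grows with the size of the smaller prime.

```python
def trial_division(n: int):
    """Brute-force search for the smallest factor of n; returns (p, q)."""
    if n % 2 == 0:
        return 2, n // 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 2
    return n, 1  # no factor found: n itself is prime

# Two small primes chosen for illustration (real RSA primes have
# hundreds of digits).
p, q = 104729, 1299709

n = p * q                   # the easy direction: one multiplication
print(trial_division(n))    # the hard direction: tens of thousands of steps
```

With these toy numbers the loop finishes instantly. With 200-digit primes the same loop would need on the order of 10^200 steps, which is why nobody factors real keys this way.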

Now, there are lots and lots of prime numbers. So many, in fact, that if you and I randomly select a 200-digit prime number, there is no conceivable chance we’d pick the same one. But, what if we don’t do it randomly? What if we both start out with 1 followed by 199 0s, and work our way up until we find the first prime number? Then of course we’d end up with the same one. Now maybe we’re not so stupid, and we have a clever way of picking a much more complex starting point, and then working our way up to find the next prime. Well, let’s hope we don’t both use the same clever method, because no matter how clever it is, if we both use the same method, we’re going to end up with the same prime.
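A small Python sketch of the "start somewhere and work your way up" method, assuming a simple trial-division primality check (fine for toy sizes, far too slow for real keys). The starting point is a made-up example.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; only suitable for small numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

def next_prime(start: int) -> int:
    """Walk upward from start until a prime is found."""
    candidate = max(start, 2)
    while not is_prime(candidate):
        candidate += 1
    return candidate

# You and I both use the same "clever" starting point...
alice_prime = next_prime(10**6)
bob_prime = next_prime(10**6)
print(alice_prime == bob_prime)  # ...so we land on the same prime
```

The search itself is fully deterministic. All of the unpredictability has to come from the starting point.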

So, back to the funny thing about RSA: because the private key is made up of two prime numbers, if people don’t choose those prime numbers randomly, then two different people might end up with one prime number in common, but not the other. So their public keys won’t be exactly the same: one will be p1 x p2, and the other will be p1 x p3. So it won’t be immediately obvious that we used poor randomness.

And now the final piece of fun math. It’s really hard to factor numbers, and it’s really easy to multiply them. Another thing that’s really easy is to find common factors between two numbers. So if I have two RSA public keys that share a prime factor, it’s really easy to determine that common prime factor. And then, with that prime factor, it’s easy to discover the other prime factor in each of the two keys. So, one RSA public key is very hard to break, but two RSA public keys that share a prime factor are trivial to break together.
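The "find common factors" step is the greatest common divisor, which Euclid's algorithm computes quickly even for enormous numbers. A minimal Python sketch, using small illustrative primes and a hypothetical helper named `break_pair`:

```python
from math import gcd

def break_pair(n1: int, n2: int):
    """If two distinct moduli share a prime, return both factorizations."""
    shared = gcd(n1, n2)
    if shared == 1 or n1 == n2:
        return None  # nothing in common, or identical keys: no factor learned
    return (shared, n1 // shared), (shared, n2 // shared)

# Small primes chosen for illustration; p1 is the one both keys share.
p1, p2, p3 = 104729, 1299709, 15485863
n_alice = p1 * p2
n_bob = p1 * p3

print(break_pair(n_alice, n_bob))
```

One `gcd` call and two divisions recover all three primes. Neither modulus had to be factored the hard way.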

So that’s what the researchers did. They compared RSA public keys against one another and found that 0.2% of the keys share a prime factor with at least one other key. Given that, they were able to fully factor those 0.2% of keys, and thus completely break their security.
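As a rough Python sketch of that scan: take a list of public moduli, compute the gcd of every pair, and record any key that gives up a factor. This is the naive quadratic version, with made-up toy moduli. It is an assumption-laden illustration, not the researchers' actual code, and a collection of millions of keys would call for something more efficient than checking pairs one by one.

```python
from itertools import combinations
from math import gcd

def find_weak_keys(moduli):
    """Return {modulus: (p, q)} for each modulus sharing a prime with another."""
    broken = {}
    for a, b in combinations(moduli, 2):
        g = gcd(a, b)
        # A gcd strictly between 1 and the modulus is one of its prime factors.
        if 1 < g < a:
            broken[a] = (g, a // g)
        if 1 < g < b:
            broken[b] = (g, b // g)
    return broken

# Toy "public keys": the first and third share the prime 101.
moduli = [101 * 103, 107 * 109, 101 * 113, 127 * 131]
print(find_weak_keys(moduli))
```

Keys that are exactly identical produce a gcd equal to the modulus itself, which reveals no factor, so the sketch skips them. That matches the earlier point: identical keys are easy to spot, but a shared prime is what actually lets you factor.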

This really shouldn’t be that much more shocking than the case where users have the exact same public key. It’s just that, with RSA, there is another way in which poor randomness could result in weak keys, without those keys being exactly identical. It’s fascinating, and it’s a great study, but the root cause is no different: it’s all about the randomness.

so other approaches are better?

No. This attack has nothing to do with RSA. It has everything to do with randomness. No matter the algorithm you pick for public-key encryption, you have to find a really good source of randomness to pick your private key. The cute thing here is that weak randomness was revealed in a new surprising way, because RSA public keys can share a prime factor without being immediately obviously identical. That’s cool, but it’s not a weakness of RSA.

how do I fix my code?

Make sure you’re using a secure random number generator to generate your keys. Make sure you’ve seeded it with good randomness, using operating-system calls if possible. And mostly, don’t panic. There’s no new attack here, only a very interesting revelation, using a very interesting trick, that a lot of people don’t pay sufficient attention to randomness when generating crypto keys.
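In Python, for example, the standard-library `secrets` module draws from the operating system's random source. The sketch below only shows where the randomness should come from. The helper name is hypothetical, and in real code you would most likely let a vetted crypto library generate the whole key rather than assemble primes yourself.

```python
import secrets

def random_odd_candidate(bits: int = 1024) -> int:
    """Draw an odd integer of exactly `bits` bits from the OS random source."""
    n = secrets.randbits(bits)
    # Force the top bit (so the number has full length) and the
    # bottom bit (so it is odd, as any large prime must be).
    n |= (1 << (bits - 1)) | 1
    return n

candidate = random_odd_candidate()
print(candidate.bit_length())
```

A candidate like this would still need a primality test before use. The point is that the starting value comes from the operating system, not from a clock-seeded `random.Random`.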

10 thoughts on “it’s the randomness, stupid”

That’s not really RSA anymore. Remember, whatever you do, the core issue still applies: bad randomness is still bad randomness. If you can predict the random-number generation process used by the key generator, you have a weak private key.

The research report found that 0.2% of RSA public keys share a prime factor with another. Given that, they were able to fully factor those 0.2% of keys, and thus completely break their security.

This reminds me of something I read recently: some time ago, the Debian guys found some place in the OpenSSL code where uninitialized memory was being read. So they “fixed” it. The problem is, OpenSSL was doing it on purpose, to improve randomness, and the Debian “fix” made it easy to crack some keys by brute force with present-day processors. (IIUC, since then Debian reversed their “fix”.)

Once you have one of the factors, it’s much easier to find the other ones (since you can divide it out). Because of that, it’s better to double the size of the numbers than double the number of numbers.

Right on, well spoken. Weak randomness is a common way cryptanalysis attacks encryption precisely because modern crypto is built on top of well-studied hard mathematical problems (the discrete logarithm problem, for example, or the factorization of large numbers). Attacking those head-on is not impossible, but at the very, very least Nobel-Prize-worthy. I remember a bug in Debian a while back where SSL certs were generated with very weak “random” seeds, resulting in those certificates being essentially useless. Surprise: once again it was the randomness of the “ingredients”, not the cryptographic algorithm, that was at fault.

Ironically, most modern operating systems have good sources of randomness (that is, entropy), for example from the timing of interrupts. Plus, ideally they estimate the entropy on the fly and will refuse to give you bad randomness. So there’s no excuse for weak randomness.