An astonishing four out of every 1,000 public keys protecting webmail, online banking, and other sensitive online services provide no cryptographic security, a team of mathematicians has found. The research is the latest to reveal limitations in the tech used by more than a million Internet sites to prevent eavesdropping.

The finding, reported in a paper (PDF) submitted to a cryptography conference that takes place in August, is based on an analysis of some 7.1 million 1024-bit RSA keys published online. By subjecting what's known as the "modulus" of each public key to a greatest-common-divisor algorithm first described more than 2,000 years ago by the Greek mathematician Euclid, the researchers looked for underlying prime factors that were used more than once. Almost 27,000 of the keys they examined were cryptographically worthless because one of the factors used to generate them also appeared in at least one other key.
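For a concrete sense of what that check involves, here is a minimal sketch in Python, using tiny illustrative primes rather than the 512-bit primes found in real 1024-bit keys, of how Euclid's algorithm exposes two moduli that share a factor:

```python
from math import gcd

# Toy RSA moduli built from small primes purely for illustration;
# real keys use primes hundreds of digits long.
p, q1, q2 = 61, 53, 59
n1 = p * q1           # modulus of key 1
n2 = p * q2           # modulus of key 2, which accidentally reuses the prime p

shared = gcd(n1, n2)  # Euclid's greatest-common-divisor algorithm
if shared > 1:
    # Both moduli fall apart: the other factor is a single division away.
    print("shared prime:", shared)
    print("key 1 factors:", shared, n1 // shared)
    print("key 2 factors:", shared, n2 // shared)
```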

"The fact is, if these numbers had the entropy that they were supposed to have, the probability of even one of these events happening in 7 million public keys would be vanishingly small," James P. Hughes, an independent cryptographer who participated in the research, told Ars. "We thought that was rather startling."

Following the publication of the paper, and after the reporting for this article was complete, a separate group of researchers announced a similar finding. They went on to say that only one of the weak public keys they analyzed was signed by a certificate authority trusted by major browsers; the remainder of the keys were used to secure routers and other embedded devices. More about this second report has been added to the end of this article.

With its discovery in the mid-1970s by Ronald Rivest, Adi Shamir, and Leonard Adleman, RSA cryptography revolutionized secure messaging because it was among the first systems that made it possible for the key needed to decode ciphertext to be held only by the person receiving the private message. RSA is one of the public key cryptographic algorithms used to generate SSL certificates, which are used to encrypt visits to particular websites. For the system to work, however, the underlying RSA modulus must be the product of two very large prime numbers that are unique to each key.

The revelation that such a large proportion of public keys were generated with a prime factor shared by one or more other keys means that such keys are trivial to break by anyone who can identify them. What's more, the percentage of keys known to be generated with non-unique factors is likely to grow as more keys are analyzed. The 0.38 percent rate of faulty keys found when the researchers looked at 7.1 million total keys compares with a 0.26 percent rate in an earlier analysis that considered only 4.7 million RSA moduli. As a result, the true number of keys that could be broken using the technique may be higher than the current research reveals.
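To see why such keys are trivial to break, note that once a shared prime is exposed, reconstructing the private key is textbook arithmetic. A toy sketch (Python 3.8+ for the modular inverse; the numbers are illustrative, not real key material):

```python
# Suppose a GCD check revealed that modulus n shares the prime p with another key.
e = 65537                 # a common RSA public exponent
p = 61                    # the leaked shared prime (toy value)
n = 61 * 53               # the victim's public modulus
q = n // p                # the other prime falls out by simple division
phi = (p - 1) * (q - 1)   # Euler's totient of n
d = pow(e, -1, phi)       # private exponent: modular inverse of e mod phi

message = 42
ciphertext = pow(message, e, n)
assert pow(ciphertext, d, n) == message   # the victim's traffic can now be read
```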

A judgment call

The researchers, led by Dutch mathematician Arjen Lenstra of École Polytechnique Fédérale de Lausanne in Switzerland, said they are releasing their findings ahead of August's conference because they want to alert users of public key cryptography to the presence of so many weak moduli. While it took the team three years to complete the study, they believe it will take peers only a matter of weeks to follow their recipe. Their discovery raised concerns about how to disclose it responsibly without making it easy for others to break the tens of thousands of affected keys.

"The quagmire of vulnerabilities that we waded into makes it infeasible to properly inform everyone involved, though we made a best effort to inform the larger parties and contacted all e-mail addresses recommended (such as ssl-survey@eff.org) or specified in valid affected certificates," they wrote. "Our decision to make our findings public, despite our inability to directly notify everyone involved, was a judgment call."

They compared their findings to revelations from 2008 that hundreds of thousands, possibly millions, of cryptographic keys generated on systems running Debian Linux were so predictable that an attacker could guess them in a matter of hours.

The Electronic Frontier Foundation's SSL Observatory, which queries every IP address on the Internet for underlying public secure sockets layer certificates, supplied some of the data used in the research. Project leaders don't plan to publish that data until they've had more time to contact parties with weak keys.

"We're currently working around the clock to get notifications to all of the parties that are affected by this," said Peter Eckersley, EFF's technology projects director.

The researchers, however, haven't ruled out the possibility that the large body of weak keys is already known, possibly to nation-states or other well-organized groups.

"The lack of sophistication of our methods and findings make it hard for us to believe that what we have presented is new, in particular to agencies and parties that are known for their curiosity in such matters," they wrote.

Simplistic or not, the exact method for identifying the keys generated with non-unique factors isn't included in the research paper; neither is the list of affected certificates or key holders. Hughes, who is based in Palo Alto, California, said the team found a computationally efficient way to flag the keys without having to individually compare each one against all the others.
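The paper doesn't disclose that technique, but one widely known way to avoid pairwise comparisons, often credited to Daniel J. Bernstein, is a batch GCD computation. The sketch below is an assumption about that general approach, not the team's actual code, and it uses the simplest formulation; production implementations speed up the same computation with product and remainder trees:

```python
from math import gcd, prod

def find_shared_factors(moduli):
    """Flag RSA moduli that share a prime with at least one other modulus.

    Each modulus is gcd'd against the product of all the others, so no
    explicit pairwise comparison is ever made.
    """
    total = prod(moduli)
    weak = {}
    for n in moduli:
        g = gcd(n, (total // n) % n)   # gcd of n with the product of the rest
        if g > 1:
            weak[n] = g                # g is a shared prime (or product of them)
    return weak

# Toy demonstration: small primes stand in for 512-bit ones.
moduli = [61 * 53, 61 * 59, 67 * 71]
print(find_shared_factors(moduli))     # the first two moduli share the prime 61
```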

None of the weak keys they uncovered are used by certificate authorities to sign SSL credentials used by website operators to encrypt traffic and prove that their servers are authentic. That's good news. If a so-called "signing key" was weak, all the certificates it signed would also be trivial to forge.

It remains unclear exactly what is causing large clusters of keys to use duplicated factors. Hughes said that when generation is done correctly for a 1024-bit key, it should theoretically require the generation of 2^200 certificates before all possible factors are exhausted. Curiously, the problem of duplicate factors also marred 2048-bit keys, even though they should theoretically provide much more entropy. The researchers searched for similarities among the vulnerable keys for clues about what was causing random number generators to fail during the key generation process, but they were unable to make any determination.
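Hughes' entropy point can be made concrete with a back-of-the-envelope birthday-bound estimate. Assuming each 1024-bit modulus is built from two independent, uniformly random 512-bit primes (an idealization, not a figure from the paper), the chance of even one repeated prime across all 7.1 million keys is absurdly small:

```python
from math import expm1, log

# Approximate count of 512-bit primes via the prime number theorem:
# about 2^512 / (512 * ln 2), i.e. roughly 2^503 of them.
num_primes = 2**512 / (512 * log(2))

keys = 7_100_000
primes_drawn = 2 * keys   # each 1024-bit modulus needs two 512-bit primes

# Birthday bound: P(any collision) ~= 1 - exp(-k^2 / (2 * N))
p_collision = -expm1(-(primes_drawn**2) / (2 * num_primes))
print(p_collision)        # about 4e-138 -- "vanishingly small", as Hughes put it
```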

"Our only conclusion is that there is not just one cause for all of these problems," Hughes said. "This leads to our conclusion that unless you can totally trust your random number generator, RSA is not a good algorithm to choose."

He said that other cryptosystems such as Diffie-Hellman and DSA aren't as vulnerable, because the duplication of a key makes a key holder vulnerable only to the person who holds the corresponding certificate. "If you have a collision, you only affect one other person. You can hurt them and they can hurt you, but you haven't made it public to everybody and their mother."

Update

Eric Wustrow, a second-year graduate student in the University of Michigan's Electrical Engineering and Computer Science department, told Ars that research he and his colleagues conducted used a different data set but arrived at similar aggregate findings. However, he said all but one of the weak keys they encountered were self-signed. This separate group of researchers decided to publish a blog post summarizing their results out of concern that Internet users might misinterpret some of the reports about the earlier paper.

"As we say in the title don't panic," Wustrow said. "Not everything over SSL is broken."

He added that the finding that most of the weak keys were used to protect routers and similar gear suggests the underlying cause may stem from device vendors.

"Embedded devices have a history of problems in generating entropy for keys," he said. "We're seeing the same embedded devices from the same manufacturer generating the same primes."

Meanwhile, Hughes, one of the co-authors of the original paper, said he remains convinced that the weak keys represent a threat to people using webmail and e-commerce.

"I hate to say it but this does have implications for web-based commerce because people can mount man-in-the-middle attacks," he said. "People know, for instance, there have been man-in-the-middle attacks mounted against websites by foreign countries. Embedded systems matter to e-commerce because they're the infrastructure that goes between you and the site you're trying to go to."

For a percentage like that, all it takes is a couple CA's that don't understand the difference between /dev/random and /dev/urandom on a server with low entropy, and suddenly a fairly large number of certs are useless. Yet another reason to do away with the CA system...

(edit: I know that this is really a problem with key creation and not CAs directly, but the centralization of cert/key creation can't help this situation.)

An astonishing four out of every 1,000 public keys protecting webmail, online banking, and other sensitive online services provide no cryptographic security, a team of mathematicians has found.

Actually...

The paper's authors wrote:

Don't worry, the key for your bank's web site is probably safe

SSL is used to authenticate every major web site on the Internet, but in our analysis, these were not the keys that were vulnerable to the problems outlined in this blog post.

So which systems are vulnerable? Almost all of the vulnerable keys were generated by and are used to secure embedded hardware devices such as routers and firewalls, not to secure popular web sites such as your bank or email provider. Only one of the factorable SSL keys was signed by a trusted certificate authority and it has already expired. There are signed certificates using repeated keys; some of them are generated by vulnerable devices, some of them are due to website owners submitting known weak keys to be signed, and for some of them we have no good explanation.

You are doing the same thing as the rest of the media is doing, "It's certainly not, as suggested in the New York Times, any reason to have diminished confidence in the security of web-based commerce."

You are doing the same thing as the rest of the media is doing, "It's certainly not, as suggested in the New York Times, any reason to have diminished confidence in the security of web-based commerce."

This was definitely quite an important piece to not leave out! It's been known for quite a while that low-entropy embedded hardware has been an issue, although this is the first quantification of that suckiness I've seen. This is still a concern, but not directly for end users, and irrespective of the Certificate Authority system!

Come on Ars, don't forget to include the likely most important part of some new research!

Come on Ars, don't forget to include the likely most important part of some new research!

Actually, the article does note this:

Quote:

None of the weak keys they uncovered are used by certificate authorities to sign SSL credentials used by website operators to encrypt traffic and prove that their servers are authentic. That's good news. If a so-called "signing key" was weak, all the certificates it signed would also be trivial to forge.

None of the weak keys they uncovered are used by certificate authorities to sign SSL credentials used by website operators to encrypt traffic and prove that their servers are authentic.

If Joe Average Reader even gets that far in the article, he may or may not understand that his banking website is safe after all. Granted, Ars Technica Readers are well above average but still. Given the later down-play, the opening sentence becomes a bit alarmist, bordering on sensational.

Round numbers. People are more comfortable and more familiar working with powers of 10. It also implies a larger sample was used, whereas seeing 1 in 250 could potentially imply a smaller sample. If the sample size were 1,000, they might have gone with 1/250, but nearly TWENTY SEVEN thousand weak keys turned up in a sample of millions, so saying "out of every 1,000" makes sense.

In the end, it's mostly preference; neither is more or less correct, since 1/250 and 4/1,000 are the same number.

For a percentage like that, all it takes is a couple CA's that don't understand the difference between /dev/random and /dev/urandom on a server with low entropy, and suddenly a fairly large number of certs are useless. Yet another reason to do away with the CA system...

I would be surprised to find a CA that used a built-in random source - CA signing key generation should take place in an HSM (hardware security module) that uses a true hardware-based seed for random number generation. Most, if not all, public CAs will advertise their random number methodology in their certificate practice statement.

I'm not worried about Joe. He's not paying any attention. I'm more concerned with how this will be presented by my local "news" outlet. "OMG! STOP USING THE INTERNETS! GET YOUR SPORTS SCORES FROM US BECAUSE THE INTERNETS IS DANGEROUS!!!" (except for our Facebook page). Now, enjoy weathergirl Barbie and 10 minutes of ads from the somewhat local car dealer and the ambulance-chaser law firms (not counting the barely masked "news stories" from the local sports bars and garden centers).

Come on Ars, don't forget to include the likely most important part of some new research!

Actually, the article does note this:

Quote:

None of the weak keys they uncovered are used by certificate authorities to sign SSL credentials used by website operators to encrypt traffic and prove that their servers are authentic. That's good news. If a so-called "signing key" was weak, all the certificates it signed would also be trivial to forge.

So I don't think Ars is guilty of just "thinking of the page views".

But still, that note just means the certificate authorities aren't affected. It still doesn't tell the reader that the normal everyday sites they use (such as Gmail, Yahoo Mail, banking sites, PayPal, etc.) *aren't* affected.

You gotta love the EFF for staying out on the forefront of stuff like this, even when other... interested parties might prefer to keep this info in the dark. Folks, if you have a chance to pitch them a buck or two, please do -- they're really fighting the good fight.

For a percentage like that, all it takes is a couple CA's that don't understand the difference between /dev/random and /dev/urandom on a server with low entropy, and suddenly a fairly large number of certs are useless. Yet another reason to do away with the CA system...

Oh dear, not this chestnut again. What the hell is wrong with urandom (or the other OS equivalents) as long as the PRNG is secure? Nothing. The only advantage constantly reseeding from hardware gives you is that it obscures your previous state, but a good SPRNG should reseed itself through a good one-way function (e.g., as per the updated AES-CTR NIST spec). As long as you reseed occasionally, there is no real disadvantage versus a hardware source. Of course, if you have a very fast hardware generator it makes sense to use it, but most servers have limited entropy sources, and those bits are better spent seeding SPRNGs than being wasted just so you can say your numbers are "real" random numbers (or, more correctly, so you can hope they are). And if you really think you are up against an adversary who can break AES256-CTR, then you have far bigger problems than RSA keygen.
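To make the reseeding argument concrete, here is a toy sketch of a DRBG whose output and reseed paths both go through a one-way function. SHA-256 stands in for the AES-CTR construction the NIST spec describes, purely for brevity; this is an illustration, not a compliant or production-ready implementation:

```python
import hashlib
import os

class ToyHashDRBG:
    """Illustrative DRBG: state only ever moves forward through SHA-256."""

    def __init__(self, seed: bytes):
        self.state = hashlib.sha256(seed).digest()
        self.counter = 0

    def random_bytes(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            self.counter += 1
            out += hashlib.sha256(
                self.state + self.counter.to_bytes(8, "big")
            ).digest()
        # Ratchet the state through a one-way function so that a later
        # compromise of the generator doesn't reveal earlier output.
        self.state = hashlib.sha256(b"ratchet" + self.state).digest()
        return out[:n]

    def reseed(self, fresh_entropy: bytes) -> None:
        # Occasional reseeding mixes new entropy in; the hash obscures
        # whatever the previous state was.
        self.state = hashlib.sha256(self.state + fresh_entropy).digest()

drbg = ToyHashDRBG(os.urandom(32))
print(drbg.random_bytes(16).hex())
drbg.reseed(os.urandom(32))   # e.g. periodically, from the OS entropy pool
print(drbg.random_bytes(16).hex())
```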

P.S. I first came across this story on the NYTimes (being from the UK I don't normally read it, but was drawn to it by some obvious link bait from another site). I expect this sloppy reporting from them, not from Ars. This is a science and tech site, after all. Some embedded hardware self-generates low-entropy primes. Not "OMFG, 4 in 1000 of all RSA is broken! Anonymous had your mum! Arrggh!"

Irrespective of the way primes are selected (additive/sieving methods or methods using fresh random bits for each attempted prime selection), a variety of obvious scenarios is conceivable where poor initial seeding may lead to mishaps, with duplicate keys a consequence if no "fresh" local entropy is used at all. If the latter is used, the outcome may be worse: for instance, a not-properly-random first choice may be prime right away (the probability that this happens is inversely proportional to the length of the prime, and thus non-negligible) and miss its chance to profit from local entropy to become unique. But local entropy may lead to a second prime that is unique, and thus a vulnerable modulus.
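A toy model of the scenario described above, where two devices start from the same poor seed, walk to the same first prime, and only then pick up local entropy before choosing the second prime, shows how that still yields a factorable modulus. The small trial-division primes here stand in for real 512-bit ones:

```python
import random
from math import gcd

def next_prime(n: int) -> int:
    # Smallest prime >= n, by trial division (fine for these tiny toy values).
    def is_prime(k: int) -> bool:
        return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))
    while not is_prime(n):
        n += 1
    return n

seed = 1234                                   # identical poor seed on both units
rng_a, rng_b = random.Random(seed), random.Random(seed)

p_a = next_prime(rng_a.getrandbits(16))       # first prime on device A
p_b = next_prime(rng_b.getrandbits(16))       # same first prime on device B

rng_b.seed(seed ^ 0xBEEF)                     # device B gains late local entropy
q_a = next_prime(rng_a.getrandbits(16))
q_b = next_prime(rng_b.getrandbits(16))       # second primes now differ

n_a, n_b = p_a * q_a, p_b * q_b               # two distinct but weak moduli
print(gcd(n_a, n_b))                          # equals p_a: the shared prime leaks
```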

"don't roll your own security tools" well that definatly gets a +++ but the blog details that in many cases they didn't (often they were using using OpenSSL or OpenSSH). The problem is that they had no or limited entropy sources in the embeded hardware. It could be a debian style rand()=7 type error somewhere but more likly they just didn't provision the hardware with any good entropy sources. As there would be no keybaord/mouse and many embedded chips have a single clock generator they just up/down scale as needed so no clock jitter to time either. So basically, they would be dependant on system and process IO/CPU/uptime counters that for an OS booting from flash wouldn't change much (between boots or even systems) till you start interacting with the device over the network. Of course the other problem with that kind of network dependant "randomness" is it's not unpredictable to anyone with a packet sniffer.

"...unless you can totally trust your random number generator, RSA is not a good algorithm to choose."

Is James P. Hughes saying here that the people who generated the vulnerable certificates were likely using weak random number generators? As in, true random numbers that are based on biased physical sensors or "underbaked" with inappropriate hashing algorithms; or, that someone out there is selling certificates based on pseudorandom numbers?

The problem is that the randomness is well below the level it should be according to statistics. Given truly random selection of primes for key generation, the number of keys with common factors should be way less than was discovered. When you're relying on being secure because the probability of not being secure is so low as to not be a reasonable concern, 0.4% is a damn scary big number.

I thought the whole deal with PKI is that factoring the product of two large primes is hard. The article says they didn't reveal their methods, but if they're able to determine that two keys share a factor with any kind of efficiency, isn't that a problem in itself?

Edit: Nevermind, if anyone else is asking the same question as me, read the article everyone is linking. :-)

I thought the whole deal with PKI is that factoring the product of two large primes is hard. The article says they didn't reveal their methods, but if they're able to determine that two keys share a factor with any kind of efficiency, isn't that a problem in itself?

Not if the only common factor is 1, as it should be with proper randomization.

If they DO have other factors in common they are found pretty quickly with any good GCD algo.

The issue here is something I discovered long ago in our own Linux embedded products (before we shipped anything, mind you).

Take a rack of equivalent systems with the same software. Turn them all on at once and log the first stuff out of /dev/urandom or /dev/random. Repeat many times. The results cluster into bins, "depending". There is actually a tiny bit of randomness, just not very much.

It takes a long time before stock Linux will generate divergence on the rack of equipment. Do this same experiment with a home router or the like, repeated millions of times and you will find that groups of them generate the same prime factors.

Seed your PRNG with something that is the same across many reboots/devices and you have a big problem.

Our solution has three parts: 1) Seed the PRNG with the Ethernet MAC, RTC date, and CPU timebase counter. This *greatly* reduces the chance that any two systems will boot with the same PRNG seed; there are actually a few bits of randomness in the timebase counter. 2) Record in flash a copy of the PRNG state every time the user saves the configuration settings, and load it on boot. 3) Seed the PRNG at manufacturing with the output of /dev/random from a manufacturing server, and save that in flash. (Effectively, this is the main entropy used for generating the initial SSH RSA keys during the initial customer installation.)

In this way no two systems boot with the same PRNG state. The PRNG state is partly based on 4k of random data saved in flash that is not knowable by anyone. Even if you reboot it again, and again and again, the PRNG will seed differently due to #1 and it is mathematically challenging to predict what effect #1 has when hashed with the 4k unknown blob loaded in #2.

Our high-security products all have TPMs in them, and random bytes from the TPM are also injected into the PRNG during boot and operation. Of course, the TPM can be used for all RSA key generation, etc.
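A minimal sketch of the seed-mixing idea described above, with hypothetical inputs standing in for the poster's actual product code: every per-unit identifier, counter, and saved blob is hashed into one seed, so no two units should ever start from the same state.

```python
import hashlib

def boot_seed(mac: bytes, rtc_date: bytes, timebase: int, saved_blob: bytes) -> bytes:
    """Fold per-unit identifiers, the clock, and the blob saved in flash
    into a single PRNG seed (illustrative only)."""
    h = hashlib.sha256()
    h.update(mac)                              # unique per unit
    h.update(rtc_date)                         # differs across boots
    h.update(timebase.to_bytes(8, "big"))      # a few bits of real jitter
    h.update(saved_blob)                       # secret blob written at manufacturing
    return h.digest()

seed = boot_seed(
    mac=bytes.fromhex("001122334455"),
    rtc_date=b"2012-02-15T08:00:00",
    timebase=0x1A2B3C4D,
    saved_blob=b"\x00" * 4096,   # placeholder for the per-unit random blob
)
print(seed.hex())
```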

Thank you for pointing out the Freedom to Tinker blog post that provides additional context. To be clear, the post summarizes a completely different study conducted by a completely different set of researchers. It was not available when reporting for the Ars article was being done.

This article has been updated to reflect the important detail that only one of the keys found by the second group was signed by a certificate authority trusted by Firefox and other major browsers.