Thursday, March 24, 2011

A brief introduction to web "certificates"

In case you are confused by SSL, and don’t fully understand the recent Comodo hack, I thought I’d write up a brief explanation for you. This is drastically simplified. I’m skipping a lot of steps in the process. I’m just trying to explain the essentials without getting lost in the details.

The company Comodo is what’s called a “Certificate Authority”. A hacker tricked them into issuing “certificates” for companies like Microsoft and Google. This would allow anybody who could tap the network between you and those websites to decrypt otherwise encrypted traffic. That somebody would have to be in-line with the network, like a hacker next to you in a coffee shop, or the Iranian government wiretapping the ISP.

This document explains how SSL protects against such attacks, and how the bogus Comodo certificates defeat those protections.

Public key cryptography

You know how normal cryptography works. Let’s say you want to send a file to your friend, but you know the FBI is spying on your e-mails. Therefore, you use WinZIP to zip up the file and give it a secret password like “xyz1234PDQ!!!abc”, then e-mail the zip file to your friend. You then call your friend and tell him the password.

That will probably work, except for the fact that the FBI is also tapping the phone, so they hear the same password and decrypt the file anyway. The problem is that you have to send the password as well as the file -- and there is no way to send the password without the FBI getting it.

What you’d really like is some way to solve the problem so that even if the FBI is eavesdropping on all your communications, they still can’t decrypt things.

There is a way to do this. What you do is start with a secret, but from that secret, you derive two keys using related mathematical functions. A file encrypted with one key can only be decrypted with the other key (and vice versa). In other words, you can’t decrypt the file with the same key you used to encrypt it, you must use the other (mathematically related) key. Moreover, knowing one key does not help you figure out the other key -- you’d have to know the secret that both keys were derived from.
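Here is a minimal sketch of that idea, using textbook RSA with tiny numbers. The primes, the exponent, and the helper name apply_key are my own picks for illustration; real keys use primes hundreds of digits long plus padding, so don't treat this as how any real product does it.

```python
# Toy numbers only. Requires Python 3.8+ for pow(e, -1, phi).
p, q = 61, 53                 # the "secret": two primes
n = p * q                     # the modulus, shared by both keys
phi = (p - 1) * (q - 1)       # easy to compute only if you know p and q
e = 17                        # exponent of the first key
d = pow(e, -1, phi)           # exponent of the second key, derived from the secret


def apply_key(exponent, value):
    """Scramble (or unscramble) a number with one of the two keys."""
    return pow(value, exponent, n)


message = 65
scrambled = apply_key(e, message)

recovered = apply_key(d, scrambled)          # the other key undoes it
same_key_again = apply_key(e, scrambled)     # the same key does not
vice_versa = apply_key(e, apply_key(d, message))

print(recovered == message, same_key_again == message, vice_versa == message)
```

Somebody who sees only n and e would have to factor n back into p and q to work out d. That is trivial for 3233, and the whole point of real key sizes is to make it impractical.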

Your friend uses this technique, and generates the two keys. He keeps one private, and sends you the other one. Since the FBI is listening in, that key has become essentially public. You use this “public” key to encrypt (using PGP) the file, and send the file back to your friend. The FBI is out of luck -- they have both the encrypted file and the public key that was used to encrypt it, but they can’t decrypt it. Only your friend, with the related “private” key, can decrypt the file.

This is how SSL works. When you visit https://mail.google.com, the first thing the server does is send you a public key. You then use that public key to encrypt the data that you send to the server. Eavesdroppers, such as those pesky hackers monitoring the WiFi or Iranian secret police, see both the public key and encrypted data, but they can’t decrypt it. Only Google, who knows their private key, can decrypt the data.

Man in the middle

But there is a way around public/private keys. In a coffee shop, the hacker sets up his own access point called “attwifi”, “tmobile”, or something similar. Your notebook computer connects to the hacker instead of the real access point. The hacker intercepts your attempted connection to https://mail.google.com. Claiming to be Google, the hacker sends you back his own public key.

You then use the hacker’s key, not Google’s key, to encrypt the communication, which the hacker then decrypts with his related private key. The hacker then establishes a connection to the real Google, pretending to be you. The hacker relays traffic back and forth between you and Google, decrypting it as it arrives on one end, and re-encrypting it when transmitting it out the other end.
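The relay described above can be sketched with the same kind of toy keys. Everything here (the helper names, the tiny primes, the number standing in for a password) is made up for illustration, and a real connection involves many more steps.

```python
def make_keypair(p, q, e):
    """Toy RSA keypair from two small primes. Not secure, illustration only."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))   # Python 3.8+
    return (e, n), (d, n)               # (public key, private key)


def encrypt(public_key, number):
    exponent, modulus = public_key
    return pow(number, exponent, modulus)


def decrypt(private_key, number):
    exponent, modulus = private_key
    return pow(number, exponent, modulus)


google_public, google_private = make_keypair(61, 53, 17)
hacker_public, hacker_private = make_keypair(89, 97, 5)

password = 1234   # stand-in for whatever you type into the login page

# You ask for Google's key, but the hacker's access point answers instead.
key_you_received = hacker_public
sent_by_you = encrypt(key_you_received, password)

# The hacker decrypts with his own private key and reads everything...
seen_by_hacker = decrypt(hacker_private, sent_by_you)

# ...then re-encrypts with Google's real key and passes it along.
sent_to_google = encrypt(google_public, seen_by_hacker)
seen_by_google = decrypt(google_private, sent_to_google)

print(seen_by_hacker == password, seen_by_google == password)
```

Neither end sees anything wrong: Google gets a correctly encrypted password, and you get a working login.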

Such things really happen. Recently, in Tunisia (shortly before the protests took down the government), the government did exactly that: man-in-the-middle interception in order to steal people’s passwords. People suspect Iran is trying to do this now.

In order to solve this “man-in-the-middle” problem, there must be a way to know the real Google from somebody pretending to be Google. That’s where certificates come in.

Certificates

Keys can also encrypt other keys. We can use this trick to figure out whether a key is valid, or made up by a hacker.

Here’s what Microsoft can do with Internet Explorer. They can encode a public key directly in the browser. Then, companies like Google can ask Microsoft to encrypt their public keys with the related Microsoft private key. Microsoft verifies that Google is who they say they are, and obliges, encrypting Google’s key.

Then, when you go to https://mail.google.com, the Google server sends you that encrypted key. Your browser first uses Microsoft’s public key in Internet Explorer to decrypt Google’s key, then uses Google’s public key to encrypt the session, as described above.

That’s fine -- except for the fact that Microsoft doesn’t want to be in the business of vetting everybody and encrypting their keys. Instead, Microsoft hands the job off to other companies called “Certificate Authorities”, like Verisign, Thawte, GTE, or Comodo. These companies check you out, make sure you own a domain name, and then encrypt your public key for you using their private key. Internet Explorer (and Firefox, Chrome, Safari, etc.) then include the public keys of the Certificate Authorities in the browser.

The following is a picture from Firefox, showing the long list of Certificate Authorities that Firefox trusts:

If I click on Comodo, I can see the public key. Note that it’s a long binary string: you never type these in as you would a password, which is why they are transferred in files called “certificates”.

There is a chain of trust here. Internet Explorer (or Firefox etc.) trusts a list of “Certificate Authorities”. Those companies then verify that the owner of a website in fact owns the website, then encrypt their key. When you establish an encrypted connection to the website, and it gives you a public key, you trace the chain backward to make sure it’s trustworthy, and not a hacker (or evil government) trying to do a man-in-the-middle attack against you.

Here is an example of the process in action. When I log onto Gmail, I get an SSL connection to Google to log in. I can click to the left of the URL and see that the Certificate Authority named “Thawte” has verified that this is indeed Google, and not a hacker or the Iranian government trying to hack me. Thawte is one of the many Certificate Authorities trusted by the browser, along with Comodo.

If I click on “More Information...”, I can view the contents of the certificate to see what’s really going on, such as the public key that was used to encrypt the connection:
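Browsers aren't the only programs that carry such a trust list. As a rough sketch, Python's standard ssl module can show the Certificate Authorities it would trust on your machine. The helper name organization is my own, and what gets printed depends entirely on your operating system (on some systems the list is empty until a connection is actually made).

```python
import ssl

context = ssl.create_default_context()   # loads the platform's default CA certificates
trusted = context.get_ca_certs()         # one dict per CA certificate loaded so far


def organization(cert):
    """Pull the organizationName out of a certificate's subject, if present."""
    for rdn in cert.get("subject", ()):
        for name, value in rdn:
            if name == "organizationName":
                return value
    return None


names = sorted({organization(c) for c in trusted if organization(c)})
print(len(trusted), "trusted CA certificates loaded")
for name in names[:10]:
    print(" ", name)
```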

Where it all goes wrong

The problem is that the Certificate Authority doesn’t want to deal with you, either. Frankly, it costs more to verify your identity than they can charge for your certificate.

Instead, they’d rather go through a reseller, such as your hosting provider. Your hosting provider probably registered your website name for you in the first place, so they know that you, in fact, own that name. They don’t need to go through any extra work to verify it. And thus, they sell you a certificate cheaper than you could get it elsewhere.

There are two ways to do this. One way is simply to add another link in the trust chain. A Certificate Authority (say, “Thawte”) encrypts the public key of your hosting provider (say, “Bob’s Hosting”), who then encrypts your public key.

People visiting your website then follow a longer chain backwards, first to Bob’s Hosting, then to Thawte, then finally to the browser. These chains can become indefinitely long. You just follow the chain upward until you find somebody you trust (such as the web browser).
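Following the chain can be sketched like this. It models a certificate as a name, a public key, the issuer's name, and the issuer's signature, where "signing" means hashing the name plus key and applying the issuer's private key to the hash (the detail mentioned in the note at the end of this post). The dictionary layout, the function names, the example.com site, and the toy keys built from well-known Mersenne primes are all assumptions for illustration; real certificates are far more involved.

```python
import hashlib


def make_keypair(p, q, e=65537):
    """Toy RSA keypair. Publicly known primes, so not secure at all."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))   # Python 3.8+
    return (e, n), (d, n)               # (public key, private key)


def digest(subject, public_key, modulus):
    """Hash of "website name plus public key", reduced to fit the toy modulus."""
    data = f"{subject}|{public_key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus


def make_cert(subject, public_key, issuer, issuer_private):
    d, n = issuer_private
    signature = pow(digest(subject, public_key, n), d, n)
    return {"subject": subject, "public_key": public_key,
            "issuer": issuer, "signature": signature}


def signature_ok(cert, issuer_public):
    e, n = issuer_public
    expected = digest(cert["subject"], cert["public_key"], n)
    return pow(cert["signature"], e, n) == expected


def chain_is_trusted(chain, trusted_roots):
    """chain[0] is the website; each later entry issued the one before it."""
    for i, cert in enumerate(chain):
        if cert["issuer"] in trusted_roots:
            # Reached somebody the browser trusts directly.
            return signature_ok(cert, trusted_roots[cert["issuer"]])
        if i + 1 >= len(chain):
            return False                 # ran out of chain before finding trust
        parent = chain[i + 1]
        if parent["subject"] != cert["issuer"]:
            return False
        if not signature_ok(cert, parent["public_key"]):
            return False
    return False


thawte_public, thawte_private = make_keypair(2**89 - 1, 2**107 - 1)
bob_public, bob_private = make_keypair(2**61 - 1, 2**127 - 1)
site_public, site_private = make_keypair(2**31 - 1, 2**61 - 1)
hacker_public, hacker_private = make_keypair(2**31 - 1, 2**89 - 1)

browser_roots = {"Thawte": thawte_public}   # what ships inside the browser

bob_cert = make_cert("Bob's Hosting", bob_public, "Thawte", thawte_private)
site_cert = make_cert("example.com", site_public, "Bob's Hosting", bob_private)
# The hacker claims Bob's Hosting issued this, but can only sign with his own key.
forged_cert = make_cert("example.com", hacker_public, "Bob's Hosting", hacker_private)

print(chain_is_trusted([site_cert, bob_cert], browser_roots))
print(chain_is_trusted([forged_cert, bob_cert], browser_roots))
```

The forged certificate fails because the hacker doesn't have Bob's private key. The scheme stops helping once somebody with a real signing key, or an account that can request signatures, is tricked or compromised.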

The other way is through “Registration Authorities”. Instead of going through the hassle of securing private keys (which is a lot of work), resellers like Bob’s Hosting can do only the easy part: verifying your identity. Once they’ve done that, they send the request up to the Certificate Authority, who then signs the key. It’s an invisible step in the chain: you don’t see Bob’s Hosting in your web site’s certificate, but it was a trusted link in the chain, because the Certificate Authority trusted Bob’s Hosting to verify your identity.

But what happens when a hacker breaks into Bob’s Hosting? The hacker can use the account with the Certificate Authority and claim to have customers named Google, Microsoft, and Yahoo. The Certificate Authority would then blindly generate the certificates.

And that’s what happened here: Comodo is a Certificate Authority, and one of their resellers (described as “a company in southern Europe”) was compromised by a hacker, and the account was used to generate these bogus certificates.

Revocation Lists

In theory, this means nobody can ever go to Google again, because they can’t trust that it’s not the Iranian government hacking them using a bogus certificate. However, there is a way to solve this: revocation lists. Certificate Authorities can publish lists of certificates that are fraudulent or otherwise invalid. Periodically, your browser goes to various servers to see if any certificates have been revoked, and when it finds revoked certificates, it stops using them.

That’s how the Comodo hack came to light. Somebody (Jacob Appelbaum) noticed that Comodo had revoked these nine certificates, and investigated why.

Jacob Appelbaum was looking into this for an unrelated reason: to prove that revocation lists don’t really work. As it turns out, browsers aren’t checking revocation lists as diligently as they should, and hackers doing a “man-in-the-middle” attack can just as easily intercept and change the revocation lists.

Better yet, the man-in-the-middle can simply block access to the revocation servers -- browsers assume that if they can’t reach a revocation server, then the certificate must be valid.
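That "assume it's fine" behavior can be sketched in a few lines. The serial numbers, function names, and the reachable flag are hypothetical stand-ins; this shows the policy described above, not any particular browser's code.

```python
REVOKED_SERIALS = {"serial-A", "serial-B"}   # hypothetical list published by the CA


def fetch_revocation_list(reachable=True):
    """Stand-in for a network request to the revocation server."""
    if not reachable:
        raise ConnectionError("revocation server unreachable")
    return REVOKED_SERIALS


def certificate_accepted(serial, reachable=True):
    """The lenient policy: no answer from the server counts as 'not revoked'."""
    try:
        revoked = fetch_revocation_list(reachable)
    except ConnectionError:
        return True          # can't check, so carry on as if all is well
    return serial not in revoked


print(certificate_accepted("serial-A"))                    # server reachable
print(certificate_accepted("serial-A", reachable=False))   # attacker blocks the server
```

A strict policy would return False in the except branch, at the cost of breaking every secure site whenever a revocation server is slow or down.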

Conclusion

Encrypted web traffic (SSL aka HTTPS) only works if the chain of trust is unbroken. If bogus certificates can be generated, the chain is broken, and hackers can intercept traffic.

Missing details: I've left a ton of details out. For example, you don't use the "public key" to encrypt web traffic -- you instead generate random session keys, then encrypt the session keys and exchange them. Also, Certificate Authorities don't encrypt other keys, they "sign" them. This means taking a unique hash of the information (public key plus website name), then encrypting the hash.
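The session-key detail can be sketched roughly like this. The toy RSA key (built from two well-known Mersenne primes), the helper keystream_xor, and the SHA-256-based stand-in cipher are all my own assumptions for illustration. Real SSL uses proper ciphers and a much more careful handshake.

```python
import hashlib
import secrets

# Toy server keypair. Publicly known primes, so not secure at all.
p, q = 2**61 - 1, 2**89 - 1
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))   # Python 3.8+


def keystream_xor(key, data):
    """Stand-in symmetric cipher: XOR the data with a SHA-256-derived keystream.
    Applying it twice with the same key gives back the original data."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Browser side: pick a random session key, wrap it with the server's public key,
# and encrypt the actual traffic with the session key.
session_key = secrets.token_bytes(16)
wrapped_key = pow(int.from_bytes(session_key, "big"), e, n)
ciphertext = keystream_xor(session_key, b"GET /mail HTTP/1.1")

# Server side: unwrap the session key with the private key, then decrypt the traffic.
unwrapped_key = pow(wrapped_key, d, n).to_bytes(16, "big")
plaintext = keystream_xor(unwrapped_key, ciphertext)

print(plaintext)
```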