The First Few Milliseconds of an HTTPS Connection

In the 220 milliseconds that flew by, a lot of interesting stuff happened to make Firefox change the address bar color and put a lock in the lower right corner. With the help of Wireshark, my favorite network tool, and a slightly modified debug build of Firefox, we can see exactly what’s going on.

By agreement of RFC 2818, Firefox knew that “https” meant it should connect to port 443 at Amazon.com:

Most people associate HTTPS with SSL (Secure Sockets Layer) which was created by Netscape in the mid 90’s. This is becoming less true over time. As Netscape lost market share, SSL’s maintenance moved to the Internet Engineering Task Force (IETF). The first post-Netscape version was re-branded as Transport Layer Security (TLS) 1.0 which was released in January 1999. It’s rare to see true “SSL” traffic given that TLS has been around for 10 years.

Client Hello

TLS wraps all traffic in “records” of different types. We see that the first byte out of our browser is the hex byte 0x16 = 22 which means that this is a “handshake” record:

The next two bytes are 0x0301 which indicate that this is a version 3.1 record which shows that TLS 1.0 is essentially SSL 3.1.

The handshake record is broken out into several messages. The first is our “Client Hello” message (0x01). There are a few important things here:

Random:
There are four bytes representing the current Coordinated Universal Time (UTC) in the Unix epoch format, which is the number of seconds since January 1, 1970. In this case, 0x4a2f07ca. It’s followed by 28 random bytes. This will be used later on.

Session ID:
Here it’s empty/null. If we had previously connected to Amazon.com a few seconds ago, we could potentially resume a session and avoid a full handshake.

Cipher Suites:
This is a list of all of the encryption algorithms that the browser is willing to support. Its top pick is a very strong choice of “TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA” followed by 33 others that it’s willing to accept. Don’t worry if none of that makes sense. We’ll find out later that Amazon doesn’t pick our first choice anyway.

server_name extension:
This is a way to tell Amazon.com that our browser is trying to reach https://www.amazon.com/. This is really convenient because our TLS handshake occurs long before any HTTP traffic. HTTP has a “Host” header which allows a cost-cutting Internet hosting companies to pile hundreds of websites onto a single IP address. SSL has traditionally required a different IP for each site, but this extension allows the server to respond with the appropriate certificate that the browser is looking for. If nothing else, this extension should allow an extra week or so of IPv4 addresses.

Server Hello

Amazon.com replies with a handshake record that’s a massive two packets in size (2,551 bytes). The record has version bytes of 0x0301 meaning that Amazon agreed to our request to use TLS 1.0. This record has three sub-messages with some interesting data:

“Server Hello” Message (2):

We get the server’s four byte time Unix epoch time representation and its 28 random bytes that will be used later.

A 32 byte session ID in case we want to reconnect without a big handshake.

Of the 34 cipher suites we offered, Amazon picked “TLS_RSA_WITH_RC4_128_MD5” (0x0004). This means that it will use the “RSA” public key algorithm to verify certificate signatures and exchange keys, the RC4 encryption algorithm to encrypt data, and the MD5 hash function to verify the contents of messages. We’ll cover these in depth later on. I personally think Amazon had selfish reasons for choosing this cipher suite. Of the ones on the list, it was the one that was least CPU intensive to use so that Amazon could crowd more connections onto each of their servers. A much less likely possibility is that they wanted to pay special tribute to Ron Rivest, who created all three of these algorithms.

Certificate Message (11):

This message takes a whopping 2,464 bytes and is the certificate that the client can use to validate Amazon’s. It isn’t anything fancy. You can view most of its contents in your browser:

“Server Hello Done” Message (14):

This is a zero byte message that tells the client that it’s done with the “Hello” process and indicate that the server won’t be asking the client for a certificate.

Checking out the Certificate

The browser has to figure out if it should trust Amazon.com. In this case, it’s using certificates. It looks at Amazon’s certificate and sees that the current time is between the “not before” time of August 26th, 2008 and before the “not after” time of August 27, 2009. It also checks to make sure that the certificate’s public key is authorized for exchanging secret keys.

Why should we trust this certificate?

Attached to the certificate is a “signature” that is just a really long number in big-endian format:

Anyone could have sent us these bytes. Why should we trust this signature? To answer that question, need to make a speedy detour into mathemagic land:

You pick two huge prime numbers “p” and “q.” Multiply them to get “n = p*q.” Next, you pick a small public exponent “e” which is the “encryption exponent” and a specially crafted inverse of “e” called “d” as the “decryption exponent.” You then make “n” and “e” public and keep “d” as secret as you possibly can and then throw away “p” and “q” (or keep them as secret as “d”). It’s really important to remember that “e” and “d” are inverses of each other.

Now, if you have some message, you just need to interpret its bytes as a number “M.” If you want to “encrypt” a message to create a “ciphertext”, you’d calculate:

C ≡ Me (mod n)

This means that you multiply “M” by itself “e” times. The “mod n” means that we only take the remainder (e.g. “modulus”) when dividing by “n.” For example, 11 AM + 3 hours ≡ 2 (PM) (mod 12 hours). The recipient knows “d” which allows them to invert the message to recover the original message:

Cd ≡ (Me)d ≡ Me*d ≡ M1 ≡ M (mod n)

Just as interesting is that the person with “d” can “sign” a document by raising a message “M” to the “d” exponent:

Md ≡ S (mod n)

This works because “signer” makes public “S”, “M”, “e”, and “n.” Anyone can verify the signature “S” with a simple calculation:

Se ≡ (Md)e ≡ Md*e ≡ Me*d ≡ M1 ≡ M (mod n)

Public key cryptography algorithms like RSA are often called “asymmetric” algorithms because the encryption key (in our case, “e”) is not equal to (e.g. “symmetric” with) the decryption key “d”. Reducing everything “mod n” makes it impossible to use the easy techniques that we’re used to such as normal logarithms. The magic of RSA works because you can calculate/encrypt C ≡ Me (mod n) very quickly, but it is really hard to calculate/decrypt Cd ≡ M (mod n) without knowing “d.” As we saw earlier, “d” is derived from factoring “n” back to its “p” and “q”, which is a tough problem.

Verifying Signatures

The big thing to keep in mind with RSA in the real world is that all of the numbers involved have to be big to make things really hard to break using the best algorithms that we have. How big? Amazon.com’s certificate was “signed” by “VeriSign Class 3 Secure Server CA.” From the certificate, we see that this VeriSign modulus “n” is 2048 bits long which has this 617 digit base-10 representation:

(Good luck trying to find “p” and “q” from this “n” - if you could, you could generate real-looking VeriSign certificates.)

VeriSign’s “e” is 216 + 1 = 65537. Of course, they keep their “d” value secret, probably on a safe hardware device protected by retinal scanners and armed guards. Before signing, VeriSign checked the validity of the contents that Amazon.com claimed on its certificate using a real-world “handshake” that involved looking at several of their business documents. Once VeriSign was satisfied with the documents, they used the SHA-1 hash algorithm to get a hash value of the certificate that had all the claims. In Wireshark, the full certificate shows up as the “signedCertificate” part:

It’s sort of a misnomer since it actually means that those are the bytes that the signer is going to sign and not the bytes that already include a signature.

The actual signature, “S”, is simply called “encrypted” in Wireshark. If we raise “S” to VeriSign’s public “e” exponent of 65537 and then take the remainder when divided by the modulus “n”, we get this “decrypted” signature hex value:

Per the PKCS #1 v1.5 standard, the first byte is “00” and it “ensures that the encryption block, [when] converted to an integer, is less than the modulus.” The second byte of “01” indicates that this is a private key operation (e.g. it’s a signature). This is followed by a lot of “FF” bytes that are used to pad the result to make sure that it’s big enough. The padding is terminated by a “00” byte. It’s followed by “30 21 30 09 06 05 2B 0E 03 02 1A 05 00 04 14” which is the PKCS #1 v2.1 way of specifying the SHA-1 hash algorithm. The last 20 bytes are SHA-1 hash digest of the bytes in “signedCertificate.”

Since the decrypted value is properly formatted and the last bytes are the same hash value that we can calculate independently, we can assume that whoever knew “VeriSign Class 3 Secure Server CA”’s private key “signed” it. We implicitly trust that only VeriSign knows the private key “d.”

We can repeat the process to verify that “VeriSign Class 3 Secure Server CA”’s certificate was signed by VeriSign’s “Class 3 Public Primary Certification Authority.”

But why should we trust that? There are no more levels on the trust chain.

The top “VeriSign Class 3 Public Primary Certification Authority” was signed by itself. This certificate has been built into Mozilla products as an implicitly trusted good certificate since version 1.4 of certdata.txt in the Network Security Services (NSS) library. It was checked-in on September 6, 2000 by Netscape’s Robert Relyea with the following comment:

“Make the framework compile with the rest of NSS. Include a ‘live’ certdata.txt with those certs we have permission to push to open source (additional certs will be added as we get permission from the owners).”

This decision has had a relatively long impact since the certificate has a validity range of January 28, 1996 - August 1, 2028.

As Ken Thompson explained so well in his “Reflections on Trusting Trust”, you ultimately have to implicitly trust somebody. There is no way around this problem. In this case, we’re implicitly trusting that Robert Relyea made a good choice. We also hope that Mozilla’s built-in certificate policy is reasonable for the other built-in certificates.

One thing to keep in mind here is that all these certificates and signatures were simply used to form a trust chain. On the public Internet, VeriSign’s root certificate is implicitly trusted by Firefox long before you go to any website. In a company, you can create your own root certificate authority (CA) that you can install on everyone’s machine.

Alternatively, you can get around having to pay companies like VeriSign and avoid certificate trust chains altogether. Certificates are used to establish trust by using a trusted third-party (in this case, VeriSign). If you have a secure means of sharing a secret “key”, such as whispering a long password into someone’s ear, then you can use that pre-shared key (PSK) to establish trust. There are extensions to TLS to allow this, such as TLS-PSK, and my personal favorite, TLS with Secure Remote Password (SRP) extensions. Unfortunately, these extensions aren’t nearly as widely deployed and supported, so they’re usually not practical. Additionally, these alternatives impose a burden that we have to have some other secure means of communicating the secret that’s more cumbersome than what we’re trying to establish with TLS (otherwise, why wouldn’t we use that for everything?).

/* cert is OK. This is the client side of an SSL connection. * Now check the name field in the cert against the desired hostname. * NB: This is our only defense against Man-In-The-Middle (MITM) attacks! */

This check helps prevent against a man-in-the-middle attack because we are implicitly trusting that the people on the certificate trust chain wouldn’t do something bad, like sign a certificate claiming to be from Amazon.com unless it actually was Amazon.com. If an attacker is able to modify your DNS server by using a technique like DNS cache poisoning, you might be fooled into thinking you’re at a trusted site (like Amazon.com) because the address bar will look normal. This last check implicitly trusts certificate authorities to stop these bad things from happening.

Pre-Master Secret

We’ve verified some claims about Amazon.com and know its public encryption exponent “e” and modulus “n.” Anyone listening in on the traffic can know this as well (as evidenced because we are using Wireshark captures). Now we need to create a random secret key that an eavesdropper/attacker can’t figure out. This isn’t as easy as it sounds. In 1996, researchers figured out that Netscape Navigator 1.1 was using only three sources to seed their pseudo-random number generator (PRNG). The sources were: the time of day, the process id, and the parent process id. As the researchers showed, these “random” sources aren’t that random and were relatively easy to figure out.

Since everything else was derived from these three “random” sources, it was possible to “break” the SSL “security” in 25 seconds on a 1996 era machine. If you still don’t believe that finding randomness is hard, just ask the Debian OpenSSL maintainers. If you mess it up, all the security built on top of it is suspect.

The 48 byte “pre-master secret” random value that’s generated isn’t used directly, but it’s very important to keep it secret since a lot of things are derived from it. Not surprisingly, Firefox makes it hard to find out this value. I had to compile a debug version and set the SSLDEBUGFILE and SSLTRACE environment variables to see it.

In this particular session, the pre-master secret showed up in the SSLDEBUGFILE as:

Note that it’s not completely random. The first two bytes are, by convention, the TLS version (03 01).

Trading Secrets

We now need to get this secret value over to Amazon.com. By Amazon’s wishes of “TLS_RSA_WITH_RC4_128_MD5”, we will use RSA to do this. You could make your input message equal to just the 48 byte pre-master secret, but the Public Key Cryptography Standard (PKCS) #1, version 1.5 RFC tells us that we should pad these bytes with random data to make the input equal to exactly the size of the modulus (1024 bits/128 bytes). This makes it harder for an attacker to determine our pre-master secret. It also gives us one last chance to protect ourselves in case we did something really bone-headed, like reusing the same secret. If we reused the key, the eavesdropper would likely see a different value placed on the network due to the random padding.

Again, Firefox makes it hard to see these random values. I had to insert debugging statements into the padding function to see what was going on:

This is Firefox’s way of telling Amazon that it’s going to start using the agreed upon secret to encrypt its next message.

Deriving the Master Secret

If we’ve done everything correctly, both sides (and only those sides) now know the 48 byte (256 bit) pre-master secret. There’s a slight trust issue here from Amazon’s perspective: the pre-master secret just has bits that were generated by the client, they don’t take anything into account from the server or anything we said earlier. We’ll fix that be computing the “master secret.” Per the spec, this is done by calculating:

The “pre_master_secret” is the secret value we sent earlier. The “master secret” is simply a string whose ASCII bytes (e.g. “6d 61 73 74 65 72 …”) are used. We then concatenate the random values that were sent in the ClientHello and ServerHello (from Amazon) messages that we saw at the beginning.

The PRF is the “Pseudo-Random Function” that’s also defined in the spec and is quite clever. It combines the secret, the ASCII label, and the seed data we give it by using the keyed-Hash Message Authentication Code (HMAC) versions of both MD5 and SHA-1 hash functions. Half of the input is sent to each hash function. It’s clever because it is quite resistant to attack, even in the face of weaknesses in MD5and SHA-1. This process can feedback on itself and iterate forever to generate as many bytes as we need.

Since we’re using a stream cipher instead of a block cipher like the Advanced Encryption Standard (AES), we don’t need the Initialization Vectors (IVs). Therefore, we just need two Message Authentication Code (MAC) keys for each side that are 16 bytes (128 bits) each since the specified MD5 hash digest size is 16 bytes. In addition, the RC4 cipher uses a 16 byte (128 bit) key that both sides will need as well. All told, we need 216 + 216 = 64 bytes from the key block.

Prepare to be Encrypted!

The last handshake message the client sends out is the “Finished message.” This is a clever message that proves that no one tampered with the handshake and it proves that we know the key. The client takes all bytes from all handshake messages and puts them into a “handshake_messages” buffer. We then calculate 12 bytes of “verify_data” using the pseudo-random function (PRF) with our master key, the label “client finished”, and an MD5 and SHA-1 hash of “handshake_messages”:

We take the result and add a record header byte “0x14” to indicate “finished” and length bytes “00 00 0c” to indicate that we’re sending 12 bytes of verify data. Then, like all future encrypted messages, we need to make sure the decrypted contents haven’t been tampered with. Since our cipher suite in use is TLS_RSA_WITH_RC4_128_MD5, this means we use the MD5 hash function.

Some people get paranoid when they hear MD5 because it has some weaknesses. I certainly don’t advocate using it as-is. However, TLS is smart in that it doesn’t use MD5 directly, but rather the HMAC version of it. This means that instead of using MD5(m) directly, we calculate:

As you can see, we include a sequence number (“seq_num”) along with attributes of the plaintext message (here it’s called “TLSCompressed”). The sequence number foils attackers who might try to take a previously encrypted message and insert it midstream. If this occurred, the sequence numbers would definitely be different than what we expected. This also protects us from an attacker dropping a message.

All that’s left is to encrypt these bytes.

RC4 Encryption

Our negotiated cipher suite was TLS_RSA_WITH_RC4_128_MD5. This tells us that we need to use Ron’s Code #4 (RC4) to encrypt the traffic. Ron Rivest developed the RC4 algorithm to generate random bytes based on a 256 byte key. The algorithm is so simple you can actually memorize it in a few minutes.

RC4 begins by creating a 256-byte “S” byte array and populating it with 0 to 255. You then iterate over the array by mixing in bytes from the key. You do this to create a state machine that is used to generate “random” bytes. To generate a random byte, we shuffle around the “S” array.

Put graphically, it looks like this:

To encrypt a byte, we xor this pseudo-random byte with the byte we want to encrypt. Remember that xor’ing a bit with 1 causes it to flip. Since we’re generating random numbers, on average the xor will flip half of the bits. This random bit flipping is effectively how we encrypt data. As you can see, it’s not very complicated and thus it runs quickly. I think that’s why Amazon chose it.

Recall that we have a “client_write_key” and a “server_write_key.” The means we need to create two RC4 instances: one to encrypt what our browser sends and the other to decrypt what the server sent us.

The first few random bytes out of the “client_write” RC4 instance are “7E 20 7A 4D FE FB 78 A7 33 …” If we xor these bytes with the unencrypted header and verify message bytes of “14 00 00 0C 98 F0 AE CB C4 …”, we’ll get what appears in the encrypted portion that we can see in Wireshark:

The server does almost the same thing. It sends out a “Change Cipher Spec” and then a “Finished Message” that includes all handshake messages, including the decrypted version of the client’s “Finished Message.” Consequently, this proves to the client that the server was able to successfully decrypt our message.

Welcome to the Application Layer!

Now, 220 milliseconds after we started, we’re finally ready for the application layer. We can now send normal HTTP traffic that’ll be encrypted by the TLS layer with the RC4 write instance and decrypt traffic with the server RC4 write instance. In addition, the TLS layer will check each record for tampering by computing the HMAC_MD5 hash of the contents.

At this point, the handshake is over. Our TLS record’s content type is now 23 (0x17). Encrypted traffic begins with “17 03 01” which indicate the record type and TLS version. These bytes are followed by our encrypted size, which includes the HMAC hash.

which is a normal HTTP reply that includes a non-descriptive “Server: Server” header and a misspelled “Cneonction: close” header coming from Amazon’s load balancers.

TLS is just below the application layer. The HTTP server software can act as if it’s sending unencrypted traffic. The only change is that it writes to a library that does all the encryption. OpenSSL is a popular open-source library for TLS.

The connection will stay open while both sides send and receive encrypted data until either side sends out a “closure alert” message and then closes the connection. If we reconnect shortly after disconnecting, we can re-use the negotiated keys (if the server still has them cached) without using public key operations, otherwise we do a completely new full handshake.

It’s important to realize that application data records can be anything. The only reason “HTTPS” is special is because the web is so popular. There are lots of other TCP/IP based protocols that ride on top of TLS. For example, TLS is used by FTPS and secure extensions to SMTP. It’s certainly better to use TLS than inventing your own solution. Additionally, you’ll benefit from a protocol that has withstood careful security analysis.

… And We’re Done!

The very readable TLS RFC covers many more details that were missed here. We covered just one single path in our observation of the 220 millisecond dance between Firefox and Amazon’s server. Quite a bit of the process was affected by the TLS_RSA_WITH_RC4_128_MD5 Cipher Suite selection that Amazon made with its ServerHello message. It’s a reasonable choice that slightly favors speed over security.

As we saw, if someone could secretly factor Amazon’s “n” modulus into its respective “p” and “q”, they could effectively decrypt all “secure” traffic until Amazon changes their certificate. Amazon counter-balances this concern this with a short one year duration certificate:

One of the cipher suites that was offered was “TLS_DHE_RSA_WITH_AES_256_CBC_SHA” which uses the Diffie-Hellman key exchange that has a nice property of “forward secrecy.” This means that if someone cracked the mathematics of the key exchange, they’d be no better off to decrypt another session. One downside to this algorithm is that it requires more math with big numbers, and thus is a little more computationally taxing on a busy server. The “Advanced Encryption Standard” (AES) algorithm was present in many of the suites that we offered. It’s different than RC4 in that it works on 16 byte “blocks” at a time rather than a single byte. Since its key can be up to 256 bits, many consider this to be more secure than RC4.

In just 220 milliseconds, two endpoints on the Internet came together, provided enough credentials to trust each other, set up encryption algorithms, and started to send encrypted traffic.

And to think, all of this just so Bob can buy milk.

UPDATE: I wrote a program that walks through the handshake steps mentioned in this article. I posted it to GitHub.