The CRIME attack taught us that using compression can endanger confidentiality. In particular, it is dangerous to concatenate attacker-supplied data with sensitive secret data and then compress and encrypt the concatenation; any time we see that occurring, at any layer of the system stack, we should be suspicious of the potential for CRIME-like attacks.

Now the CRIME attack, at least as it has been publicly described so far, is an attack on TLS compression. Background: TLS includes a built-in compression mechanism, which happens at the TLS level (the entire connection is compressed). Thus, we have a situation where attacker-supplied data (e.g., the body of a POST request) gets mixed with secrets (e.g., cookies in the HTTP headers), which is what enabled the CRIME attack.

However, there are other layers of the system stack that may use compression; I am thinking especially of HTTP compression. The HTTP protocol has built-in support for compressing resources that you download over HTTP. When HTTP compression is enabled, it is applied to the body of the response (but not to the headers). HTTP compression is enabled only if both the browser and the server support it, and most browsers and many servers do, because it improves performance. Note that HTTP compression is a different mechanism from TLS compression: it is negotiated at a higher level of the stack and applies only to the response body. However, HTTP compression can be applied to data downloaded over an SSL/TLS connection, i.e., to resources fetched via HTTPS.

My question: Is HTTP compression safe to use on HTTPS resources? Do I need to do something special to disable HTTP compression for resources that are accessed over HTTPS? Or, if HTTP compression is somehow safe, why is it safe?

2 Answers

Compression, in general, alters the length of that which is compressed (that's exactly why we compress). Lossless compression alters the length depending on the data itself (whereas lossy compression can reach a fixed compression ratio, e.g. an MP3 file at a strict 128 kbit/s). Data length is what leaks through encryption, which is why we are interested in it.
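A ten-line experiment makes this concrete (Python's zlib implements DEFLATE, the algorithm underlying both TLS compression and gzip HTTP compression; the sample inputs are made up):

```python
import os
import zlib

# Two inputs of the same length: one highly redundant, one from a CSPRNG.
repetitive = b"secret=1234;" * 20   # 240 bytes with lots of redundancy
random_ish = os.urandom(240)        # 240 essentially incompressible bytes

print(len(zlib.compress(repetitive)))  # far below 240
print(len(zlib.compress(random_ish)))  # roughly 240 or slightly above
```

The two compressed lengths differ wildly even though the plaintext lengths are identical, and encryption preserves (approximately) those lengths for any eavesdropper to see.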

In a very generic way, a length leak can be fatal, even against a passive-only attacker; it is a kind of traffic analysis. A classic example comes from World War I, where French cryptographers could predict the importance of a message from the length of the (encrypted) header: an important message was addressed to the colonel (Oberst), whereas less important messages were tagged for a lieutenant (Oberleutnant, a much longer term).

Compression makes length leaks only worse, because it prevents you from fixing the length leaks by normalizing the lengths of the messages.

When the attacker can add some data of his own to the chunks that are compressed, he amplifies the length leak, which can become a practical attack vector against arbitrary target data, as the CRIME attack demonstrates. However, I argue that the problem was already there. In that view, HTTP-level compression is not a new risk; it is rather an aggravating factor for a pre-existing risk. Letting the attacker add some of his own data to the encrypted stream is yet another aggravating factor, and these factors add up.
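A minimal sketch of that amplification (the secret and the guesses are invented; Python's zlib stands in for the compress-then-encrypt pipeline): the attacker cannot read the stream, but its length alone distinguishes a right guess from a wrong one.

```python
import zlib

# Hypothetical secret that ends up in the same compressed stream as
# attacker-supplied data (think: a cookie next to a request body).
SECRET = b"sessionid=7f3a9c"

def observable_length(attacker_data: bytes) -> int:
    # All a passive eavesdropper sees is the length of the compressed
    # (then encrypted) concatenation of secret and attacker data.
    return len(zlib.compress(SECRET + attacker_data))

matching = observable_length(b"sessionid=7f3a9c")  # guess equals the secret
wrong = observable_length(b"sessionid=xqzwvk")     # guess is off
print(matching < wrong)  # True: the correct guess compresses better
```

DEFLATE encodes the repeated secret as a cheap back-reference, so the matching guess yields a measurably shorter ciphertext.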

I wager that you are not the first one to have this idea. Not only did quite a lot of people (me included) give some thought to it in the last 10 days, but if you try to access this URL:

http://www.google.com/sdfdfskfdjsdfhfkjsbkfbsjksalakjsflfa

then you get a 404 error from Google, which contains the word "sdfdfskfdjsdfhfkjsbkfbsjksalakjsflfa". Hey, that's attacker-chosen reflected data; that could be fun! So let's try again with an HTTPS URL:

https://www.google.com/sdfdfskfdjsdfhfkjsbkfbsjksalakjsflfa

and then, no 404, no fun: you are unceremoniously redirected to Google's home page. This makes me think that some people at Google already thought of it, too, and proactively deactivated the reflection when using SSL (because when using SSL, you get the Google+ bells and whistles, hence potentially dangerous data).

Would adding a random number of padding bytes in the payload help? Assuming the padding bytes themselves were generated by a cryptographically secure RNG, such that they're unlikely to be compressed, I'd imagine it'd make the attack much harder to pull off reliably.
– Polynomial, Sep 20 '12 at 9:48

@Polynomial: it can help a bit, but you have to add a substantial number of such bytes. If you add an average of n bytes (with a Gaussian distribution), then the attacker must make about n^2 times as many requests to cancel the effect of your padding. So we are talking about adding at least 1 kB of random padding; it seems more efficient (for both CPU and network) to disable compression for non-static content.
– Thomas Pornin, Sep 20 '12 at 12:19

Interesting. 1 kB seems a little excessive, though; wouldn't 128 to 256 bytes do the job? That forces them to make ~65k times as many requests, which is more than enough to make it difficult.
– Polynomial, Sep 20 '12 at 12:21

@Polynomial: I don't trust numbers below 1 million to be "high enough". We use computers with CPUs in the gigahertz range and networks that transmit several megabytes per second. Remember that the attacker can often afford to be patient.
– Thomas Pornin, Sep 20 '12 at 12:29

Good point, no idea why I didn't think of a patient attacker. I've got my "user-friendly" hat on today, rather than my "evil bastard" hat - it must be warping my judgement!
– Polynomial, Sep 20 '12 at 12:52
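As a quick sanity check on the padding discussion above (Python, with invented sample data): output from a cryptographically secure RNG is essentially incompressible, so the padding is paid for in full, on every response.

```python
import os
import zlib

# A compressible "page" plus 1 kB of CSPRNG padding, as discussed above.
page = b"<p>some compressible page content</p>" * 40
padding = os.urandom(1024)

with_pad = len(zlib.compress(page + padding))
without = len(zlib.compress(page))

# The random padding survives compression nearly byte-for-byte, so it
# adds close to the full 1024 bytes to every compressed response.
print(with_pad - without)
```

That per-response cost is the CPU/bandwidth trade-off that makes simply disabling compression for non-static content look attractive.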

It seems risky to me. HTTP compression is fine for static resources, but for some dynamic resources served over SSL, HTTP compression might be dangerous. It looks like HTTP compression can, in some circumstances, enable CRIME-like attacks.

Consider a web application that has a dynamic page with the following characteristics:

It is served over HTTPS.

HTTP compression is supported by the server (this page will be sent to the browser in compressed form, if the browser supports HTTP compression).

The page has a CSRF token on it somewhere. The CSRF token is fixed for the lifetime of the session (say). This is the secret that the attack will try to learn.

The page contains some dynamic content that can be specified by the user. For simplicity, let us suppose that there is some URL parameter that is echoed directly into the page (perhaps with some HTML escaping applied to prevent XSS, but that is fine and will not deter the attack described).

Then I think CRIME-style attacks might allow an attacker to learn the CSRF token and mount CSRF attacks on the web site.

Let me give an example. Suppose the target web application is a banking website on www.bank.com, and the vulnerable page is https://www.bank.com/buggypage.html. Suppose the bank ensures that the banking functionality is accessible only over SSL (https). And suppose that if the browser visits https://www.bank.com/buggypage.html?name=D.W., the server responds with an HTML document that echoes the name parameter somewhere in the page, alongside the CSRF token.

Suppose you are browsing the web over an open Wi-Fi connection, so that an attacker can eavesdrop on all of your network traffic. Suppose that you are currently logged into your bank, so your browser has an open session with your bank's website, but you are not actually doing any banking over the open Wi-Fi connection. Suppose, moreover, that the attacker can lure you into visiting his website http://www.evil.com/ (e.g., by mounting a man-in-the-middle attack and redirecting you when you try to visit some other http site).

Then, when your browser visits http://www.evil.com/, that page can trigger cross-domain requests to your bank's website in an attempt to learn the secret CSRF token. Notice that JavaScript is allowed to make cross-domain requests; the same-origin policy only prevents it from seeing the response. Nonetheless, since the attacker can eavesdrop on the network traffic, he can observe the length of every encrypted packet and thus infer something about the length of the resources being downloaded over the SSL connection to your bank.

In particular, the malicious http://www.evil.com/ page can trigger a request to https://www.bank.com/buggypage.html?name=closeacct&csrftoken=1 and look at how well the resulting HTML page compresses (by eavesdropping on the packets and looking at the length of the SSL packet from the bank). Next, it can trigger a request to https://www.bank.com/buggypage.html?name=closeacct&csrftoken=2 and see how well the response compresses. And so on, for each possibility for the first digit of the CSRF token. One of those should compress a little bit better than the others: the one where the digit in the URL parameter matches the CSRF token in the page. This allows the attacker to learn the first digit of the CSRF token.

In this way, it appears that the attacker can learn each digit of the CSRF token, recovering them digit-by-digit, until the attacker learns the entire CSRF token. Then, once the attacker knows the CSRF token, he can have his malicious page on www.evil.com trigger a cross-domain request that contains the appropriate CSRF token -- successfully defeating the bank's CSRF protections.
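Under those assumptions, the digit-by-digit recovery can be simulated end to end in a few lines of Python. Everything here is invented for illustration (the page template, the token value, the zlib settings); a real attacker would measure TLS record lengths on the wire rather than call zlib, but the length signal is the same.

```python
import zlib

CSRF_TOKEN = "427"  # the secret embedded in the page (made up)

def page_length(name: str) -> int:
    # Simulated observable length of the compressed response to
    # https://www.bank.com/buggypage.html?name=<name>. Raw DEFLATE with
    # fixed Huffman codes (Z_FIXED) keeps the length signal deterministic.
    html = ("<html><body>%s<form>"
            "<input name='csrftoken' value='%s'>"
            "</form></body></html>" % (name, CSRF_TOKEN))
    co = zlib.compressobj(9, zlib.DEFLATED, -15, 9, zlib.Z_FIXED)
    return len(co.compress(html.encode()) + co.flush())

recovered = ""
for _ in range(len(CSRF_TOKEN)):
    # The candidate that matches the real token compresses best, because
    # the token becomes a longer back-reference to the injected guess.
    recovered += min("0123456789",
                     key=lambda d: page_length("value='" + recovered + d))

print(recovered)  # the token, recovered from compressed lengths alone
```

Note that the injected guess mirrors the page's own context (`value='…`), so a correct digit extends an existing DEFLATE match by one character while a wrong digit costs an extra literal; that one-symbol difference is exactly what the attacker measures.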

It seems like this may allow an attacker to mount a successful CSRF attack on web applications, when the conditions above apply, if HTTP compression is enabled. The attack is possible because we are mixing secrets with attacker-controlled data into the same payload, and then compressing and encrypting that payload.

If other secrets are embedded in dynamic HTML, I could imagine similar attacks becoming possible against those secrets. This is just one example of the sort of attack I am thinking of. So it seems to me that using HTTP compression on dynamic pages accessed over HTTPS is a bit risky. There might be good reasons to disable HTTP compression on everything served over HTTPS, except for static pages/resources (e.g., CSS, JavaScript).