The most unpalatable recommendation came from the official maintainers of Django, a popular Web framework that's perhaps second only to Ruby on Rails. In an advisory published Tuesday, they recommend that website operators disable data compression in responses sent to end users. The compression, often considered crucial for conserving bandwidth and shortening the time it takes browsers to load Web pages, may be turned off either by disabling Django's GZip middleware or by changing configuration settings in the underlying Web server.
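For illustration, here is a minimal sketch (not taken from the advisory itself) of what the first option can look like in a Django project's settings.py; the surrounding middleware entries are an assumed example configuration, and a front-end server such as nginx or Apache may need its own compression settings changed as well:

```python
# settings.py -- sketch of the mitigation: stop Django itself from gzipping
# responses by leaving GZipMiddleware out of the stack.
# (MIDDLEWARE_CLASSES was the setting name used by Django of that era.)
MIDDLEWARE_CLASSES = (
    # 'django.middleware.gzip.GZipMiddleware',  # disabled as a BREACH mitigation
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
)
```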

"We plan to take steps to address BREACH in Django itself, but in the meantime we recommend that all users of Django understand this vulnerability and take action if appropriate," the advisory states.

A Ruby on Rails user, meanwhile, is recommending a radically new format for sending the security tokens designed to prevent so-called cross-site request forgery attacks. Instead of delivering the credential as a 32-byte string, it should be delivered as a 64-byte string. The first 32 bytes are a one-time pad, and the second 32 bytes are encoded using the XOR algorithm between the pad and the "real" token. The recommendation, which was published over the weekend by Bradley Buda, implements an approach for masking secret tokens that is suggested in the official BREACH whitepaper.
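A minimal Python sketch of that masking scheme (illustrative only; the function names are mine, and Buda's actual implementation is Ruby on Rails code):

```python
import os

def mask_token(token):
    """Return pad || (pad XOR token): a 32-byte secret becomes a 64-byte
    string whose bytes on the wire differ for every response."""
    pad = os.urandom(len(token))                        # fresh one-time pad
    masked = bytes(p ^ t for p, t in zip(pad, token))   # pad XOR "real" token
    return pad + masked

def unmask_token(wire):
    """Recover the real token from the doubled-length wire format."""
    half = len(wire) // 2
    pad, masked = wire[:half], wire[half:]
    return bytes(p ^ m for p, m in zip(pad, masked))

token = os.urandom(32)
assert unmask_token(mask_token(token)) == token
```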

Short for Browser Reconnaissance and Exfiltration via Adaptive Compression of Hypertext, BREACH exploits the standard deflate algorithm websites use to compress pages before sending them. Attackers who are able to passively monitor a victim's Web traffic and send modified requests on the victim's behalf can glean clues about the plaintext included in the encrypted data streams. By making educated guesses, including them in requests sent to the Web server, and comparing the sizes of the compressed responses, they can extract encrypted secrets in as little as 30 seconds using a few thousand requests. US-CERT, which is backed by the US Department of Homeland Security, warned about BREACH on Friday.
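A toy zlib model of the size side channel BREACH measures (not the actual attack code; the secret, page, and sizes here are made up): when an attacker-supplied guess matches a secret elsewhere in the response body, DEFLATE encodes the repetition as a back-reference and the compressed output usually gets a little smaller.

```python
import zlib

SECRET = b"csrftoken=9f8e7d6c5b4a"

def compressed_length(reflected_guess):
    # Toy response body: it contains the secret AND echoes attacker input.
    body = (b"<html><input type='hidden' value='" + SECRET + b"'>"
            b"<p>You searched for: " + reflected_guess + b"</p></html>")
    return len(zlib.compress(body))

# A guess matching the secret's prefix typically compresses a byte or two
# better than a wrong guess of the same length; the attack repeats and
# averages such measurements to recover the secret one character at a time.
print(compressed_length(b"csrftoken=9f8e"))
print(compressed_length(b"csrftoken=QQQQ"))
```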

Promoted Comments

Instead of delivering the credential as a 32-byte string, it should be delivered as a 64-byte string. The first 32 bytes are a one-time pad, and the second 32 bytes are encoded using the XOR algorithm between the pad and the "real" token.

—Pure nonsense. You can't protect the real token in this way at all. If the first 32 bytes (which you are communicating in the same string or via the same channel) are the key to the second 32 bytes, you can trivially XOR them again with the second 32 bytes and the result is the "real" token!

True, if the attacker has access to the plaintext, this would afford no security. But this data (both the token and the one-time pad) are encrypted. This attack works by examining how well data compresses. The point of the XOR-ing step is to prevent the attacker's guesses at the plaintext from correlating with how well it compresses.

So I guess this method relies on the compression software not being smart enough to reverse the XOR process and compress the compressible results (or in other words, not being smart enough to notice that the message is compressible if you XOR the two halves together).

Yes, exactly.

By scrambling the key by XORing against random data, you make the data impossible to compress.

Anything XOR'd with random data will have almost exactly 50% of the bits inverted, and the other 50% left unchanged. Which 50% will be different every time, making it incompressible.

In other words, instead of disabling compression for the whole HTTP request, you only disable it for selected pieces of the page.

It's a good technique, but I think it's better to just turn compression off. KISS is a good strategy for security, and bandwidth is cheap. If HTML is too expensive to serve, then there is something wrong with your business model.

54 Reader Comments

Dan, all of your reporting on this issue has left out a crucial detail, which makes the attack seem much more practical than it actually is. For a BREACH attack to work, the body of the HTTP response must reflect extraneous input from the request, in addition to having the target string already in the body of the response.

In other words, to use BREACH to guess the string "email = foo@bar.com" in an account page:

1. "email = foo@bar.com" must be in the body of the HTTP response on the page normally.2. Adding an "email = ..." to the HTTP *request* MUST result in an ADDITIONAL instance of "email = ..." on the body of the HTTP *response*.

Unless both of those conditions are met, there is no attack. And #2 is not a very common occurrence. Perhaps an excessively helpful login page which complains about extra query string args would be vulnerable. But for the vast majority of pages, this is not SSLmageddon.
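To make those two conditions concrete, here is a contrived, hypothetical handler of the vulnerable shape (not taken from any real application):

```python
# Hypothetical handler: the response body carries a secret (condition 1) and
# also reflects attacker-influenced request input (condition 2), such as a
# search term or an unexpected query-string argument echoed back to the user.
def account_page(echoed_input, csrf_token, email):
    return (
        "<form><input type='hidden' name='csrf' value='{0}'></form>"
        "<p>email = {1}</p>"
        "<p>No results for: {2}</p>".format(csrf_token, email, echoed_input)
    )
```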

Have any tools been released that allow us to test our sites to see if we are vulnerable? We have some web apps developed out of house, and I'm wondering if there is a way I can test them without becoming a part-time web dev....

Dan, all of your reporting on this issue has left out a crucial detail, which makes the attack seem much more practical than it actually is. For a BREACH attack to work, the body of the HTTP response must reflect extraneous input from the request, in addition to having the target string already in the body of the response.

In other words, to use BREACH to guess the string "email = foo@bar.com" in an account page:

1. "email = foo@bar.com" must be in the body of the HTTP response on the page normally.2. Adding an "email = ..." to the HTTP *request* MUST result in an ADDITIONAL instance of "email = ..." on the body of the HTTP *response*.

Unless both of those conditions are met, there is no attack. And #2 is not a very common occurrence. Perhaps an excessively helpful login page which complains about extra query string args would be vulnerable. But for the vast majority of pages, this is not SSLmageddon.

Let's say you want to get someone's Facebook session cookie. You just need to make the victim's browser load a link like `https://www.facebook.com/search.php?q=stringthatdoesnotexistSESSIONCOOKIE` where SESSIONCOOKIE changes (using redirects from a malicious server if you want to make it go quicker) until you find one where the length is the same as when SESSIONCOOKIE is blank.

Or at least that's my understanding.

EDIT: Wait a second, headers aren't part of the compression. This wouldn't work for a cookie. But the overall idea still stands.

EDIT2: Going with the email example, if you're on a web page that always has your email in one corner and you make them go to example.com/user@email.com, resulting in a helpful, and importantly themed, 404 page saying "Sorry example.com/user@email.com Doesn't Exist", you now have a page that you can use.

This is the part I'm confused about... if cookies are passed in the clear all the time, isn't there a bigger worry at play that hasn't really materialized? Then again maybe that's the problem here, the web engines are not designed to change tokens even while in SSL mode, which obviously does need to be fixed.

EDIT: Wait a second, headers aren't part of the compression. This wouldn't work for a cookie. But the overall idea still stands.

Yes, now you see the problem. While headers are often reflected from request to response, bodies seldom are. There definitely exist pages which are vulnerable to this, but it's a very specific subset. And it can be completely mitigated by disabling compression for those pages only.

EDIT2: Going with the email example, if you're on a web page that always has your email in one corner and you make them go to example.com/user@email.com, resulting in a helpful, and importantly themed, 404 page saying "Sorry example.com/user@email.com Doesn't Exist", you now have a page that you can use.

Yes, that's exactly the kind of page/target string which would be vulnerable.

If the hack depends on the size of the compressed page being predictable, can't we simply insert a random string of a random length into the output of every page? The implementation time, processing overhead, and additional bandwidth usage would all be quite minimal.

If the hack depends on the size of the compressed page being predictable, can't we simply insert a random string of a random length into the output of every page? The implementation time, processing overhead, and additional bandwidth usage would all be quite minimal.

IIRC, the authors claimed to be able to defeat this with statistical analysis. It certainly adds noise to the signal, which would require even more intercepted ciphertexts and malicious requests.
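For what it's worth, a sketch of that random-padding idea (illustrative only; as the reply above notes, this adds noise rather than removing the side channel):

```python
import os
import random

def pad_response(html):
    # Append an HTML comment of random content and random length, so the
    # compressed size of otherwise-identical responses varies per request.
    # This only adds noise; the length side channel itself remains.
    junk = os.urandom(random.randint(16, 64)).hex()
    return html + "<!-- " + junk + " -->"

print(len(pad_response("<html>...</html>")))
```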

Instead of delivering the credential as a 32-byte string, it should be delivered as a 64-byte string. The first 32 bytes are a one-time pad, and the second 32 bytes are encoded using the XOR algorithm between the pad and the "real" token.

—Pure nonsense. You can't protect the real token in this way at all. If the first 32 bytes (which you are communicating in the same string or via the same channel) are the key to the second 32 bytes, you can trivially XOR them again with the second 32 bytes and the result is the "real" token!

How come the author is ignoring the suggestions I made in the comment thread of the last article on this subject? Do people consider it pointless to merely slow down an attack by including random data of a random length in every HTTPS request to a user-login page, and by applying a rate-limiting scheme for login attempts on any given account? Sure, this enables a DDoS attack on specific known user accounts (deliberately exceed rate limit on known account to deny service to that user), but it does fix the problem being discussed without disabling compression on the entire site.

Why not disable compression only on the login page, use Javascript and server-side scripting to encrypt/pad-out all HTTP requests/responses to a specific fixed length, and prevent login by any other page except the one with these controls? I'm sure compression can be disabled for specific URLs, with Apache .htaccess files and similar...
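One hypothetical way to do the per-URL part in a Django stack, sketched as a tiny custom middleware (the path list and class name are made up): Django's GZipMiddleware checks the request's Accept-Encoding header before compressing, so stripping that header for sensitive paths leaves those responses uncompressed.

```python
# Hypothetical Django middleware (old-style, pre-1.10 API): for the listed
# paths, drop the client's Accept-Encoding header so that GZipMiddleware,
# which only compresses when the request advertises gzip support, leaves
# the response alone.
SENSITIVE_PATHS = ("/login", "/account", "/settings")

class NoGzipForSensitivePaths(object):
    def process_request(self, request):
        if request.path.startswith(SENSITIVE_PATHS):
            request.META.pop("HTTP_ACCEPT_ENCODING", None)
```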

EDIT2: Going with the email example, if you're on a web page that always has your email in one corner and you make them go to example.com/user@email.com, resulting in a helpful, and importantly themed, 404 page saying "Sorry example.com/user@email.com Doesn't Exist", you now have a page that you can use.

Yes, that's exactly the kind of page/target string which would be vulnerable.

Thanks for this example. I can now see how that particular page would be vulnerable, but I fail to see how passwords are going to be retrieved this way. Further, like you said, it requires VERY special circumstances and is not nearly as widespread as the articles make it seem. (Which has been your point in this thread so far.) I'm agreeing with you more and more that this is not nearly as major as it's being made out to be.

It's a very interesting attack that could aid someone who really wanted to target a specific site and/or person, but I don't see it being used in script-kiddie apps anytime soon. You can't exactly point this attack at any ol' site and let it rip.

Instead of delivering the credential as a 32-byte string, it should be delivered as a 64-byte string. The first 32 bytes are a one-time pad, and the second 32 bytes are encoded using the XOR algorithm between the pad and the "real" token.

—Pure nonsense. You can't protect the real token in this way at all. If the first 32 bytes (which you are communicating in the same string or via the same channel) are the key to the second 32 bytes, you can trivially XOR them again with the second 32 bytes and the result is the "real" token!

True, if the attacker has access to the plaintext, this would afford no security. But this data (both the token and the one-time pad) are encrypted. This attack works by examining how well data compresses. The point of the XOR-ing step is to prevent the attacker's guesses at the plaintext from correlating with how well it compresses.

Instead of delivering the credential as a 32-byte string, it should be delivered as a 64-byte string. The first 32 bytes are a one-time pad, and the second 32 bytes are encoded using the XOR algorithm between the pad and the "real" token.

—Pure nonsense. You can't protect the real token in this way at all. If the first 32 bytes (which you are communicating in the same string or via the same channel) are the key to the second 32 bytes, you can trivially XOR them again with the second 32 bytes and the result is the "real" token!

True, if the attacker has access to the plaintext, this would afford no security. But this data (both the token and the one-time pad) are encrypted. This attack works by examining how well data compresses. The point of the XOR-ing step is to prevent the attacker's guesses at the plaintext from correlating with how well it compresses.

So I guess this method relies on the compression software not being smart enough to reverse the XOR process and compress the compressible results (or in other words, not being smart enough to notice that the message is compressible if you XOR the two halves together).

Instead of delivering the credential as a 32-byte string, it should be delivered as a 64-byte string. The first 32 bytes are a one-time pad, and the second 32 bytes are encoded using the XOR algorithm between the pad and the "real" token.

—Pure nonsense. You can't protect the real token in this way at all. If the first 32 bytes (which you are communicating in the same string or via the same channel) are the key to the second 32 bytes, you can trivially XOR them again with the second 32 bytes and the result is the "real" token!

True, if the attacker has access to the plaintext, this would afford no security. But this data (both the token and the one-time pad) are encrypted. This attack works by examining how well data compresses. The point of the XOR-ing step is to prevent the attacker's guesses at the plaintext from correlating with how well it compresses.

So I guess this method relies on the compression software not being smart enough to reverse the XOR process and compress the compressible results (or in other words, not being smart enough to notice that the message is compressible if you XOR the two halves together).

I doubt any common, fast compression algorithm would even attempt such a thing, since there's an absurd number of possibilities you'd have to try: you'd have to XOR every possible substring with every other possible substring. Massive CPU cost, for likely very little real-world performance gain. Generally you want a compression algorithm that's rather fast, not one that achieves the absolute most compression.

Instead of delivering the credential as a 32-byte string, it should be delivered as a 64-byte string. The first 32 bytes are a one-time pad, and the second 32 bytes are encoded using the XOR algorithm between the pad and the "real" token.

—Pure nonsense. You can't protect the real token in this way at all. If the first 32 bytes (which you are communicating in the same string or via the same channel) are the key to the second 32 bytes, you can trivially XOR them again with the second 32 bytes and the result is the "real" token!

True, if the attacker has access to the plaintext, this would afford no security. But this data (both the token and the one-time pad) are encrypted. This attack works by examining how well data compresses. The point of the XOR-ing step is to prevent the attacker's guesses at the plaintext from correlating with how well it compresses.

So I guess this method relies on the compression software not being smart enough to reverse the XOR process and compress the compressible results (or in other words, not being smart enough to notice that the message is compressible if you XOR the two halves together).

Yes, exactly.

By scrambling the key by XORing against random data, you make the data impossible to compress.

Anything XOR'd with random data will have almost exactly 50% of the bits inverted, and the other 50% left unchanged. Which 50% will be different every time, making it incompressible.
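A quick zlib check of that claim (toy values only):

```python
import os
import zlib

token = b"A" * 32                                    # a highly repetitive secret
pad = os.urandom(32)
masked = bytes(p ^ t for p, t in zip(pad, token))

print(len(zlib.compress(token)))    # repetitive input: DEFLATE shrinks it a lot
print(len(zlib.compress(masked)))   # XOR'd with a random pad: no shrinkage at all
```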

In other words, instead of disabling compression for the whole HTTP request, you only disable it for selected pieces of the page.

It's a good technique, but I think it's better to just turn compression off. KISS is a good strategy for security, and bandwidth is cheap. If HTML is too expensive to serve, then there is something wrong with your business model.

In other words, instead of disabling compression for the whole HTTP request, you only disable it for selected pieces of the page.

It's a good technique, but I think it's better to just turn compression off. KISS is a good strategy for security, and bandwidth is cheap. If HTML is too expensive to serve, then there is something wrong with your business model.

I would tend to agree with your position here. I'm not certain I agree with KitsuneKnight's apparent suggestion that the only way to check this XOR'd data for compressibility would be to XOR every possible substring with every other possible substring! I speculate that it might be possible to use some more advanced pattern-matching or wave-analysis technique to unpick the relationship between different patterns (especially easily in the case where half of your password comprises empty space). So it's possible that future encryption schemes (particularly for long-distance, low-bandwidth connections) will actually compress this sort of data, and by implementing this solution one still has to worry that a future software update will break your security by making you vulnerable to this attack again. Hence my original suggestion...

Quote:

Why not disable compression only on the login page, use Javascript and server-side scripting to encrypt/pad-out all HTTP requests/responses to a specific fixed length, and prevent login by any other page except the one with these controls? I'm sure compression can be disabled for specific URLs, with Apache .htaccess files and similar...

For now though, having understood the points made by the other commenters, I'm going to be implementing this XOR idea... (My web host is running NginX alongside Apache, and I want to make totally sure the HTTPd doesn't ignore my .htaccess instructions and do compression anyway in a way that breaks security.) I'm also going to be padding out passwords with random quantities of random data, and applying some not-too-aggressive rate-limiting schemes.

It's a good technique, but I think it's better to just turn compression off. KISS is a good strategy for security, and bandwidth is cheap. If HTML is too expensive to serve, then there is something wrong with your business model.

I agree with everything except your last statement, which would be better put (IMO) as "There's something wrong with your website". (EDIT: who wants a business model that mandates a slow website or only customers with fibre?)

Many sites can live with the risk of a BREACH attack for now. (shrug). I keep my cash in a wallet. The banks keep it in a vault. It's all risk assessment: losing $50 vs losing $50M.

Instead of delivering the credential as a 32-byte string, it should be delivered as a 64-byte string. The first 32 bytes are a one-time pad, and the second 32 bytes are encoded using the XOR algorithm between the pad and the "real" token.

Interestingly enough, the Websockets protocol does exactly this with all data sent from client to server. For each message, a random 4-byte value is generated, prepended to the message, and then XOR'ed together with the message data itself.
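For reference, a sketch of that client-to-server masking step as RFC 6455 describes it: each payload byte is XOR'd with one byte of a fresh 4-byte masking key.

```python
import os

def mask_ws_payload(payload):
    """Client-to-server WebSocket masking per RFC 6455: generate a random
    4-byte masking key and XOR each payload byte with key[i % 4]."""
    key = os.urandom(4)
    return key, bytes(b ^ key[i % 4] for i, b in enumerate(payload))

key, masked = mask_ws_payload(b'{"msg": "hello"}')
unmasked = bytes(b ^ key[i % 4] for i, b in enumerate(masked))   # server side
assert unmasked == b'{"msg": "hello"}'
```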

Instead of delivering the credential as a 32-byte string, it should be delivered as a 64-byte string. The first 32 bytes are a one-time pad, and the second 32 bytes are encoded using the XOR algorithm between the pad and the "real" token.

—Pure nonsense. You can't protect the real token in this way at all. If the first 32 bytes (which you are communicating in the same string or via the same channel) are the key to the second 32 bytes, you can trivially XOR them again with the second 32 bytes and the result is the "real" token!

True, if the attacker has access to the plaintext, this would afford no security. But this data (both the token and the one-time pad) are encrypted. This attack works by examining how well data compresses. The point of the XOR-ing step is to prevent the attacker's guesses at the plaintext from correlating with how well it compresses.

So I guess this method relies on the compression software not being smart enough to reverse the XOR process and compress the compressible results (or in other words, not being smart enough to notice that the message is compressible if you XOR the two halves together).

Yes, exactly.

By scrambling the key by XORing against random data, you make the data impossible to compress.

Anything XOR'd with random data will have almost exactly 50% of the bits inverted, and the other 50% left unchanged. Which 50% will be different every time, making it incompressible.

In other words, instead of disabling compression for the whole HTTP request, you only disable it for selected pieces of the page.

No, sorry, that is wrong. In general, compression oracles (and this is hardly the 1st compression oracle attack on TLS) compare the "secret" to an attacker-controlled string. So it WILL compress, since on a correct guess the attacker-controlled string will match perfectly; the attacker just needs to guess more bytes (the target string + its key), but that is a linear time increase, not polynomial or exponential. Even if that idea fixes THIS attack, it doesn't fix the others, nor any future attacks. The only way to defeat compression oracles is not to use compression. Any other fix is either a single-attack-vector "patch" or will offer no more security than that approach would offer without the external TLS (or other) encryption layer.

Quote:

It's a good technique, but I think it's better to just turn compression off. KISS is a good strategy for security, and bandwidth is cheap. If HTML is too expensive to serve, then there is something wrong with your business model

Exactly, turn it off for sensitive pages then we don't need to all do case by case security analysis to check each website we wish to deploy against all current attacks only to have to repeat that process for each new attack and end up forever chasing our tails.

The one-time-pad fix is kind of what I've been proposing in the other articles, but one that can actually be extended without preventing compression.

Basically instead of using it for encryption, you use it for translation, with the aim being that when you XOR identical text you get an identical result, so compression still works but malicious guesses won't do anything. Kind of like old written ciphers that simply translated individual letters using a table; these did nothing to hide repetition so compression would still work.

In a computing sense you'd want something longer than a single-character key since most sites use UTF-8 and one byte isn't going to add much security. But in my posts I've proposed using UTF-32 encoding for text, and translating in four code-point chunks (so 128-bits of security). The key you use should be generated for each new request, and can simply be added at the start of the encrypted message; so long as the encryption itself hasn't been broken this is perfectly safe, and if it has been broken then you have bigger issues to worry about anyway

Since it's unlikely anyone cares if an attacker guesses at structural elements (which are probably available to them anyway) it's possible to instead use Javascript to apply this technique only to the actual secure data that you're sending/receiving, although if this idea were adopted as part of TLS it would avoid the need to do this yourself.
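One way to sketch the "translation" idea above in Python, using a repeating per-request XOR key as a crude stand-in for the proposed UTF-32 translation table (this only illustrates the property being described, namely that repetitions aligned to the key length survive compression; it is not a vetted design):

```python
import os
import zlib

def translate(data, key):
    # Repeating-key XOR: plaintext repetitions whose period is a multiple of
    # the key length map to identical output, so DEFLATE can still find them,
    # while a guess at the secret no longer matches unless the per-request
    # key is guessed as well.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = os.urandom(16)                       # fresh 128-bit key per response
page = b"0123456789abcdef" * 4             # repetition period == key length
scrambled = translate(page, key)

print(len(zlib.compress(page)), len(zlib.compress(scrambled)))  # both shrink
assert translate(scrambled, key) == page                        # XOR inverts itself
```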

If HTML is too expensive to serve, then there is something wrong with your business model.

It's not just HTML, nor a specific login page. This applies to security tokens often sent as cookies, in the request location, or in other request headers, on every single request.

Shutting off compression is the obviously correct, immediate answer to a compression oracle attack, I agree. But you can't just ignore the benefits of compression altogether, with a blanket statement that we're all doing it wrong.

No, sorry, that is wrong. In general, compression oracles (and this is hardly the 1st compression oracle attack on TLS) compare the "secret" to an attacker-controlled string. So it WILL compress, since on a correct guess the attacker-controlled string will match perfectly; the attacker just needs to guess more bytes (the target string + its key), but that is a linear time increase, not polynomial or exponential.

In the one-time pad idea you would have to successfully guess both the pad and the target string. Since each sniffed response would use a different pad, you would have to guess them successfully *at the same time*. You also could not take advantage of things you know about the target secret (like the structure of an email address) when making a guess at the pad.

AFAICT, that means it's more than just a linear increase: n ** m -> n ** (2m) even when the secret is random.

Say you are looking for a random byte. Instead of 256 guesses on average to hit it, you need to guess both the actual byte and the pad byte (and XOR them). If you only get the pad correctly, you don't get any information about the secret. If you only get the (pad XOR secret) byte correctly, you still don't know the actual pad that you'd need to extract the secret.

This part I'm not so sure about, but doesn't such a pad also prevent a byte-wise approach to guessing (which I imagine the 30s number depends on)? Even if you know the first byte of the secret, you'd have to guess the two first bytes of a new pad to reduce the compressed length further and get the second byte of the secret. Otherwise you have no way to tell which part of the pad or secret you've guessed.

(Edit: This all of course depends on XOR foiling the compression, but that's true at least for gzip and deflate which are the most common.)
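A back-of-the-envelope version of that counting argument (illustrative numbers only):

```python
# Guessing one unknown byte directly: 256 possible values to try.
direct = 256

# With a fresh pad in front, a guess only "hits" if the pad byte and the
# masked byte (pad XOR secret) are both right, so the space per byte squares.
with_pad = 256 * 256                       # 65,536 combinations

# For an m-byte secret over an n-symbol alphabet, the brute-force space goes
# from n**m to n**(2*m), which is the n ** m -> n ** (2m) estimate above.
n, m = 256, 32
print(direct, with_pad, n ** m, n ** (2 * m))
```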

Instead of delivering the credential as a 32-byte string, it should be delivered as a 64-byte string. The first 32 bytes are a one-time pad, and the second 32 bytes are encoded using the XOR algorithm between the pad and the "real" token.

—Pure nonsense. You can't protect the real token in this way at all. If the first 32 bytes (which you are communicating in the same string or via the same channel) are the key to the second 32 bytes, you can trivially XOR them again with the second 32 bytes and the result is the "real" token!

True, if the attacker has access to the plaintext, this would afford no security. But this data (both the token and the one-time pad) are encrypted. This attack works by examining how well data compresses. The point of the XOR-ing step is to prevent the attacker's guesses at the plaintext from correlating with how well it compresses.

So I guess this method relies on the compression software not being smart enough to reverse the XOR process and compress the compressible results (or in other words, not being smart enough to notice that the message is compressible if you XOR the two halves together).

Yes, exactly.

By scrambling the key by XORing against random data, you make the data impossible to compress.

Anything XOR'd with random data will have almost exactly 50% of the bits inverted, and the other 50% left unchanged. Which 50% will be different every time, making it incompressible.

In other words, instead of disabling compression for the whole HTTP request, you only disable it for selected pieces of the page.

No, sorry, that is wrong. In general, compression oracles (and this is hardly the 1st compression oracle attack on TLS) compare the "secret" to an attacker-controlled string. So it WILL compress, since on a correct guess the attacker-controlled string will match perfectly; the attacker just needs to guess more bytes (the target string + its key), but that is a linear time increase, not polynomial or exponential. Even if that idea fixes THIS attack, it doesn't fix the others, nor any future attacks. The only way to defeat compression oracles is not to use compression. Any other fix is either a single-attack-vector "patch" or will offer no more security than that approach would offer without the external TLS (or other) encryption layer.

It WILL compress on a correct guess, but isn't the point of the one-time pad that the correct guess is different each time?

Attacks of this kind rely on getting a consistent response from the oracle, and if the oracle is going to unpredictably change the target string each time, it won't help.

No, sorry, that is wrong. In general, compression oracles (and this is hardly the 1st compression oracle attack on TLS) compare the "secret" to an attacker-controlled string. So it WILL compress, since on a correct guess the attacker-controlled string will match perfectly; the attacker just needs to guess more bytes (the target string + its key), but that is a linear time increase, not polynomial or exponential. Even if that idea fixes THIS attack, it doesn't fix the others, nor any future attacks. The only way to defeat compression oracles is not to use compression. Any other fix is either a single-attack-vector "patch" or will offer no more security than that approach would offer without the external TLS (or other) encryption layer.

It WILL compress on a correct guess, but isn't the point of the one-time pad that the correct guess is different each time?

Attacks of this kind rely on getting a consistent response from the oracle, and if the oracle is going to unpredictably change the target string each time, it won't help.

On one hand, you say that this per-request scrambling of sensitive data defeats *this* attack but not future ones.

On the other hand, you state that this entire type of attack relies on getting consistent/reproducible responses from the server (so you can make repeated requests and gradually guess more and more of the response text), and that if the server XORs the data with a newly generated mask for every response, then the response will not be consistent, and this entire type of attack is effectively foiled.

If HTML is too expensive to serve, then there is something wrong with your business model.

It's not just HTML, nor a specific login page. This applies to security tokens often sent as cookies, in the request location, or in other request headers, on every single request.

Shutting off compression is the obviously correct, immediate answer to a compression oracle attack, I agree. But you can't just ignore the benefits of compression altogether, with a blanket statement that we're all doing it wrong.

Unfortunately, compression is fundamentally incompatible with encryption for interactive protocols: you simply cannot securely compress sensitive information. To a compression oracle, the choice of encryption matters not, since it attacks the very process of removing redundancy to reduce the transmission size. It cannot be universally defended against; only fixed-ratio compression techniques (e.g. 6/7-bit -> 8-bit compaction) can be safely used. By all means compress other page data, structural HTML, JavaScript, whatever, but if you compress secret data then you WILL be vulnerable, if not to CRIME, or to BREACH (CRIME v2), then to CRIME v3 and on. Doing things like XORing with keys achieves nothing if that key is sent along with the data, as you can use the oracle to extract the key as well.

I'm contradicting my fumbled quote of the post I was replying to, sorry.

Not to waste a post, here's what compression oracle attacks are about:

So, we have a black box that we can send requests to and receive encrypted replies from - "oracle".

Assume it replies with the encrypted string "1234(text of request)" to queries and we want to find the "1234" part. Thanks to compression, when we send "5678" we get a reply of length 8 - "12345678" - but if we guess, say, "123" we might get a shorter reply, "1234(repeat 3 bytes from 0)". So, observing length changes, we can move closer and closer to the result.

And, of course, we can't guess when the reply is different every time - ergo, the one-time pad.

But the other thing to consider is: when you're incompressibly doubling the length of many parts of the page to avoid this, why not just drop compression for those pages altogether?

No, sorry, that is wrong. In general, compression oracles (and this is hardly the 1st compression oracle attack on TLS) compare the "secret" to an attacker-controlled string. So it WILL compress, since on a correct guess the attacker-controlled string will match perfectly; the attacker just needs to guess more bytes (the target string + its key), but that is a linear time increase, not polynomial or exponential. Even if that idea fixes THIS attack, it doesn't fix the others, nor any future attacks. The only way to defeat compression oracles is not to use compression. Any other fix is either a single-attack-vector "patch" or will offer no more security than that approach would offer without the external TLS (or other) encryption layer.

It WILL compress on a correct guess, but isn't the point of the one-time pad that the correct guess is different each time?

Attacks of this kind rely on getting a consistent response from the oracle, and if the oracle is going to unpredictably change the target string each time, it won't help.

On one hand, you say that this per-request scrambling of sensitive data defeats *this* attack but not future ones.

On the other hand, you state that this entire type of attack relies on getting consistent/reproducible responses from the server (so you can make repeated requests and gradually guess more and more of the response text), and that if the server XORs the data with a newly generated mask for every response, then the response will not be consistent, and this entire type of attack is effectively foiled.

I'm contradicting my fumbled quote of the post I was replying to, sorry.

Not to waste a post, here's what compression oracle attacks are about:

So, we have a black box that we can send requests to and receive encrypted replies from - "oracle".

Assume it replies with the encrypted string "1234(text of request)" to queries and we want to find the "1234" part. Thanks to compression, when we send "5678" we get a reply of length 8 - "12345678" - but if we guess, say, "123" we might get a shorter reply, "1234(repeat 3 bytes from 0)". So, observing length changes, we can move closer and closer to the result.

And, of course, we can't guess when the reply is different every time - ergo, the one-time pad.

But the other thing to consider is: when you're incompressibly doubling the length of many parts of the page to avoid this, why not just drop compression for those pages altogether?

I didn't say it did/didn't defeat this attack. The problem with this approach is you:
a) need to fix every web application to actually use it (it is not transparent).
b) as a server admin, have to confirm it actually is used correctly.
c) have to confirm the pad actually changes per request.
d) where do you get the one-time pad from? If it is from JavaScript, then you have a problem, as many JavaScript implementations do not have good cryptographically secure RNGs anyway.
e) can the client be fooled into sending the same page response multiple times? In which case you have pad reuse. It is the client the attacker is interacting with, not the server, after all.

Only if you can confirm a-e above in all cases of client/server/web application can you say you have fixed it; or you could just turn off compression. Also, you must do this for every sensitive field, doubling their size. It is a potential fix, but a very fragile one: considering how many problems we are having with TLS security, is it a good idea to add a fragile obfuscation layer on top of it?

I'm contradicting my fumbled quote of the post I was replying to, sorry.

Not to waste a post, here's what compression oracle attacks about:

So, we have a black box that we can send requests to and receive encrypted replies from - "oracle".

Assume it replies with encrypted string "1234(text of request)" to queries and we want to find the "1234" part. Thanks to compression, when we send "5678" we get reply of length 8 - "12345678", but if we guess, say, "123" we might get a shorter reply "1234(repeat 3 bytes from 0)". So, observing length changes we can move closer and closer to result.

And, of course, we can't guess when reply is every time different - ergo, one-time pad.

But the other thing to consider is when you're incompressibly doubling the length of many parts on the page to avoid this, why not just drop compression for those pages at all?

I didn't say it did/didn't defeat this attack. The problem with this approach is you:
a) need to fix every web application to actually use it (it is not transparent).
b) as a server admin, have to confirm it actually is used correctly.
c) have to confirm the pad actually changes per request.
d) where do you get the one-time pad from? If it is from JavaScript, then you have a problem, as many JavaScript implementations do not have good cryptographically secure RNGs anyway.
e) can the client be fooled into sending the same page response multiple times? In which case you have pad reuse. It is the client the attacker is interacting with, not the server, after all.

Only if you can confirm a-e above in all cases of client/server/web application can you say you have fixed it; or you could just turn off compression. Also, you must do this for every sensitive field, doubling their size. It is a potential fix, but a very fragile one: considering how many problems we are having with TLS security, is it a good idea to add a fragile obfuscation layer on top of it?

Only (a) is of real concern, and (e) is factually incorrect.

BREACH is a man-in-the-middle attack, and it listens to _server_ responses.

The attacker needs to first eavesdrop on the victim's connection and then make the victim repeatedly send requests to the target site, measuring the length of the server responses. Client requests can't be cracked that way, as they are not compressed.

It's a good technique, but I think it's better to just turn compression off. KISS is a good strategy for security, and bandwidth is cheap. If HTML is too expensive to serve, then there is something wrong with your business model.

I agree with everything except your last statement, which would be better put (IMO) as "There's something wrong with your website". (EDIT: who wants a business model that mandates a slow website or only customers with fibre?)

I explained how compressing HTML has no impact on page load performance in an article on the same topic a few days ago.

Web browsers start a 20 millisecond timer the instant they start receiving HTML from the server. Once that 20 millisecond timer has finished, they render the page. Regardless of whether you are using compression or not using compression, 20 milliseconds is long enough to download the first screen-full of HTML and therefore the user will not be able to distinguish between your page with compression and without compression.

It has no impact on performance at all. It simply reduces network costs. But it doesn't reduce costs by much because the biggest offender to bandwidth bills is always image and video content which cannot be compressed anyway via gzip.

Compression is valuable for JavaScript and CSS, but almost useless with HTML. Now that the BREACH attack is here, I think a well-configured server should disable compression for HTML content. Even on a mediocre broadband connection, 20 milliseconds is more than enough to download a significant amount of the page.

If HTML is too expensive to serve, then there is something wrong with your business model.

It's not just HTML, nor a specific login page. This applies to security tokens often sent as cookies, in the request location, or in other request headers, on every single request.

Shutting off compression is the obviously correct, immediate answer to a compression oracle attack, I agree. But you can't just ignore the benefits of compression altogether, with a blanket statement that we're all doing it wrong.

Cookies are never compressed in HTTP, so they are not vulnerable to this attack. Google experimented with compressing them in Chrome but quickly backed down when they realised how bad it can be for security.

This is a good thing for my websites, because basically the only sensitive data we predictably send is the user's session cookie.

BREACH is just 1 HTTPS attack. Where there is one, there are likely to be others. Plus I don't trust **every** DNS provider and I certainly don't trust **every** CA in the world.

The only way to combat this and future breaches is to not trust HTTPS. Use a real VPN, like IPSec or OpenVPN - anything but pptp. Of course, this means you have to know your users - so it isn't likely to work for Amazon.

I've seen some online places use C/S certs for high-value data - banks, usually. These were a hassle to get working and to keep track of, since end-users switch devices constantly, to the point that providing certs and help for them is probably as much effort as all the sales transactions.

Yep - OpenVPN.

Lovin' the down-votes. But considering the alternatives ... OpenVPN is the best alternative. HTTPS is broken, not just for BREACH. There ARE other attack vectors too. It is a house of cards.

20 milliseconds is long enough to download the first screen-full of HTML and therefore the user will not be able to distinguish between your page with compression and without compression. [...] Even on a mediocre broadband connection, 20 milliseconds is more than enough to download a significant amount of the page.

A quite regular web page these days can be 100-200KB of HTML, and often you need to get to the end (or half way through to where another column starts) before you can lay it out correctly. It takes a second or even more for such a page to finish downloading on a 1-2Mbps mobile connection. 20ms wouldn't get you through to Ars' article header on such a connection.

Halving a second's latency is important, and typically HTML compresses down to much less than half – 25%, maybe.

(Of course, the browser may well get a larger part of the web page in 20ms due to all the buffering that happens on hardware and OS levels. However, the page size still adds to overall latency.)
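Rough numbers behind that estimate (a sketch; real connections add round trips and TCP slow-start on top):

```python
page_bytes = 150 * 1024            # ~150 KB of HTML
link_bps = 1.5e6                   # ~1.5 Mbps mobile link
ratio = 0.25                       # HTML often gzips to roughly a quarter

uncompressed = page_bytes * 8 / link_bps          # ~0.8 s for the HTML alone
compressed = page_bytes * ratio * 8 / link_bps    # ~0.2 s
print("%.2f s uncompressed vs %.2f s compressed" % (uncompressed, compressed))
```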

20 milliseconds is long enough to download the first screen-full of HTML and therefore the user will not be able to distinguish between your page with compression and without compression. [...] Even on a mediocre broadband connection, 20 milliseconds is more than enough to download a significant amount of the page.

A quite regular web page these days can be 100-200KB of HTML, and often you need to get to the end (or half way through to where another column starts) before you can lay it out correctly. It takes a second or even more for such a page to finish downloading on a 1-2Mbps mobile connection. 20ms wouldn't get you through to Ars' article header on such a connection.

Halving a second's latency is important, and typically HTML compresses down to much less than half – 25%, maybe.

(Of course, the browser may well get a larger part of the web page in 20ms due to all the buffering that happens on hardware and OS levels. However, the page size still adds to overall latency.)

Sure, a badly designed webpage will load quicker with compression enabled. But if your website is badly designed, then you should just fix the design instead of relying on compression to hide the problem.

If I was the webmaster at ars, the first screen of content would be well inside the first 20 milliseconds and there would never be any "re-layout" of the page as more content is downloaded. It would only be a day or so of work to make that happen without any visual changes. It would just load much much faster.

But it's irrelevant anyway, since ars doesn't even use SSL. This whole discussion really only applies to websites where security is a priority. Those web pages tend to be much simpler, perhaps with a page almost entirely made up of content and very little else.

This is the part I'm confused about... if cookies are passed in the clear all the time, isn't there a bigger worry at play that hasn't really materialized? Then again maybe that's the problem here, the web engines are not designed to change tokens even while in SSL mode, which obviously does need to be fixed.

Everything in HTTP is passed around in the clear; that's why sensitive sites use SSL, which encrypts the entire communication channel, including the headers.

The issue here is that webservers typically compress the non-header parts of the http data, and this leaves open the attack vector being discussed.

KISS is a good strategy for security, and bandwidth is cheap. If HTML is too expensive to serve, then there is something wrong with your business model.

Ahh, good old Ars, where everyone who runs a blog getting 3 hits a day thinks they're an expert.

If you have ever managed a site which does multiple 10s of Gb/s of traffic at peak, you'd know that bandwidth is far from cheap. For significant volume websites - most of which don't operate in the rarefied, unlimited budget world of "I Can't Believe It's Not A Bubble" Google & Facebook - the cost of bandwidth becomes both opex and capital intensive. The physical limits of network equipment and infrastructure come into play, so scaling up bandwidth involves rather more investment than just ringing up your provider and asking for them to raise your cap.

Of course, most of us who end up working at that scale end up in the arms of people like Akamai, but I can assure you they do not consider bandwidth to be free. Of course, using the likes of Akamai brings whole new factors into play if you are insisting on HTTPS, like the fact you have to share your certificates with third parties - rendering half the point of HTTPS redundant...

Which is, of course, why (for the umpteenth time) the HTTPS-Everywhere advocates are deeply muddled and misguided.

And yes, I know this will be downvoted, because HTTPS-Everywhere is a mighty shibboleth on Ars these days; I've come to the conclusion though that this is to be expected - the average Ars reader's experience of real Internet operations at scale is much as the average DIY enthusiast's experience of skyscraper design and construction.

20 milliseconds is long enough to download the first screen-full of HTML and therefore the user will not be able to distinguish between your page with compression and without compression. [...] Even on a mediocre broadband connection, 20 milliseconds is more than enough to download a significant amount of the page.

A quite regular web page these days can be 100-200KB of HTML, and often you need to get to the end (or half way through to where another column starts) before you can lay it out correctly. It takes a second or even more for such a page to finish downloading on a 1-2Mbps mobile connection. 20ms wouldn't get you through to Ars' article header on such a connection.

Halving a second's latency is important, and typically HTML compresses down to much less than half – 25%, maybe.

(Of course, the browser may well get a larger part of the web page in 20ms due to all the buffering that happens on hardware and OS levels. However, the page size still adds to overall latency.)

Sure, a badly designed webpage will load quicker with compression enabled. But if your website is badly designed, then you should just fix the design instead of relying on compression to hide the problem.

If I was the webmaster at ars, the first screen of content would be well inside the first 20 milliseconds and there would never be any "re-layout" of the page as more content is downloaded. It would only be a day or so of work to make that happen without any visual changes. It would just load much much faster.

But it's irrelevant anyway, since ars doesn't even use SSL. This whole discussion really only applies to websites where security is a priority. Those web pages tend to be much simpler, perhaps with a page almost entirely made up of content and very little else.

This has nothing to do with well/badly designed pages. The latency improvements are significant because, for one thing, the CPU can compress/decompress faster than the data can be transferred, even on a fat link with low latency, owing to the simple fact that even a fat (high-bandwidth), low-latency link can still only transfer data sequentially. Add queuing effects, TCP acknowledgements, and congestion control, and it always makes sense to transmit the least amount of data possible.

Sure, disabling compression is not going to cause anyone to go broke on your 10-hits a day blog, but the costs are significant if your site pushes a great deal of traffic.

For me, this attack is not sufficiently practical as to need wide-scale disabling of compression. My bank maybe, and certain sensitive government websites... but for most other sites, I won't care too much.

KISS is a good strategy for security, and bandwidth is cheap. If HTML is too expensive to serve, then there is something wrong with your business model.

Ahh, good old Ars, where everyone who runs a blog getting 3 hits a day thinks they're an expert.

If you have ever managed a site which does multiple 10s of Gb/s of traffic at peak, you'd know that bandwidth is far from cheap. For significant volume websites - most of which don't operate in the rarefied, unlimited budget world of "I Can't Believe It's Not A Bubble" Google & Facebook - the cost of bandwidth becomes both opex and capital intensive. The physical limits of network equipment and infrastructure come into play, so scaling up bandwidth involves rather more investment than just ringing up your provider and asking for them to raise your cap.

Of course, most of us who end up working at that scale end up in the arms of people like Akamai, but I can assure you they do not consider bandwidth to be free. Of course, using the likes of Akamai brings whole new factors into play if you are insisting on HTTPS, like the fact you have to share your certificates with third parties - rendering half the point of HTTPS redundant...

Which is, of course, why (for the umpteenth time) the HTTPS-Everywhere advocates are deeply muddled and misguided.

And yes, I know this will be downvoted, because HTTPS-Everywhere is a mighty shibboleth on Ars these days; I've come to the conclusion though that this is to be expected - the average Ars reader's experience of real Internet operations at scale is much as the average DIY enthusiast's experience of skyscraper design and construction.

Why complain about down-votes after you make a good point? You just give justification for the down-votes when you do this.

Instead of delivering the credential as a 32-byte string, it should be delivered as a 64-byte string. The first 32 bytes are a one-time pad, and the second 32 bytes are encoded using the XOR algorithm between the pad and the "real" token.

—Pure nonsense. You can't protect the real token in this way at all. If the first 32 bytes (which you are communicating in the same string or via the same channel) are the key to the second 32 bytes, you can trivially XOR them again with the second 32 bytes and the result is the "real" token!

True, if the attacker has access to the plaintext, this would afford no security. But this data (both the token and the one-time pad) are encrypted. This attack works by examining how well data compresses. The point of the XOR-ing step is to prevent the attacker's guesses at the plaintext from correlating with how well it compresses.

So I guess this method relies on the compression software not being smart enough to reverse the XOR process and compress the compressible results (or in other words, not being smart enough to notice that the message is compressible if you XOR the two halves together).

Yes, exactly.

By scrambling the key by XORing against random data, you make the data impossible to compress.

Anything XOR'd with random data will have almost exactly 50% of the bits inverted, and the other 50% left unchanged. Which 50% will be different every time, making it incompressible.

In other words, instead of disabling compression for the whole HTTP request, you only disable it for selected pieces of the page.

It's a good technique, but I think it's better to just turn compression off. KISS is a good strategy for security, and bandwidth is cheap. If HTML is too expensive to serve, then there is something wrong with your business model.

Who says HTML is the only thing a website might have to serve? You could have PDF files, images, documents, binary data, even streamed video that might need to be sent encrypted. It all depends on what your business does.

Cookies are in the header which BREACH cannot touch, they are also encrypted because SSL encrypts their entire message. Don't most websites send their security tokens in the cookie/header? Wouldn't that mean that few websites are actually vulnerable to this?