The most interesting bit is not in the summary. Given an individual website, they could identify which specific webpage one was visiting, thus leaking with high probability all sorts of medical, financial, and legal information. The examples used include, from medicine, the websites of the Mayo Clinic and Planned Parenthood; from finance, Wells Fargo and Bank of America; and from entertainment, YouTube and Netflix. This sort of thing could be used for all sorts of surveillance or blackmail. Even just knowing which YouTube videos someone is watching could be used for such ends.

The "leaks" seem more like they can track the path of a user through a website, given the structure of the links and the relative size of the pages. I don't think they claimed they could tell what the data was on the page, but sometimes the fact that a user is on a given page is enough (depending on the structure of the site).

For YouTube, they'd have to figure out the relative sizes of all the pages, which might be difficult to do (and the size will change depending on the comments and the browser used).

Right. They first crawl the site to build a map of the encrypted pages. Then, by looking at other encrypted streams, they can guess, with approx. 89% accuracy, what page it was. The overwhelming point here is that it is a complete and utter GUESS. Without decrypting the contents, they don't know for sure what it is. The issue for SSL is that it's not very good encryption if my https traffic for foo.html is sufficiently the same as another https session's traffic for foo.html -- i.e. it's failing the test of indistinguishability.
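A minimal sketch of the matching step, assuming the attacker has already crawled the target site over HTTPS and recorded the size of each encrypted response (all URLs, sizes, and the tolerance here are invented for illustration):

```python
# Hypothetical fingerprint map built by crawling the site and recording
# the encrypted response size for each URL (values are made up).
fingerprints = {
    "/": 5120,
    "/services/cardiology": 7430,
    "/services/oncology": 7980,
    "/appointments": 4210,
}

def guess_page(observed_size, tolerance=64):
    """Return the URL whose recorded size is closest to the observed
    ciphertext size, or None if nothing is within the tolerance."""
    best_url, best_diff = None, tolerance + 1
    for url, size in fingerprints.items():
        diff = abs(size - observed_size)
        if diff < best_diff:
            best_url, best_diff = url, diff
    return best_url if best_diff <= tolerance else None

# An eavesdropper sees a 7,440-byte encrypted response and makes a guess.
print(guess_page(7440))   # -> "/services/cardiology"
```

It really is just a guess: the closer two pages are in size, the more often this lookup picks the wrong one.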

And do you think this is specific to HTTPS, or rather a problem with most encryption techniques as we use them (given that we're not zero-padding input data to make it all roughly the same size, that is, pretty much indistinguishable)?

In this case, it's aimed specifically at SSL, but in general this is another form of differential cryptanalysis. Any credible encryption system takes steps to prevent this. (Simply put, a single-bit change in either the key or the plaintext should not have an easily predictable effect on the ciphertext.) As far as I know, no one has tried this method against other crypto systems.
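To make the size point concrete: even with a modern AEAD cipher the ciphertext is just the plaintext length plus a fixed overhead, so the content is hidden but the size isn't, and size is what this attack observes. A quick illustration in Python, assuming the `cryptography` package is available:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
aesgcm = AESGCM(key)

# Pretend these are three HTML pages: two of identical size, one larger.
for page in (b"A" * 5000, b"B" * 5000, b"C" * 7500):
    ct = aesgcm.encrypt(os.urandom(12), page, None)  # fresh nonce per message
    print(len(page), "->", len(ct))

# Prints: 5000 -> 5016, 5000 -> 5016, 7500 -> 7516
# Two same-size pages are indistinguishable on the wire, but the sizes
# themselves leak -- which is exactly what the page-fingerprinting uses.
```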

Size alone is a very weak means of mapping content. Almost every modern web application has some variability in the output size at any given URL.

What we do, and have done for many years, is just pad to the nearest X bytes, where X is roughly size / 30. That's small enough that it makes little difference in speed, but many resources end up being the same size.
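A rough sketch of that padding scheme as I read it (the divisor of 30, the bucket value, and the way the filler bytes are carried are all guesses for illustration):

```python
def pad_to_bucket(body: bytes, bucket: int) -> bytes:
    """Pad so the total length is a multiple of `bucket` bytes."""
    padded = -(-len(body) // bucket) * bucket   # ceiling division
    # Filler would normally live somewhere the client ignores (trailing
    # whitespace, a dummy header, etc.); NUL bytes here are just a demo.
    return body + b"\0" * (padded - len(body))

# Pick X as roughly 1/30 of the typical resource size -- for ~5 KB pages,
# a ~170-byte bucket (numbers invented for illustration).
X = 170
for size in (5017, 5060, 5099, 7431):
    print(size, "->", len(pad_to_bucket(b"A" * size, X)))

# 5017, 5060 and 5099 all pad to 5100 bytes, while 7431 pads to 7480,
# so many nearby sizes collapse into one bucket at ~3% size overhead.
```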

Consider the Mayo Clinic web site as an example. Each page is maybe 5 KB for the HTML itself. The graphics for the logo, nav bar, etc. are separate requests, cached after the home page. 80% of the HTML is template stuff - the header, the footer, the nav bar, the overall page structure. Maybe 20% is the actual page content, so only about 1 KB really varies from page to page.

This might be another reason to consider using a VPN, even on a trusted network. At least then an attacker could see traffic go by but not know where it is going, especially if there is a program in the background doing random HTTPS queries to various sites for noise.
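A minimal sketch of that background-noise idea: periodically fetch random HTTPS pages so real requests are mixed with decoys. The site list and timing are purely illustrative, and decoy traffic alone is far from a complete defence against fingerprinting.

```python
import random
import time
import urllib.request

DECOY_SITES = [  # hypothetical examples
    "https://en.wikipedia.org/wiki/Special:Random",
    "https://www.example.com/",
    "https://www.gutenberg.org/",
]

def noise_loop():
    """Fetch a random decoy page at jittered intervals, forever."""
    while True:
        url = random.choice(DECOY_SITES)
        try:
            urllib.request.urlopen(url, timeout=10).read()
        except OSError:
            pass                          # ignore failed decoy fetches
        time.sleep(random.uniform(5, 60))  # randomize the interval

if __name__ == "__main__":
    noise_loop()
```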

Of course, the downside of VPNs is that a lot of them have their outgoing IP addresses flagged, so Google either demands a CAPTCHA before use or just gives you the middle finger and denies access entirely.

It's more like you need a Tor-style VPN, routing your traffic over different paths and aggregating/disaggregating it so there is never a single point that all your traffic flows through. Not especially efficient.