
A new attack on HTTPS was recently published by Angelo Prado, Neal Harris and Yoel Gluck. It created quite a stir and now there are calls to disable gzip compression in HTTP responses. On how practical the attack is, the authors say:

The BREACH attack can be exploited with just a few thousand requests, and can be executed in under a minute. The number of requests required will depend on the secret size. The power of the attack comes from the fact that it allows guessing a secret one character at a time.

The HTTP response must reflect user input (for example, outputting one of the GET parameters)

The page must contain a secret (otherwise, what motivation is there for the attack?)

The attacker must be able to:

View the victim's encrypted traffic

Send HTTP requests to the vulnerable web server on behalf of the victim

The second of those abilities (sending requests on the victim's behalf) can be accomplished by tricking the victim into viewing a website controlled by the attacker (which would contain a malicious iframe) or by modifying the content of any HTTP response not sent over SSL (the attacker could inject an iframe into the response directly).

If we assume the attacker has wedged himself snugly between the victim and the web app, it seems just about every web application is vulnerable. After all, what web app isn't (1) using gzip, (2) reflecting input, and (3) displaying secret data?

I've built a lot of web applications in my day, and after thinking about all those apps, I've come to the conclusion that conditions 2 and 3 are rarely met at the same time. Typically, the GET parameters reflected back to the user will be some kind of numeric ID, which is (hopefully) validated before the response is generated. When the range of values accepted by the GET parameter is so restricted, the attacker cannot craft the necessary string to extract part of the secret. When the range of values accepted by the GET parameter is large (like on a search page that accepts an arbitrary string), the HTTP response typically doesn't contain secrets (assuming the actual search results aren't secret, and the page isn't generating a CSRF token).

But user input can come from POST parameters too. So what about that? Let's assume the attacker can make arbitrary POST requests to the vulnerable web server on behalf of the victim. The web application should reject every POST request because the CSRF token is missing or invalid.

However, if the attacker can retrieve a valid CSRF token from a BREACH-vulnerable page (like a search page), then he can make arbitrary POST requests on the victim's behalf. At that point, the victim is in trouble.
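To make the length side channel concrete, here is a minimal Python sketch of the idea. It is not the BREACH authors' tooling: the page template, the token value, and the function names are all made up for illustration, and it only compares a fully matching guess against a non-matching one rather than recovering a secret character by character.

```python
import zlib

# Hypothetical secret embedded in every response (stands in for a CSRF token).
SECRET_TOKEN = "9f8a7b6c5d4e3f2a"


def render_search_page(query: str) -> bytes:
    """Return a page that reflects the query and also contains the secret."""
    html = (
        "<html><body>"
        f"<p>Results for: {query}</p>"
        f"<input type='hidden' name='csrf_token' value='{SECRET_TOKEN}'>"
        "</body></html>"
    )
    return html.encode("utf-8")


def observed_length(query: str) -> int:
    """Length of the compressed response, which is all the attacker can see."""
    return len(zlib.compress(render_search_page(query), 9))


# A guess that matches the secret compresses better than one that does not,
# because DEFLATE replaces the repeated string with a short back-reference.
right = observed_length("name='csrf_token' value='9f8a7b6c5d4e3f2a")
wrong = observed_length("name='csrf_token' value='0123456789abcdef")
print("matching guess:", right, "bytes; non-matching guess:", wrong, "bytes")
```

The encryption hides the content of the response but not its size, so the attacker only needs to compare lengths. This is also why a page that reflects input but contains no secret (or the reverse) gives the attacker nothing to work with.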

In Django 1.5, a new ALLOWED_HOSTS setting was introduced. The documentation for it reads:

This is a security measure to prevent an attacker from poisoning caches and password reset emails with links to malicious hosts by submitting requests with a fake HTTP Host header, which is possible even under many seemingly-safe web server configurations.

It wasn't at all obvious to me how the attacks worked, so I did some searching and found a fantastic article explaining it: Practical HTTP Host header attacks. It was a real eye opener for me, especially the part about RFC-2616.
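For reference, the setting itself is just a list of host names in settings.py. A minimal sketch, with placeholder domain names:

```python
# settings.py (fragment). The domain names below are placeholders.
ALLOWED_HOSTS = [
    "www.example.com",  # exact host name
    ".example.com",     # leading dot: example.com and any of its subdomains
]
```

With this in place, Django rejects requests whose Host header matches none of the entries instead of using the header to build URLs.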

Drupal

As an academic exercise, I wanted to try this attack in the wild. Drupal, the CMS I love to hate, was an easy target.

Drupal, by default, naively uses the $_SERVER['HTTP_HOST'] variable to construct absolute URLs. Since this value is supplied by the client in an HTTP request, it cannot be trusted.
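The general fix is to build absolute URLs from a value the site owner configured, not from anything in the request. Here is a rough sketch in Python (not Drupal's code; the constant and function names are hypothetical) contrasting the two approaches:

```python
# Hypothetical configured value; in a real app this would live in settings.
CANONICAL_BASE_URL = "https://www.example.com"


def absolute_url(path: str) -> str:
    """Build an absolute URL from configuration, ignoring the Host header."""
    return CANONICAL_BASE_URL.rstrip("/") + "/" + path.lstrip("/")


def unsafe_absolute_url(host_header: str, path: str) -> str:
    """What a naive app does: trust whatever the client sent as Host."""
    return "http://" + host_header + "/" + path.lstrip("/")


print(absolute_url("/?q=user/reset/1/token"))
print(unsafe_absolute_url("evil.example.com", "/?q=user/reset/1/token"))
```

The second function produces a link to whatever host the client claimed, which is exactly the behavior the attack relies on.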

You can easily tell if a Drupal site is vulnerable by sending it a carefully constructed HTTP request with nc:

The HTTP request is sent to the server with an absolute Request-URI (which is unusual, but allowed by RFC-2616). The second line of the request contains a spoofed Host header. If the HTTP response contains "foobar.example.com", the site can be attacked.
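The shape of such a request can be sketched in Python. This is not the original nc command, just an illustration of the two lines described above; the function names and the target URL are placeholders, and it should only ever be pointed at a site you run yourself.

```python
def build_probe_request(target_url: str, spoofed_host: str) -> bytes:
    """Build a raw HTTP/1.1 request whose request line carries an absolute
    URI and whose Host header deliberately disagrees with it."""
    lines = [
        f"GET {target_url} HTTP/1.1",  # absolute Request-URI
        f"Host: {spoofed_host}",       # spoofed Host header
        "Connection: close",
        "",                            # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def response_reflects_host(response_body: str, spoofed_host: str) -> bool:
    """True if the spoofed host name shows up anywhere in the response."""
    return spoofed_host in response_body


request = build_probe_request("http://www.example.com/", "foobar.example.com")
print(request.decode("ascii"))
```

The bytes can be written to a TCP connection on port 80 by whatever means you like; the check afterwards is simply whether the spoofed name appears in the page that comes back.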

Intercepting the one-time login link email

When a user requests to reset their password, Drupal sends them an email containing a link back to the Drupal site, with a special token. The token allows the user to log in once without a password. If an attacker can get the token, the attacker can log in as the user. Drupal uses the $_SERVER['HTTP_HOST'] variable to construct the one-time login link. So it will happily construct and send an email with a one-time reset link that looks like: evil.example.com/?q=user/reset/1/some-secure-token-abc123. If the unsuspecting user clicks the link, her account can be compromised immediately.

This attack requires the attacker to know a Drupal user's username or email address. Not hard stuff to get or guess.

The one-time login request form exists at /?q=user/password on every Drupal site I've seen (e.g. http://demo.opensourcecms.com/drupal/?q=user/password). All the attacker needs to do is send a valid HTTP POST request to the page, with a forged host header and voila, the user will be emailed a bad reset link.

To satisfy Drupal, the HTTP POST request needs a few special values, which you can get simply by viewing the source code of the reset form: form_id, form_build_id, and op. Here is an example request:

If you send that request over port 80 to the server, it sends the admin user a one-time login link that looks something like: http://hacked.example.com/drupal/?q=user/reset/1/1369016774/LOVL9KIJ4WSmRfHrmTWCkgT96qIJ0tKfIZjn1HGle_Y. If the user doesn't notice the incorrect URL and clicks the link, the one-time login token will be intercepted at the attacker's website. The attacker can now log in as the user.

Drupal's Response

Once I successfully used this attack on the latest version of Drupal, I emailed security@drupal.org. They responded (six days later) by creating this page: http://drupal.org/node/1992030. Since Drupal doesn't think it's a big enough problem to warrant a change to their default configuration, most Drupal sites will remain vulnerable.

I used PayPal for over 4 years before switching to Stripe. During those 4 years, with PayPal, I never had a charge disputed via the credit card company. I did, however, receive about a dozen disputes initiated through PayPal.

Every single one of those disputes I lost -- even ones where the customer realized the dispute was a mistake and tried to make amends. PayPal simply pretended to investigate the dispute, and after 30 days, returned the money to the customer. I don't think anyone at PayPal even glanced at my responses to the disputes. The last few disputes I received, I didn't even bother trying to defend myself. Fortunately, PayPal doesn't penalize you if you lose a dispute (besides refunding the money back to the customer).

Things changed when I switched to Stripe 5 months ago. They charge you $15 if you lose a dispute (originally, they would charge you $15 even if you won the dispute). Already, I have been the lucky recipient of 7 disputes. In 6 of the 7 disputes, the reason for the dispute has been "Unrecognized" or "Fraudulent" (which I have learned is pretty much the same thing as "Unrecognized").

I have one Stripe account under my company, Aptibyte, LLC. What the customer sees on their credit card statement is "aptibyte.com - ". They do not see the name of the product they purchased, like "jeopardylabs.com" or "testmoz.com". I do make it clear on the checkout pages and the receipt that they are ordering through Aptibyte, LLC, and that aptibyte.com will appear on their statement. But apparently, I have customers who ignore that, see aptibyte.com on their credit card statement, panic, and don't bother typing aptibyte.com into their browser (where they can see all my products). Instead they simply click the dispute payment link next to the charge on their credit statement. Awesome.

When I got my first dispute, I was not a happy camper, especially since Stripe was going to charge me $15 no matter what (this was before they started refunding the $15 fee on disputes you win). I sent a nastygram to the customer (who never replied), deactivated their account, and submitted a very terse response to the dispute to Stripe. Needless to say, I didn't win the dispute.

For my next dispute, the customer actually called my business phone but they didn't leave a voicemail. Because I want to avoid talking to my customers at all costs, my voicemail greeting is something like "Please visit aptibyte.com for more information, or email support@aptibyte.com for help." Not exactly the most welcoming thing in the world, but I have a job, and can't man the phone. Needless to say, the next day, I received the dispute notice from Stripe. I sent a much more friendly email to the customer (who never responded), and wrote a thorough defense of the charge to Stripe. About a month later, I learned that I won the dispute. I felt good about winning, but I still didn't like the fact that the customer started the dispute in the first place after calling.

I changed my voicemail greeting to something nicer like "You reached Aptibyte.com the makers of web applications for teachers and other educational professionals. Please leave a message, and we will get back to you shortly" (or something like that). I have only received a few calls to that number, and I assume most of the callers are customers on the verge of starting a dispute. Fortunately, after the calls, I have never received a dispute notice the following day. This seems to indicate the voicemail helps reassure the customer that the charge is legit (seems way too obvious now).

So far, I have lost 2 of the 7 disputes, won 3, and have 2 under review (which I think I will win).

My next project idea is to implement an algorithm that builds word clouds. There are a bunch out there, from the (seemingly) simple, to the absolutely crazy amazing. This seems like a cool project for a few reasons:

Most word cloud builders use a third party applet thing (like Java, Flash or Silverlight). Since mine will only use HTML5, I will have a niche in the market (not that I ever intend to make money on this)

I have little experience rendering graphics on the client. I made Geothon with SVG, but I haven't used an HTML5 canvas yet.

The algorithm for finding a good layout for the words seems really challenging. The man behind Wordle described his algorithm in 3 lines. The devil is in the details.

Back to the point of this post...how do you find a bounding box for a character displayed on an HTML5 canvas element? It's absolutely necessary to know that when building a word cloud.

It might seem easy at first. There is a method you can call on the canvas context named measureText(). It returns the width of the string (or character) drawn with the current font. But the genius behind that method didn't think things all the way through. The method doesn't return the height of the text. Just the width. So you're on your own for that.

Getting an upper bound for the height, and the location of the font's baseline

You can use HTML to get an upper bound on the height of a character, and the vertical offset of the baseline. You create a span element (which I will call my_span), with the style attribute set to the exact font you want to measure (in this example, it is "72px 'Arial'"). Simply calling the my_span.getBoundingClientRect() method will get you an object with a height property. That tells us the upper bound on the height of the font. We will also need my_span.getBoundingClientRect().top to calculate the baseline...

To get the baseline of the font, create a div (which I will call my_div) and make it the sibling of the span you created earlier. Set the style attribute such that you align the element vertically with the baseline (which requires you to use display: inline-block). To calculate how far the baseline is from the top of the font's bounding box, just do my_div.getBoundingClientRect().top - my_span.getBoundingClientRect().top.

This snippet of HTML does the trick:

Closing in on the character

At this point, we know the maximum height of the character, and where the baseline is from the top and bottom of the bounding box. But we want a tighter bound. For example, an "o" is shorter than an "H", but the height we calculated earlier doesn't reflect that.

To get a tighter bound, we have to draw the character on a canvas element, loop through all the pixels on the canvas, and find the first, and last colored pixels. The function below does the job:
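The scan itself is simple enough to sketch outside the browser. The version below is Python rather than the canvas code, and it makes two assumptions: the pixels are a flat RGBA array laid out the way getImageData().data is (four values per pixel, row by row), and the canvas background is transparent, so any non-zero alpha value counts as a colored pixel. The function name is made up.

```python
from typing import List, Optional, Tuple


def vertical_extent(rgba: List[int], width: int, height: int) -> Optional[Tuple[int, int]]:
    """Return (top, bottom) row indices of the first and last rows containing
    a pixel with non-zero alpha, or None if the buffer is fully transparent."""
    top = None
    bottom = None
    for y in range(height):
        row_start = y * width * 4
        # Alpha is the 4th value of each pixel, so it sits at offset 3.
        if any(rgba[row_start + x * 4 + 3] != 0 for x in range(width)):
            if top is None:
                top = y
            bottom = y
    if top is None:
        return None
    return top, bottom


# Tiny 3x5 "canvas" with one colored pixel in row 1 and one in row 3.
width, height = 3, 5
pixels = [0] * (width * height * 4)
for x, y in [(1, 1), (0, 3)]:
    pixels[(y * width + x) * 4 + 3] = 255

extent = vertical_extent(pixels, width, height)
print(extent)
```

The tight height of the character is then bottom - top + 1, and the same loop run over columns instead of rows gives a tight width.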

Finally, we get the height of a character. Putting it all together, we can draw bounding boxes around characters:

I started using Python regularly about 4 years ago, and I'm a huge fan. But even after 4 years, I find there is still a ton of stuff I don't know about the language. I was recently experimenting with Django's class based views. I decided I won't be using them for a lot of reasons. But I still wanted to know how they work.

Class based views make heavy use of multiple inheritance, which is something I've never had to use in any Python code I've ever written. Multiple inheritance is confusing from the programmer's perspective, and it's hard to implement correctly in a language. The BDFL, Guido van Rossum, himself couldn't get it right the first 2 times.
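A small example shows what the method resolution order does in the classic diamond case. This is a toy hierarchy, not Django's actual view classes; all four class names are made up.

```python
class Base:
    def describe(self):
        return ["Base"]


class A(Base):
    def describe(self):
        # super() means "next class in the MRO", not necessarily Base.
        return ["A"] + super().describe()


class B(Base):
    def describe(self):
        return ["B"] + super().describe()


class C(A, B):
    def describe(self):
        return ["C"] + super().describe()


mro_names = [cls.__name__ for cls in C.__mro__]
print(mro_names)       # ['C', 'A', 'B', 'Base', 'object']
print(C().describe())  # ['C', 'A', 'B', 'Base']
```

The surprising part is that A's super() call lands in B, even though A knows nothing about B. That is the behavior mixin-heavy code like class based views depends on.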

There is a ton of stuff written on Python multiple inheritance, and the method resolution order. The two pieces that helped me the most were:

Two months ago I finished my word search generation algorithm, and put off creating a website revolving around it. After building bingobaker and crosswordlabs, I really did not want to build yet another simple word game website. All the fun is in building the algorithm, not writing the CRUD interface.

But I spent a few hours over the last couple weekends, and built it: Word Search Labs. I put very little effort into it (I didn't even make a logo for it). I did, however, make the word searches solvable online. So when you view one, you can click and drag your mouse across the letters forming the word. Really exciting stuff...

I'm thinking my next project ought to be another iOS app. I paid $99 for a dev license from Apple, but since Scuttle Words is free, I haven't recouped my costs yet.

New HTTPS connections to my web applications would take upwards of 10 seconds, or completely time out

If you could get connected to an application over SSL, all subsequent HTTPS requests would finish quickly

My apache logs, CPU and memory usage were all normal

Non-secure HTTP was fast

Using curl -v, I determined the SERVER HELLO message in the TLS handshake was responsible for the delay

The problem started in the late morning, and continued to the early afternoon, every weekday

Maybe someone more experienced could immediately identify the problem, but I had no idea what was going on. So I did what all desperate and incompetent system admins do: restart. I restarted Apache -- didn't fix it. I restarted the server -- didn't fix it.

After Googling my brains out, I thought maybe I didn't have enough entropy to generate the random numbers required for SSL. I checked my apache config, and found I was using /dev/urandom (which is good because it is non-blocking), and /proc/sys/kernel/random/entropy_avail always reported around 150 bits of entropy. Not the problem.

Then I thought maybe my SSL keys were messed up for some reason. I generated a new key, CSR, and got a new SSL certificate from my CA. After installing it, I restarted apache and...still no relief.

Next I thought the ciphers apache was using for SSL were too slow. So I bumped up the priority on the medium security ciphers, and still no fix.

The Fix

Finally, I learned about the apache2ctl status command. I used that, and determined that I had too many clients connecting to my server (an embarrassingly simple problem). By default, Apache only allows 150 connections to the server. I went into my apache config file, and bumped up the max connections to 256. After reloading apache, that fixed the problem, for a while. So when the problem started happening again, I bumped up MaxClients to 512.

When I reloaded apache, I didn't get the warning. But the new limit didn't seem to take effect. Ultimately, I had to completely stop the apache service, and then start it back up (for whatever reason, restart doesn't work).
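In hindsight, this is something a script could have caught. Here is a rough Python sketch that assumes mod_status is enabled and that you feed it the machine-readable output of the status page (the "?auto" variant, which includes BusyWorkers and IdleWorkers lines). The sample text and the function names are made up for illustration.

```python
def parse_status(auto_text: str) -> dict:
    """Turn 'Key: value' lines from the status output into a dict."""
    fields = {}
    for line in auto_text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def workers_exhausted(auto_text: str, max_clients: int) -> bool:
    """True when every allowed connection slot is busy."""
    fields = parse_status(auto_text)
    busy = int(fields.get("BusyWorkers", 0))
    return busy >= max_clients


# Made-up sample resembling the status output on a saturated server.
SAMPLE = "BusyWorkers: 150\nIdleWorkers: 0\nScoreboard: WWWWWWWW\n"

print(parse_status(SAMPLE)["IdleWorkers"])
print(workers_exhausted(SAMPLE, max_clients=150))
```

Running a check like this from cron and alerting when it returns True would have pointed at the connection limit long before I started regenerating SSL keys.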

My datacenter had a "short outage" on Friday. From about 8:45am to about 9:00pm, my websites were completely inaccessible. I looked through my apache logs and confirmed not a single request made it to my server during that time.

Needless to say, I was not a happy camper about it. But I only pay $70/month for the privilege of using their network, HVAC and power, so you can't expect the world from them.

What made me really angry was the way they communicated during the outage, or rather, the way they didn't communicate during the outage. When I recognized the outage, I emailed them. I checked on their website for some news, and didn't see anything. Their website was down on and off throughout the day (unlike mine, which didn't see the light of day for 12 hours). When they didn't respond via email, I attempted to call them. Busy signal. Awesome. I called several dozen times during the day (literally), and either got a busy signal, cut off, or stuck in a line that didn't move ("you are caller number 42").

At 7:36pm I finally got a generic email from them with a subject of "WORLDLINK Intermittent Network Issues". Yes, they consider 10+ hours of downtime (up to that point), an "intermittent" issue. Nothing pisses off your customers more than understating the significance of the problem they are facing. In a later email that evening, they say "We were unable to communicate as the networking issue affected both our phone system and email."

At around 8pm, I finally get through on the phone and talk to someone. I ask for an estimate on when service will be restored, and I don't get a firm answer. I ask, "Will network connectivity be restored before tomorrow morning?", and I get a cautious affirmative. Fortunately, an hour after this call, network connectivity to my server is restored.

How a Datacenter Should Handle "Issues"

Immediately post on a third party service (like Twitter) that you know about the issue
It tells the customer two important things: you know about the issue and there is nothing wrong with the customer's equipment. Use a third party service so even if your phones and email go down, you can still get a message out.

Give an ETA
The customer wants to know the scale of the issue, and when it can be fixed. The frustrating thing about this downtime was I had no idea when service would be restored. I have a backup web server and database slave in another datacenter that are ready to go. All I have to do is change my DNS. But I wasn't sure whether it was worth the trouble if service was going to be restored in an hour. Had I known it was going to be an all day thing, I would have switched over to the backup machine immediately.

Post updates
It's frustrating not knowing what is going on during the outage. It helps to know if the problem has been identified, whether some service has been restored, and if there is an updated ETA.

Notify us when the issue is supposedly resolved
Consider this scenario: Service has been restored; the datacenter has not notified the customer about the restoration; the customer's system is still inoperable (i.e. the datacenter issue had a side effect on the customer's system). How does the customer know the problem is with their own equipment, and not still with the datacenter?