On Sunday, US time, prominent Web services outfit CloudFlare sent an instruction to its routers in response to an attempted DoS, and instead took down its own network.
In a rare example of detailed disclosure, the company has posted an explanation of what happened here.
The network collapse occurred, the company explains …

Re: test?

Except this company sells a CDN product that is supposed to relieve stress on servers when they are under DoS and provide (and I quote) "Always Online™" and "Rock solid reliability" so that even if your server goes down, your visitors can still see your content.

So it's a bit embarrassing to not test, to just roll out, and not have an adequate testing procedure (I mean, rolling it out to all your routers before you notice is a bit stupid, no matter what).

And I can attest that at least one site I'm aware of was down for quite a long time despite the fact that it uses CloudFlare CDN to keep itself online "no matter what" and was returning all sorts of errors even though the underlying origin servers were up. Next time, their accountants will be telling them to test before they deploy, I think.

Re: Bill Re: test?

".......maybe they didn’t have time to go through full testing?......" I've seen similar mistakes. Usually they are a combination of management pressure - "fix that NOW" - and over-confidence in one's own ability. Many, many moons ago, there was a rumour of a ping of death for Cisco Catalyst routers (5000 models IIRC) and much argument amongst netties as to whether it would work or not. At the company I was working for at the time, our network architect, having the authority to do as he pleased, was firmly in the "it-won't-work" camp and decided to test it against one of our routers, only to find that not only did it work, it also propagated through all the same models in the network. Cue an embarrassing company-wide network outage, which we definitely did not step up and explain to the customers!

I think someone meant to do that.

Chris, I'll make you a bet: the packets weren't really "between 99,971 and 99,985 bytes long", they just had header fields saying they were. CloudFlare sort of says as much when it says no packet should have matched the rule because no packets were actually that long. I'd also bet that range of lengths was picked because the attacker knew a rule blocking them would crash the routers badly.

Re: I think someone meant to do that.

Well, IPv4 uses 16 bits to store the packet length: 65,536 combinations, or 0-65,535. IPv6 has the same 16-bit limit on its payload length unless the "Jumbo Payload" option (a jumbogram) is turned on, in which case the packet can be up to 4GB in size.*

So basically, it was an IPv6 attack with the Jumbo Payload option turned on. Why routers will even process a ping that's a jumbogram, I don't know.
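
As a quick sanity check on the arithmetic above, here is a minimal Python sketch. It assumes only the field widths: a 16-bit IPv4 Total Length and a 32-bit length in the IPv6 Jumbo Payload option (RFC 2675). The constant and function names are made up for illustration and have nothing to do with CloudFlare's or Juniper's actual tooling.

```python
import struct

# Field widths only; everything else here is an illustrative assumption.
IPV4_MAX_TOTAL_LENGTH = 2**16 - 1        # 16-bit Total Length field: 65,535
IPV6_JUMBO_MAX_PAYLOAD = 2**32 - 1       # 32-bit Jumbo Payload Length

# The length range quoted from CloudFlare's write-up (99,971 to 99,985 inclusive).
REPORTED_LENGTHS = range(99_971, 99_986)


def fits_ipv4_total_length(length: int) -> bool:
    """Return True if `length` can be stored in a 16-bit length field."""
    return 0 <= length <= IPV4_MAX_TOTAL_LENGTH


def fits_ipv6_jumbo_payload(length: int) -> bool:
    """Return True if `length` can be stored in a 32-bit jumbo length field."""
    return 0 <= length <= IPV6_JUMBO_MAX_PAYLOAD


def pack_16bit_length(length: int) -> bytes:
    """Pack `length` as an unsigned 16-bit big-endian (network order) integer.

    struct.pack raises struct.error when the value does not fit.
    """
    return struct.pack("!H", length)


if __name__ == "__main__":
    for length in (REPORTED_LENGTHS.start, REPORTED_LENGTHS[-1]):
        print(length,
              "fits 16-bit field:", fits_ipv4_total_length(length),
              "| fits 32-bit jumbo field:", fits_ipv6_jumbo_payload(length))
```

None of the lengths in the quoted range fits in a 16-bit field, which is consistent with both readings in this thread: either the length was never a real on-the-wire size, or it could only have been expressed through the jumbo option.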

Commendable

While it is certainly embarrassing for both CloudFlare and Juniper, I agree with the article that the best way to handle this kind of SNAFU is to be open about it. CDNs are, despite the marketing blurb, a very technical product, with preventing DoS attacks one of their key reasons for existing. You're dealing not only with customers but also other networks and possibly, depending on the size of an attack, with the IETF. While exploits like these that depend on discovering esoteric bugs can be developed silently, fixes need to be public and pushed out across networks as quickly as possible.