November 7, 2017

Comcast’s nationwide outage was caused by a configuration error

by John_A

Yesterday, folks across the country reported that Comcast internet was down — an unusually large outage that lasted around 90 minutes. It turns out that the problem was caused by Level 3, an enterprise ISP that provides the backbone for other internet providers like Verizon, Comcast and RCN. “Our network experienced a service disruption affecting some of our customers,” the firm said in a statement. “The disruption was caused by a configuration error.”

The outage shows yet again just how vulnerable the internet is in the US. Last year around this time, a DDoS attack shut down Spotify, Twitter, the New York Times and other sites, prompting some soul-searching from ISPs and internet security experts. This time it was a case of simple human error, but the results were similar: The internet, which many individuals and businesses now depend on for their livelihoods, went down.

The problem, according to an expert contacted by Wired, was a “route leak.” ISPs use something called the Border Gateway Protocol (BGP) to find networks they can route data packets through. To figure out which routes are the most efficient, networks known as Autonomous Systems (ASes) tell one another which IP addresses they can reach.

A route leak occurs when one of these ASes relays bad information about the IP addresses it can reach. That can cause internet providers to make bad or inefficient routing decisions, causing packets to be delayed or stopped altogether.
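
To make the effect concrete, here is a minimal Python sketch of how a leaked route can win out over a legitimate one. It assumes a heavily simplified selection rule (most specific prefix first, then shortest AS path); real BGP routers weigh more attributes than this. The prefix, the AS numbers and the best_route helper are all hypothetical, drawn from ranges reserved for documentation and private use.

```python
import ipaddress

def best_route(routes, destination):
    """Pick the most specific matching prefix; break ties by shortest AS path."""
    addr = ipaddress.ip_address(destination)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r["prefix"])]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (ipaddress.ip_network(r["prefix"]).prefixlen, -len(r["as_path"])),
    )

# Illustrative values only: a documentation prefix and private-use AS numbers.
legit = {"prefix": "203.0.113.0/24", "as_path": [64500, 64501, 64502], "via": "normal"}
leaked = {"prefix": "203.0.113.0/24", "as_path": [64510, 64502], "via": "leak"}

before = best_route([legit], "203.0.113.7")          # only the legitimate route is known
after = best_route([legit, leaked], "203.0.113.7")   # the leaked route looks shorter

print(before["via"], "->", after["via"])
```

In this toy model the leaked announcement wins simply because its path looks shorter, even though the network behind it may have no capacity to carry the traffic.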

A good example of this is an error Level 3 made back in 2015. In that case, a telecom in Malaysia accidentally told Level 3 that it could relay internet data from anywhere around the world. Level 3 accepted the routes, even though it shouldn’t have, causing worldwide data to be shunted through the Malaysian telecom, which had no way of handling all the traffic.

Something similar could have happened yesterday if Level 3 was, say, tweaking its routing settings and made a mistake. ISPs use filters to guard against such errors, but the scale of the internet makes it difficult to catch them all. After last year’s large DDoS attack, security experts pointed out that internet infrastructure providers like Dyn and Level 3 are particularly vulnerable to attacks. Yesterday’s outage shows how vulnerable they are to human error, too.
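
The filters mentioned above can be pictured as a per-neighbor allow-list. The following Python sketch is only an illustration under that assumption, not a description of how Level 3 or any other provider actually configures its routers; the ALLOWED table, the accept function, the prefixes and the AS numbers are all made up for the example.

```python
import ipaddress

# Hypothetical allow-list: which prefixes each neighboring AS may announce.
ALLOWED = {
    64501: [ipaddress.ip_network("198.51.100.0/24")],
}

def accept(neighbor_as, prefix):
    """Accept an announcement only if it falls inside the neighbor's allow-list."""
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(allowed) for allowed in ALLOWED.get(neighbor_as, []))

announcements = [
    (64501, "198.51.100.0/24"),  # the neighbor's own network
    (64501, "203.0.113.0/24"),   # someone else's network: a leak
]

# Keep only the announcements that pass the filter.
accepted = [(asn, p) for asn, p in announcements if accept(asn, p)]
print(accepted)
```

A filter like this only works if the allow-list is complete and current for every neighbor, which hints at why catching every bad announcement is hard at internet scale.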