Routing instability causes net traffic chaos

Surfers had trouble reaching parts of the internet on Monday following a flare-up of routing instability.

Security watchers pinned the problem on "garbage" being thrown into global BGP (Border Gateway Protocol) lookup tables, causing entries to become unnecessarily long. Some backbone routers with older software versions are unable to process these entries, so sessions get dropped and parts of the internet become unreachable.

BGP is a core routing protocol which, put simply, maps the best available routes for traffic to flow across the interweb. These routing tables have become clogged with junk, the loose equivalent of motorists being instructed to go around a roundabout 20 times before proceeding with their journey. As a result traffic has become snarled up and, in some cases, surfers may only be able to reach parts of the net via the back-roads.

"The source of the problem appears to be with AS 47868 causing AS paths to become too long," writes Marcus H Sachs, director of the SANS Internet Storm Center. "Not much you can do about it unless you have access to your BGP router, in which case you might want to either block AS 47868 or limit the length of any AS path."

BGP autonomous system (AS) numbers represent unique routing domains, which are typically linked to administrative domains. These work something like place names, indicating the fastest routes on the information super-highway, with the important difference that these routing tables are not fixed but periodically updated.
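
The either/or mitigation Sachs describes (block the offending AS, or cap the length of any AS path) can be sketched in a few lines of Python. This is an illustration of the filtering logic only, not router configuration: the function names, the update format and the cap of 50 hops are all assumptions made for the example, and the prefixes and other AS numbers come from ranges reserved for documentation.

```python
from typing import Dict, Iterable, List, Set

# AS number named in the SANS note quoted above.
BLOCKED_ASNS: Set[int] = {47868}

# Hypothetical cap on AS-path length; the article does not give a figure.
MAX_AS_PATH_LEN = 50


def accept_route(as_path: List[int],
                 blocked: Set[int] = BLOCKED_ASNS,
                 max_len: int = MAX_AS_PATH_LEN) -> bool:
    """Return True if a route with this AS path should be kept."""
    if len(as_path) > max_len:
        return False  # path is "silly long": drop it
    if blocked.intersection(as_path):
        return False  # path passes through a blocked AS
    return True


def filter_updates(updates: Iterable[Dict]) -> List[Dict]:
    """Keep only the updates whose AS path passes accept_route."""
    return [u for u in updates if accept_route(u["as_path"])]


if __name__ == "__main__":
    # Made-up updates using documentation prefixes and AS numbers.
    updates = [
        {"prefix": "192.0.2.0/24", "as_path": [64496, 64497]},
        {"prefix": "198.51.100.0/24", "as_path": [64496, 47868]},
        {"prefix": "203.0.113.0/24", "as_path": [64496] + [64500] * 250},
    ]
    for u in filter_updates(updates):
        print(u["prefix"])
```

Run as a script, the sketch keeps only the first update: the second passes through AS 47868 and the third carries a path far longer than the assumed cap. On a real router the same two checks would be expressed in the vendor's own filter syntax rather than in application code.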

Danny McPherson, chief security officer at Arbor Networks, explained that routers still running Cisco software more than three years old didn’t allocate enough buffer space for "silly long AS paths, and so they blow chunks when they receive the update".

"However, other vendor implementations and patched versions happily propagate the update, apparently through numerous intermediate ASes, so seemingly random sets of BGP routers in the routing system were taking a dump, or dropping sessions with malformed AS path complaints," he notes.

Similar problems caused global routing instability five years ago, McPherson notes in a detailed technical dissection of the problem.

The practical upshot of the problem is not altogether clear.

UK traffic stats from LINX look normal, but elsewhere Slashdot reports a "single mis-configured router is apparently able to cause a DOS for a huge chunk of the net". ®