Posted by CowboyNeal on Friday October 21, 2005 @06:45AM from the they're-just-faking dept.

xbmodder writes "Two tier one ISPs are down today. At about 23:30 PST both Verio and Level 3 started having problems with routes. According to Level 3 this is a software upgrade gone awry. Is this the end for Level 3?" Many, many reports about this are coming in, and if you're wondering why the stories were rather sparse overnight, it's because it's difficult to post them without internet access. Hope everyone else is back online too.

I was up late studying for a German exam, and I was having problems connecting to websites hosted in Germany that I was using to help myself review (dict.leo.org and canoo.net, if you're curious). US websites worked no problem.

While this only lasted a few hours, it still caused a mess across the North American Internet during those hours. The point is that a small number of big networks are responsible for over 90% of the traffic on the Internet. If alter.net went down it would be total chaos. If just one of the major peering points went down, sure, the traffic would be rerouted, but it would overload the other points and push latency so high that the net would be almost unusable. You'd better hope no one destroys MAE-EAST or we'll have a live example of what life without the Internet is like.

We're not talking about just a server. We're talking about the entire ISP's networking capability. Tier 1 ISPs own huge swaths of networks-- literally miles and miles of cable, and sometimes radio and other links. They route the traffic across these lines.

When a Tier 1 provider goes down, their customers go down too. That picture on the Boing Boing page shows a list of the Tier 1 providers. Every ISP that is NOT a Tier 1 gets its access from a Tier 1.

People speculate that Level 3 is dying because they've been making some really bad decisions lately, resulting in a lot of outages. A couple of weeks ago, they actively filtered out traffic from their competitor, Cogent, in a dispute over how much to charge at the point where their networks exchange traffic (called a 'peering point'). Now this. The rumor is that the company is in financial trouble.

It's not a rumour that Level 3 is in financial trouble - it's clear for all to see. They have crushing debt repayments right now.

The Cogent spat isn't over yet either - Level 3 are going to de-peer Cogent again on November 9th. They are trying to force Cogent to pay for transit, but right now it looks like Cogent holds the strongest hand and Level 3 will be once again forced to back down.

It was maybe 2 hours or so before new routing tables started spreading to bypass Level3's and Verio's networks. After that things started stabilizing again, and it seems Level3 has since woken up again. The XO network also had routing troubles from this btw, maybe others too. Sites and services such as AOL, SpeakEasy (when asked, they were stumped and could only say it affected all their customers, hehe), Google, and Wikipedia had access problems during this timeframe, depending on where you lived.

I dropped my BGP session to Level3 but they did not retract the routes, so not only could they not route my packets but they claimed (via the routing table) that they still could. From my vantage point (Chicago) the problem was resolved in about an hour.
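To make the "did not retract the routes" part concrete, here is a toy sketch of what is supposed to happen when a session drops: everything learned from that neighbor gets removed, and any prefix left with no source should be withdrawn. This is not real BGP or any router's API; the `RouteTable` class, the neighbor names, and the documentation prefixes are all made up for illustration.

```python
from typing import Dict, List, Set


class RouteTable:
    """Toy model: maps each prefix to the neighbors it was learned from."""

    def __init__(self) -> None:
        self._routes: Dict[str, Set[str]] = {}

    def learn(self, prefix: str, neighbor: str) -> None:
        # Record that `neighbor` announced `prefix` to us.
        self._routes.setdefault(prefix, set()).add(neighbor)

    def drop_session(self, neighbor: str) -> List[str]:
        """Forget everything learned from `neighbor`.

        Returns the prefixes that no longer have any source, i.e. the
        ones that should now be withdrawn from our own announcements.
        """
        withdrawn = []
        for prefix in list(self._routes):
            self._routes[prefix].discard(neighbor)
            if not self._routes[prefix]:
                del self._routes[prefix]
                withdrawn.append(prefix)
        return sorted(withdrawn)

    def reachable(self, prefix: str) -> bool:
        return prefix in self._routes


table = RouteTable()
table.learn("192.0.2.0/24", "customer-a")       # only source: customer-a
table.learn("198.51.100.0/24", "customer-a")
table.learn("198.51.100.0/24", "customer-b")    # second source survives
to_withdraw = table.drop_session("customer-a")
```

In the situation described above, the equivalent of `drop_session` apparently never took effect on the far side, so the stale prefixes kept being announced even though packets could not be delivered.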

I had the (mis)fortune to be working in a NOC for a web hosting company last night, and it turned out to be a 4-hour period in which some of our monitored systems had flaky connections: they'd be down for a few minutes, then come back up, then probably go back down again half an hour later. Frustrating, yes, but it didn't take very long to determine that Level 3 was the issue. Trying to get a timeframe out of them as to when it would be fixed was much more frustrating, but was pretty much what I expected from them.

I'm sure you know this, but for the rest: "flapping" is the common term for when a router's routing tables rapidly cycle between two invalid states [cisco.com]. The dead bird analogy is pretty descriptive, but the term "flapping" has technical and not allegorical origins.
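For the curious, a rough way to spot flapping from a log of state changes is to count how many up/down transitions land inside a short time window. The sketch below assumes a hypothetical event format of `(timestamp_seconds, "up" | "down")`, and the 300-second window and threshold of 4 transitions are arbitrary choices, not values from any vendor's implementation.

```python
from typing import List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, "up" or "down")


def max_transitions(events: List[Event], window: float) -> int:
    """Largest number of state changes seen within any `window` seconds."""
    # Keep only the timestamps where the state actually changed.
    transitions = []
    previous = None
    for timestamp, state in sorted(events):
        if previous is not None and state != previous:
            transitions.append(timestamp)
        previous = state

    # Slide a window over the transition timestamps.
    best = 0
    start = 0
    for end in range(len(transitions)):
        while transitions[end] - transitions[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def is_flapping(events: List[Event], window: float = 300.0,
                threshold: int = 4) -> bool:
    """True if the route changed state at least `threshold` times in a window."""
    return max_transitions(events, window) >= threshold


flappy = [(0, "up"), (10, "down"), (20, "up"), (30, "down"), (40, "up")]
stable = [(0, "up"), (1000, "down")]
```

Here `is_flapping(flappy)` is true (four changes in 40 seconds) while `is_flapping(stable)` is false (a single change).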

The reason that Level 3 isn't happy with the current peering arrangement is that it's not even remotely balanced. Level 3 sends almost nothing over Cogent's network, and Cogent sends the vast majority of its traffic through Level 3. A peering agreement is based on the premise that the companies will be sending almost equal amounts of traffic through each other's networks.
Level 3 has been analyzing that for a while now, but the last straw was when Cogent ran a sales blitz targeting Level 3 customers, saying that they would drop their prices to almost nothing to get them to switch away from Level 3. They are now also using the downtime caused by the peering problem to their advantage, even though Cogent is in the wrong. Cogent knew about the depeering and did nothing to resolve it.
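A back-of-the-envelope version of the balance argument looks like this. It is only a sketch: the 2:1 cutoff is an assumed example (actual peering contracts are confidential and vary), and the traffic figures in the usage lines are invented.

```python
def traffic_ratio(sent_gb: float, received_gb: float) -> float:
    """Ratio of the heavier direction to the lighter one (always >= 1)."""
    if sent_gb < 0 or received_gb < 0:
        raise ValueError("traffic volumes must be non-negative")
    low, high = sorted((sent_gb, received_gb))
    if high == 0:
        return 1.0            # no traffic either way: trivially balanced
    if low == 0:
        return float("inf")   # completely one-sided
    return high / low


def is_balanced(sent_gb: float, received_gb: float,
                max_ratio: float = 2.0) -> bool:
    """True if neither side sends more than `max_ratio` times the other.

    `max_ratio` is an assumed threshold for illustration only.
    """
    return traffic_ratio(sent_gb, received_gb) <= max_ratio


# Invented numbers: one roughly even exchange, one lopsided exchange.
even_exchange = is_balanced(sent_gb=100, received_gb=50)
lopsided_exchange = is_balanced(sent_gb=900, received_gb=100)
```

With these made-up figures the first exchange sits right at 2:1 and passes, while the 9:1 exchange fails, which is the kind of imbalance being argued about here.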

Oh yes, I'm aware of all of that - but (generally speaking) Cogent has the content, and Level 3 has the users. Guess who catches the most heat from the de-peering from its customers - Level 3 - as their customers will tend to see the problem first.

I predict that Cogent will do the same again as well - not lift a finger to fix the problem when they are de-peered on November 9th, and Level 3 will probably end up being forced to re-peer as customers whine that they are not getting the whole Internet and threaten to take up Cogent's free 1 year offer.

They are doing pretty well, not amazingly so (what telco is?) but they have a lot of cash and a stable recurring revenue base. They also have a pretty good outlook because they are one of the few companies not caught with their pants down when the FCC mandated E911 support - which a lot of people are coming to Level 3 for. If you think VOIP has a future then so does Level 3. The market thinks so; regardless of your outlook the stock has been up quite a bit recently.

To call them a "dot bomb" is really unfair since they were far more financially prudent during the timeframe, which is why they are still around at all in the dark forest of discarded Telco husks.

Disclaimer: I work for Level 3. But on the other hand, doesn't that mean that I know more than most people about the real situation here?

I have had my paycheck bounce at companies I've worked for in the past and been told I'd have to wait an extra month or two for pay at said companies (you know the kind, six employees and the owner's mom uses the company AMEX for trips to DisneyWorld while you wait weeks more to get paid). Level 3 is a few billion dollars away from that sad state.

And don't accuse me of drinking Kool-aid either - after going through a lot of layoffs over the years you have a VERY realistic outlook on what the company does well and what it does not.

That's not accurate. Lots of tier 2 and lower providers own their infrastructure. The important qualification of being a tier 1 ISP is that they don't pay anyone else to exchange traffic with them. The tier 1 guys are all predicated on the idea that they are huge enough that none of the others of them can afford to not have good and direct peering with them. Level3 can't afford to not be peered with MCI, and MCI can't afford to not be peered with AT&T, etc. So they all peer for free with each other. The tier 2 providers pay somebody to exchange traffic.

Also, the description of this story is probably wrong too; Cogent isn't a tier 1 provider. Most sources seem to think (although the contract negotiations are confidential) that Cogent was already paying Level3 for their peering, but that Level3 decided they wanted more money, based on the amount of traffic they were moving and which direction it was going.

But anyhow, your description doesn't work...the ISP that I used to work for that had 2000 customers would have qualified as tier 1 by your definition.
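To spell out the distinction, here is a minimal sketch of the "pays nobody to exchange traffic" test. The `Provider` record and the example names are hypothetical, and real-world tier labels are fuzzier than one boolean, so treat it as an illustration of the definition rather than a classifier.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Provider:
    name: str
    owns_infrastructure: bool
    settlement_free_peers: Set[str] = field(default_factory=set)
    paid_upstreams: Set[str] = field(default_factory=set)


def is_tier1(provider: Provider) -> bool:
    """Tier 1 here means: peers with others for free and pays no one.

    Owning the cables is deliberately NOT part of the test.
    """
    return (not provider.paid_upstreams
            and bool(provider.settlement_free_peers))


# Hypothetical examples: both own their infrastructure.
big_net = Provider("BigNet", owns_infrastructure=True,
                   settlement_free_peers={"OtherBigNet"})
small_isp = Provider("SmallISP", owns_infrastructure=True,
                     paid_upstreams={"BigNet"})
```

Both records own their infrastructure, but only `big_net` passes `is_tier1`; `small_isp` pays an upstream, so ownership alone doesn't get it there.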

I was watching the looking glasses last night and just about every router and core router within Level3 was seriously flapping, though I was under the impression that flapping was the equivalent of: OK peers, I'm up, no, down, oops, I mean up, erm, down, I mean.......

But if you float through the IRC channels on Linux.org there may still be ##level3 and #level3 with some people in there, and you might be able to snag a copy of the log from last night from an op. Best bet: ##level3 on irc.linux.org. Otherwise you might find some remnants from the night before in #fedora.

This is not correct. Being a Tier 1 ISP has nothing to do with leasing or owning telco lines. I have worked for a Tier 1 ISP which did not own any of its telco lines. Everything was leased from different companies: MCI, AT&T, GTE. (hint: that ISP had AS 1).

The way a Tier 1 ISP is defined is mostly by its magnitude. At the time I worked for that ISP, the rough rule of thumb was that a Tier 1 ISP must have at least a few large-capacity pipes from coast to coast, and must carry enough traffic that other Tier 1 ISPs are willing to exchange traffic (peer-to-peer) with it. Not strict rules, as you can see, but in reality it works well.