By Cabel

A few months ago, a complaint started popping up from users downloading or updating our apps: “Geez, your downloads are really slow!”

If you work in support, you probably have a reflexive reaction to a complaint like this. It’s vague. There’s a million possible factors. It’ll probably resolve itself by tomorrow. You hope. Boy do you hope.

Except… we also started noticing it ourselves when we were working from home. When we’d come in to the office, transfers were lightning fast. But at home, it was really, seriously getting hard to get any work done remotely at all.

So, maybe there was something screwy here?

The Video

Before digging in, here’s this story in convenient summarized video form, if you’d prefer!

Now on to the details.

The Test

The Panic “network topology” is actually very simple. The Panic web servers have a single connection to the internet via Cogent. We colocate our own servers, rather than using AWS or any other PaaS, and we also don’t currently use a CDN or any other cloud distribution platform.

So, if something is making our downloads slow, it ought to be pretty easy to do some analysis and figure out why, or at least where.

We wanted to know three things:

How fast can people download from our website?

How fast can people download from a “control” website that’s not on our network?

What are people using for their internet provider?

We made an extremely simple test page that transfers 20MB of data from our server to the browser, then sends the user to run the same script on the control server, which we chose to host with Linode. (The Linode server is located in Fremont, CA, the closest we could find to us here in Portland.)
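For a rough idea of what a test like this measures, here is a minimal Python sketch. It is not the actual test page: it just times how long a stream of byte chunks takes to arrive and divides bytes by seconds. The function name `measure_throughput` and the injectable `clock` parameter are made up for illustration.

```python
import time
from typing import Callable, Iterable


def measure_throughput(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Consume an iterable of byte chunks and return bytes per second.

    `chunks` stands in for whatever delivers the payload (hypothetical here);
    `clock` can be swapped out so the timing is easy to test.
    """
    start = clock()
    total_bytes = 0
    for chunk in chunks:
        total_bytes += len(chunk)
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return total_bytes / elapsed


def to_kilobytes_per_second(bytes_per_second: float) -> float:
    """Convert bytes/second to kilobytes/second (1 kB = 1000 bytes)."""
    return bytes_per_second / 1000
```

Running the same measurement against two servers back to back, as the test page did, gives each tester a paired result: one number for the server under test and one for the control.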

We tweeted the link out, and data started pouring in…

The Results

Here’s what we got back, comparing how fast our users could download from our control server through Linode, and from our own servers through Cogent:

(There are 1,645 samples in our target range, after filtering out TLDs with fewer than 10 occurrences, and we’ve done a box plot, which shows a spread of all the data points.)
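As a rough illustration of the filtering and box-plot math described above (a sketch, not the analysis scripts themselves), the Python below drops groups with fewer than 10 samples and computes quartiles, whiskers, and outliers for one group. The 1.5 × IQR outlier rule is the common Tukey convention and is an assumption here, as are the names `group_speeds` and `box_stats`.

```python
from collections import defaultdict
from statistics import quantiles


def group_speeds(samples, min_count=10):
    """Group (provider, speed) pairs, dropping small groups.

    Providers with fewer than `min_count` samples are filtered out.
    """
    groups = defaultdict(list)
    for provider, speed in samples:
        groups[provider].append(speed)
    return {p: sorted(v) for p, v in groups.items() if len(v) >= min_count}


def box_stats(speeds):
    """Return quartiles, whiskers, and outliers for one group of speeds.

    Uses the Tukey convention: points more than 1.5 * IQR beyond the
    quartiles count as outliers (an assumption, not taken from the post).
    """
    q1, median, q3 = quantiles(speeds, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [s for s in speeds if low_fence <= s <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [s for s in speeds if s < low_fence or s > high_fence],
    }
```

The dots on a box plot are exactly the values that land in `outliers`, which is why a provider whose speeds swing widely shows so many of them.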

Well, well, well. It doesn’t take statistical genius to see one glaring outlier — and that was Comcast, with download speeds often being as low as 300 kilobytes/second. And you’ll never guess what provider is used by virtually every Panic employee when they work from home? Yeah, Comcast. There is, in fact, no other cable ISP available to Portland residents.

But, before jumping to conclusions, there was something else that was weird with the Comcast data: a huge number of outliers, way more outliers than any other provider. See all those red dots on the graph, ranging from very slow to very fast?

That mystery was solved when we plotted the Comcast data across different times of the day…

Nuts. The problem reports we’d been hearing were indeed a real thing.

Our downloads really were slow — but seemingly only to Comcast users, and only during peak internet usage times. Something was up.

At first we thought, maybe Comcast bandwidth is just naturally more congested in the evening as people come home from work and begin streaming Netflix, etc. But that didn’t explain why the connections to our Linode control server from Comcast, during the exact same time windows for each tester, were downloading with good speeds.
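The time-of-day comparison against the control server can be sketched in Python as follows. This is a hypothetical illustration, not the real analysis: the sample tuple layout, the function names, and the 0.5 ratio threshold are all assumptions.

```python
from collections import defaultdict
from statistics import median


def median_by_hour(samples):
    """Summarize paired results by hour of day.

    `samples` is an iterable of (hour, test_speed, control_speed) tuples,
    one per tester. Returns {hour: (median_test, median_control)}.
    """
    test = defaultdict(list)
    control = defaultdict(list)
    for hour, test_speed, control_speed in samples:
        test[hour].append(test_speed)
        control[hour].append(control_speed)
    return {h: (median(test[h]), median(control[h])) for h in sorted(test)}


def congested_hours(samples, ratio=0.5):
    """List hours where the test server is much slower than the control.

    An hour is flagged when its median test speed falls below `ratio`
    times its median control speed (the threshold is an arbitrary choice).
    """
    return [
        hour
        for hour, (test_med, control_med) in median_by_hour(samples).items()
        if control_med > 0 and test_med < ratio * control_med
    ]
```

If general evening congestion were the cause, both medians would drop together and no hour would be flagged. A drop in only the test server's median points at the path to that server instead.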

We wondered, is Comcast intentionally “throttling” Cogent customers? And if so, why?

The Why

Peering.

Major internet pipes, like Cogent, have peering agreements with network providers, like Comcast. These companies need each other: Cogent can’t exist if their network doesn’t go all the way to the end user, and Comcast can’t exist if they can’t send their customers’ data all over the world. One core tenet of peering is that it is “settlement-free”: neither party pays the other party to exchange their traffic. Instead, each party generates revenue from their customers. Cogent generates revenue from us. Comcast generates revenue from us at home. Everyone wins, right?

After a quick Google session, I learned that Cogent and Comcast have quite a storied history. This history started when Cogent started delivering a great deal of video content to Comcast customers… content from Netflix. And suddenly, the “peering pipe” that connects Cogent and Comcast filled up and slowed down dramatically.

Normally when these peering pipes “fill up”, more capacity is added between the two companies. But, if you believe Cogent’s side of the story, Comcast simply decided not to play ball — and refused to add any additional bandwidth unless Cogent paid them. In other words, Comcast didn’t like being paid nothing to deliver Netflix traffic, which competes with its own TV and streaming offerings. This Ars Technica article covers it well. (How did Netflix solve this problem in 2014? Netflix entered into a business agreement to pay Comcast directly. And suddenly, more peering bandwidth opened up between Comcast and Cogent, like magic.)

We felt certain history was repeating itself: the peering connection between Comcast and Cogent was once again saturated. Cogent said their hands were tied. What now?

The Fix

There was only one last hope: get Comcast to fix it. I know, like we were somehow going to convince this 200 billion dollar corporation to add more capacity to their interconnection with Cogent. If I asked you to rate the possibility of that actually happening on a scale of “no” to “never”, you’d probably pick “come on man are you serious”, right?

But after a lifetime of being a “hey, it’s worth a shot” guy, I had to try. I did a real quick Google search for Comcast corporate contacts and found a person who seemed like they were involved in network operations PR, and I fired off a quick e-mail explaining the situation to Comcast.

And then, the craziest thing happened…

They wrote back quickly. Not only that, but they were on it. We set up a phone call. They took us seriously, they wanted to know the backstory, they wanted to know what our customers were seeing, and they were going to talk to the right people — they even e-mailed Cogent to connect with the right person in peering over there.

And pretty soon a call came back with a definitive-sounding statement: “Give us 1 to 2 weeks, and if you re-run your test I think you’ll be happy with the results.”

Sure enough, we waited two weeks, had our users re-run the speed test, and wouldn’t you know it…

…the problem was essentially gone. Comcast really did fix it. We were now able to measure our Comcast download speeds in megabytes/second instead of kilobytes.

According to Comcast, two primary changes were made:

Comcast added more capacity for Cogent traffic. (Exactly as we suspected, the pipe was full.)

Cogent made some unspecified changes to their traffic engineering.

Here’s where I have to give Comcast credit where credit is due: they really did care about this problem, and they really did work quickly to make it go away.

(One weird thing, though: I was so prepared for a total Comcast dead-end, so sure that Comcast would never even reply, let alone help, that this incredibly positive outcome made me feel suspicious: why me? Why was I able to get this corrected with an e-mail when Cogent couldn’t?

It felt like there was no way this should have worked. If I had to guess, I’d say it’s simple: in the middle of a serious ongoing debate over net neutrality, the last thing Comcast wanted to look like was a network-throttling bad guy in this blog post. But then again, maybe I’m still being too cynical — maybe they just saw a problem they hadn’t noticed and fixed it. (But really, did they really not notice that pipe was full until I asked? Surely there are network monitoring tools?) Frankly, I have to stop thinking about it, because I’ll never know. But no matter the reason, I’m very grateful: thanks for listening to us, Comcast.)

What Does This All Mean

I’d summarize it as follows:

The internet is fragile — and that’s pretty scary.

And while this story amazingly had a happy ending, I’m not looking forward to the next time we’re stuck in the middle of a peering dispute between two companies. It feels absolutely inevitable, all the more so now that net neutrality is gone. Here’s hoping the next time it happens, the responsible party is as responsive as Comcast was this time.

Check Our Work

All of our data, our data analysis scripts, and more are available at this GitHub repository. You can even click the button in the readme and it will take you to a running JupyterHub notebook where you can play with the data yourself, live in your browser. If you find any insights, or mistakes, please let us know.

Call me Jim

A slight clarification – while the internet possibly is fragile, what you’re actually referring to is the fragility of the services provided by American internet providers.
Don’t make the mistake of conflating the American internet with The internet.

Eric

This is surprisingly weird and spooky (the Comcast-fixing-it-bit!) and I’m glad it worked out for you. Hilariously, as I tried to watch your lovely video on my “100 mbit” CenturyLink connection, YouTube buffered and dropped to 240p.

Niklas

The problem was not Cogent or Comcast; you are lucky that they acted so fast. In Germany, with DTAG and Cogent, this would be impossible. The actual problem is that you are using the cheapest carrier available, single-homed, which is a single point of failure right there.

Jerry Kindall

Comcast’s network engineers are extremely competent, in my experience. You can’t manage a network as geographically large as theirs, with as many customers as they have, otherwise. The trouble from the consumer’s end is getting the attention of the right people to solve the problem, which you nailed.

Tomas B

Wow, great story and nice detective work. Pretty cool outcome as well. It wasn’t what I expected! I guess bad technology situations don’t necessarily have bad actors behind them, just people trying their best with good intentions.

Cody Johnson

One quick correction I would like to submit. Net neutrality is still law. While Ajit has gotten his winning votes to get it removed, there is still the Congressional Review Act (one more vote needed!) which can overturn that monumentally bad decision.

Sting Mccoy

Interesting article. It would be fascinating if we had someone brave enough in the media to draw the obvious inference. Net neutrality laws are absolutely useless since after all that work it wasn’t even clear if the law had been broken.

I guess we’ll have to go back to the Photoshopped images of ISPs closing down web sites.

Patrick Adams

Ex Cogent IP Engineer here. We historically had a great deal of difficulty getting Comcast to augment the peers. However, our position was always to ask downstream customers to work with Comcast support to document their troubles.

Free networking advice, however: you should be getting your company’s IP service from multiple tier 1 providers or a single blended-service vendor. As long as Dave S is ruling the roost at Cogent, you will have periodic peering saturation, as it is part of his business model to weaponize your packet loss.

paul hinds

In Spain it’s fairly standard (mal)practice for telecoms companies to sell you one deal and provide another. If you phone up and ask, they do “unspecified engineering magic” and all of a sudden you get what you paid for. Most people don’t phone; profit margins depend on this. If you have stats, you are not initially fobbed off with a “reboot your router” line.
I’m not surprised to find out that this applies to servers as well as clients.

Carmi

You miss one really important detail that is inherent in the use of the word “Peering” to describe the relationship between ISPs: for things to be settlement-free, the traffic between the two peers needs to be roughly equal. That is why it makes economic sense for both parties not to charge each other for packets, as over a reasonable period (a year or so), it would always be a wash.

That model breaks down when one side is only sending traffic and the other is only receiving it. In the Comcast vs. Cogent case, that was exactly what was happening. Comcast customers were sending a tiny percentage of the data to Cogent that Cogent’s customers (mostly Netflix) were sending to Comcast. That meant they were not really peers (roughly equal), and that Netflix/Cogent were not really paying their fair share.

Unfortunately, since the issue was with Comcast, a cable company and in some aspects a competitor of Netflix, it is impossible to know which side of the business was really driving this issue: the ISP side complaining about a peering relationship with a non-peer, or the cable side complaining about a competitor.

J Osborne

Not really. The traffic itself doesn’t need to be roughly equal; the two sides of the peering agreement need to have a roughly equal desire for the service to be good. This can be equal traffic. It could also be an ISP deciding that, even though a potential peer almost exclusively sends data, the ISP’s customers want that data badly enough that it isn’t a good idea to try to make the provider of the data pay you to carry it. If you are lucky enough to have multiple viable ISPs covering your area, you could credibly decide which one to get by asking “which gets Netflix and HBO Go the best?”. If enough people do that, then the ISP that lets Netflix traffic saturate the interconnect and delivers it slowly to their customers will gain fewer new customers (and potentially see a higher attrition rate). The ISPs that decide to peer with Netflix (or whoever is carrying Netflix’s traffic) despite unequal traffic will have more customers.

We don’t see that much in the USA though, because we have relatively few areas with more than one viable ISP. We normally see one cable provider, and a far, far slower Telco provider, and extremely costly cellular, and sometimes a spotty local fixed point wireless provider. (Sometimes the cable provider is missing, and you have multiple local fixed wireless providers.)

Ah well, maybe some day fibre to the neighborhood plus some sort of last mile wireless will work out. Or something else will do an end run around the incumbents.

Walt French

I wonder whether your experience was a side effect of a resolution of Cogent’s and Comcast’s historical standoff. Perhaps some money changed hands; perhaps Cogent finally got edge servers at good locations for Comcast…perhaps a thousand solutions that were impossible under the Netflix standoff between Comcast and Cogent, that got resolved?

Obviously, just spitballing here. But when there seems to be more than meets the eye, I often think there’s more than meets the eye.

rem

Comcast isn’t the only ISP with slow Panic downloads… I have AT&T gigabit fiber and Coda updates usually take several minutes to download. It’s been that way for over a year, and I always just assumed Panic’s server wasn’t that fast, but based on this post, probably something else is at play.

Patrick Ford

Really interesting to see the results of the speed test – forgot I’d even done it until I read this today.

I have to say, my last two experiences with Comcast support were [shockingly] exemplary. I’m as cynical about them as anyone, but if we’re going to be critical of companies when they disappoint us we should also acknowledge when they impress us. As much as I like to imagine their management as cartoon villains, no CEO wants to lead the most hated company in the country. Maybe they’re actually improving?

Eman

“But really, did they really not notice that pipe was full until I asked? Surely there are network monitoring tools?”

Imagine a situation where I got feedback from customers saying that email (sending and receiving) was very slow on our servers (I was working at a relatively small hosting company). I had received enough reports that I talked with the mail server admin about the problem. A co-worker showed me the network logs from the previous months: average network load fluctuated around 47-51%, and he didn’t see any problem. No network admin ever sees the problem, as we all know. I called our network provider to ask what it would cost to double the speed of our current network pipe, and we got the upgrade within 24 hours. The problem with slow access to the mail servers disappeared at once, and I had a long talk with the chief about wtf was going on, especially about the network monitoring tools showing no problems.

I’d say that both conclusions are correct. I am 100% sure the network operations PR guy didn’t even know about the problem, but he was there to slap some lazy admins’ arses and drive them to work.
amen :)

Liam

Engineers care, same as any department. They would if they could.
Certain departments at places like Comcast, for example, are good and have talented people; the issue is higher up, with the people who make the decisions to apply the throttling, set the costs, put in the cheap cable instead of the right cable, and so on.
The same goes for software and video games too. You cannot put all the blame on the team that made the last Mass Effect, for example. What has come out about it has been along the lines of not enough staff, not enough budget, constant crazy deadlines, and changing requirements.

OG

Same here, nothing has changed: the automatic updates are as slow as hell, as they have been for years. Direct downloads are fast, but updates inside of Transmit run at around 100K/sec instead of the >10MB/sec possible. I am located in Germany.

Niklas Bölter

I regularly had 56K modem speeds when downloading updates for Transmit on my Mac, and I am (thankfully) very far away from the US. Is this test script still available? The provider in question is Deutsche Telekom AG, which is very well known for holding other people hostage for peering agreements and has a long history of not peering properly at DE-CIX…

Also, if Comcast is holding Cogent hostage for peering agreements, wouldn’t it make total sense that Comcast just loves it when Cogent’s paying customers complain to them? I would also be happy to help, as long as Cogent pays for it!