About what I know, think and think I know about computer security, Linux and maybe something else…

IPv6 packet loss at Telenet

Telenet is one of the largest cable providers in Belgium and offers native IPv6 support. Its IPv4 connectivity is very stable: packet loss and outages are very rare. IPv6, however, is less stable.

What’s the problem?

I use IPv6 for my SSH connections. A stable connection is critical for SSH, because you notice packet loss immediately, especially when packets are dropped for intervals of more than 10 seconds. The SSH session then appears to stall.

Normal situation at Hetzner

To find out where things go wrong, I first picked a target for my tests: the IPv6 address of ipv6.google.com. As a baseline, I tested it from my server at Hetzner.

No packet loss, an 11 ms average and almost no deviation from the average: all signs of a good connection.
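The three figures used to judge a connection here (loss, average and deviation) are plain arithmetic over the individual probes. A minimal Python sketch, assuming each probe is recorded as a round-trip time in milliseconds, or None when no reply came back; `summarize_rtts` is a hypothetical helper, not part of ping6 or mtr:

```python
import statistics
from typing import Optional, Sequence


def summarize_rtts(samples: Sequence[Optional[float]]) -> dict:
    """Summarize ping-style results.

    Each sample is a round-trip time in milliseconds, or None for a probe
    that never got a reply (a lost packet).
    """
    sent = len(samples)
    replies = [rtt for rtt in samples if rtt is not None]
    loss_pct = 100.0 * (sent - len(replies)) / sent if sent else 0.0
    avg = statistics.fmean(replies) if replies else float("nan")
    # Population standard deviation, comparable to the "mdev" value ping prints.
    mdev = statistics.pstdev(replies) if replies else float("nan")
    return {
        "sent": sent,
        "received": len(replies),
        "loss_pct": loss_pct,
        "avg_ms": avg,
        "mdev_ms": mdev,
    }
```

With ten probes of which one got no reply, it reports 10% loss; a healthy run like the Hetzner one shows 0% loss and a deviation close to zero.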

mtr from Hetzner to Google

mtr works similarly to traceroute. In this run, it probed the path 1000 times. Every router between my server and the destination responded every time. All values are very healthy.

Conclusion

The packet loss starts at router 2a02:1800:2:20c0::2. This router is still within 2a02:1800::/24, a network that belongs to Telenet's AS. Message to @Telenet: could one of your engineers have a look at this router? It could be overloaded or misconfigured. Thanks.
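Deciding where the loss "starts" in an mtr report comes down to finding the first hop whose loss carries through to every later hop; loss that appears at one router but disappears further down the path usually points to ICMP rate limiting on that router rather than real forwarding loss. A minimal Python sketch, assuming the per-hop loss percentages have been copied out of the mtr report by hand. The function names and the example numbers are hypothetical and illustrative, not the measured values:

```python
import ipaddress
from typing import List, Optional, Tuple

# One mtr hop: (router address or host label, loss percentage).
Hop = Tuple[str, float]


def first_persistent_loss(hops: List[Hop], threshold: float = 1.0) -> Optional[int]:
    """Index of the first hop from which loss persists up to the last hop.

    Returns None when the last hop shows no loss.
    """
    first = None
    for i, (_, loss) in enumerate(hops):
        if loss >= threshold:
            if first is None:
                first = i  # start of a (possibly persistent) lossy run
        else:
            first = None  # loss did not carry through, so discard that run
    return first


def in_prefix(address: str, prefix: str) -> bool:
    """True when the address lies inside the given network prefix."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(prefix)


# Illustrative hops: 2001:db8::1 is a documentation address standing in for
# the first hop; the loss figures are rounded examples.
EXAMPLE_HOPS = [
    ("2001:db8::1", 0.0),
    ("2a02:1800:2:20c0::2", 10.0),
    ("2001:2000:3080:772::1", 10.2),
    ("ipv6.google.com", 9.8),
]
```

Run against `EXAMPLE_HOPS`, the first persistent loss is at index 1, and `in_prefix` confirms that this router sits inside 2a02:1800::/24.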

Update on 20170103

I've been in contact with my ISP, Telenet. Thanks, Telenet! They investigated the issue and didn't find any signs of packet loss on their routers, so they proposed to replace the Motorola CV6181E cable modem. At first, I didn't think the modem was to blame. But then I ran a traceroute from a server on the internet to my firewall behind the modem. This is the result. Note that I launched the traceroute while ping6 was failing.

Hop 15 is my firewall. Hop 14 is the default gateway for my firewall, and that router was always reachable. The DOCSIS 3.0 cable modem sits between hop 14 and hop 15. It isn't visible because it doesn't operate on layer 3: traceroute can only reveal devices that operate on layer 3 of the OSI model. There may be other layer 2 devices in between as well, but the modem could certainly be the one dropping the packets. Next step: replace the Motorola CV6181E DOCSIS 3.0 cable modem.
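Why the modem never shows up follows from how traceroute works: each probe carries a hop limit, every layer 3 device decrements it, and the device that brings it to zero answers with ICMPv6 Time Exceeded. A bridging device forwards frames without touching the hop limit, so it never answers. A toy Python simulation of that rule, with hypothetical device names; it ignores timeouts, rate limiting and everything else a real traceroute has to cope with:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    name: str
    layer3: bool  # True for routers and hosts that decrement the hop limit


def probe(path: List[Device], hop_limit: int) -> str:
    """Name of the device that answers a probe sent with this hop limit."""
    destination = path[-1]
    for device in path:
        if not device.layer3:
            continue  # a bridge forwards the frame, hop limit unchanged
        if device is destination:
            return device.name  # the probe arrived: the destination replies
        hop_limit -= 1
        if hop_limit == 0:
            return device.name  # router drops it, sends ICMPv6 Time Exceeded
    raise ValueError("the destination must be a layer 3 device")


def traceroute(path: List[Device], max_hops: int = 30) -> List[str]:
    """Probe with hop limit 1, 2, 3, ... until the destination answers."""
    answers = []
    for hop_limit in range(1, max_hops + 1):
        name = probe(path, hop_limit)
        answers.append(name)
        if name == path[-1].name:
            break
    return answers


# Hypothetical path from a server on the internet to the firewall.
EXAMPLE_PATH = [
    Device("router A", True),
    Device("default gateway", True),
    Device("cable modem", False),
    Device("firewall", True),
]
```

`traceroute(EXAMPLE_PATH)` lists the router, the default gateway and the firewall, but never the cable modem, even though every packet passes through it.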

Update on 20170106

Yesterday, Telenet contacted me to arrange an on-site visit by an engineer to run some tests and try to fix the issue. The engineer came before noon. These are the actions he took.

Visit by Telenet Engineer

Replaced the coax cable amplifier/splitter with a new model.

Connected his laptop directly to my modem to test if he also experienced packet loss while pinging to Google over IPv6.

The engineer confirmed that there is indeed around 10% packet loss over IPv6. No packet loss over IPv4.

It took quite some time for the new modem to start. I guess it downloaded and installed the latest firmware and configuration. Meanwhile, the engineer inspected the external equipment in my street.

After the initial start of the new device, the internet connectivity was restored.

At that point, the engineer re-ran the ping test. For a few minutes everything looked fine, and the engineer took off to the next customer. Apparently the old modem had caused the issues after all.

@Telenet: a big thank you for the engineer. He was really friendly and understood the problem well. I'm glad he was able to reproduce the issue with his own laptop, and I'm also glad that the Telenet devices in my home have been replaced. That rules out problems with those devices.

But….

After the engineer left, I was convinced as well that the problem was fixed, until I started experiencing those hanging SSH sessions again. I immediately ran a longer ping6 and an mtr, with the following results.

A regular ping6 also confirmed around 10% packet loss, the same as before. Swapping the devices didn't change a thing: the problem is not solved.

Some further debugging that might help

Close look at hop 3

Because of the high round-trip times of the 2nd and 3rd hops, I also tested them manually using nping. I captured the related packets on the outside interface of my router, the physical interface that is directly connected to the modem.

What do we learn? When sending packets to Google with a hop limit of 2, router 2a02:1800:2:20c0::2 answers with an ICMPv6 Time Exceeded message. Strangely, the router only answers after approximately 7 full seconds. Routers normally reply within tens of milliseconds, not seconds. Let's now send the same packets with a hop limit of 3 and capture those.

What do we learn? For 2 of the 3 packets, I get an almost immediate response from 2a02:1800:0:1:2104:201:0:3 with a hop limit of 62. That's expected. The strange thing is that one packet was answered by router 2a02:1800:2:20c0::2, which is only 2 hops away, not 3. And again, it responds after 7 full seconds. If I were a Telenet engineer, I would investigate why the router with IP 2a02:1800:2:20c0::2 behaves this strangely. In my opinion, there are 2 problems with this router.

The router shouldn’t wait 7 seconds before answering

The router shouldn't have answered the packets with hop limit 3 at all.
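Both observations can be checked mechanically once the replies are written down. The received hop limit also tells how far a reply travelled: assuming the responder started from the common initial value 64, a reply arriving with hop limit 62 crossed 2 routers on the way back, which fits a responder 3 hops away. A minimal Python sketch; the record layout, function names, slowness threshold and example delays are assumptions (the delays are rounded from "almost immediate" and "approx. 7 seconds", not measured values):

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set

# Initial hop limits commonly used by IPv6 stacks and routers.
COMMON_INITIAL_HOP_LIMITS = (64, 128, 255)


def hops_travelled(received_hop_limit: int) -> int:
    """Estimate how many routers decremented a reply's hop limit on the way back.

    Heuristic: assumes the sender used the smallest common initial value
    that is not below the value we received.
    """
    initial = min(h for h in COMMON_INITIAL_HOP_LIMITS if h >= received_hop_limit)
    return initial - received_hop_limit


@dataclass
class Reply:
    probe_hop_limit: int  # hop limit set on the outgoing probe
    responder: str        # source address of the ICMPv6 reply
    delay_s: float        # time between probe and reply, in seconds


def find_anomalies(replies: List[Reply], slow_after_s: float = 1.0) -> Dict[str, Set[str]]:
    """Flag responders that reply slowly or answer probes with different hop limits.

    On a stable path, a router usually answers probes of one hop limit only.
    """
    limits_seen: Dict[str, Set[int]] = defaultdict(set)
    slow: Set[str] = set()
    for reply in replies:
        limits_seen[reply.responder].add(reply.probe_hop_limit)
        if reply.delay_s > slow_after_s:
            slow.add(reply.responder)
    multi = {addr for addr, limits in limits_seen.items() if len(limits) > 1}
    return {"slow": slow, "multiple_hop_limits": multi}


# Illustrative replies modelled on the two captures described above.
EXAMPLE_REPLIES = [
    Reply(2, "2a02:1800:2:20c0::2", 7.0),
    Reply(3, "2a02:1800:0:1:2104:201:0:3", 0.01),
    Reply(3, "2a02:1800:0:1:2104:201:0:3", 0.01),
    Reply(3, "2a02:1800:2:20c0::2", 7.0),
]
```

On the example data, 2a02:1800:2:20c0::2 is the only address flagged, and it is flagged for both problems.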

There might be a good reason for this behaviour, but it looks strange to me. It might also be totally unrelated to the packet loss, but mtr reveals that the packet loss starts at that router.

A reverse lookup of 2001:2000:3080:772::1 gives brx-b2-link.telia.net. Judging by the name, this is a Telia router located in Brussels, directly connected to the Telenet network. Note that while my first hop was always reachable, the Telia router was unreachable 10% of the time. That means the source of the packet loss must lie on one of these 2 routers or somewhere between them.
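A reverse lookup of an IPv6 address is a PTR query for a name under ip6.arpa, built from the 32 hex nibbles of the fully expanded address in reverse order. A small Python sketch using only the standard library; `ptr_name` and `reverse_lookup` are hypothetical helper names, and `reverse_lookup` needs network access and returns whatever PTR record is published today, which may differ from what it was when this was written:

```python
import ipaddress
import socket


def ptr_name(address: str) -> str:
    """Build the ip6.arpa name a PTR lookup queries for this address."""
    nibbles = ipaddress.IPv6Address(address).exploded.replace(":", "")
    return ".".join(reversed(nibbles)) + ".ip6.arpa"


def reverse_lookup(address: str) -> str:
    """Resolve the PTR record through the system resolver."""
    hostname, _aliases, _addresses = socket.gethostbyaddr(address)
    return hostname
```

`ptr_name` produces the same string as `ipaddress.ip_address(address).reverse_pointer`; spelling it out just makes the nibble reversal visible.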

Update on 20170112

Today, I received a few calls from Telenet. The people I spoke with were very helpful and knew their stuff. Around 3PM, an engineer called me to inform me that they identified the issue. The same person called me back an hour later to inform me that the problem was fixed. Unfortunately, I couldn’t test immediately.

The strange issue with hop 2 still exists, but it's not related to the IPv6 packet loss. mtr now also confirms that the IPv6 packet loss problem is fixed: 0.0% loss on the last hop.
