Tuesday, 5 November 2013

BT Huawei FTTC modem bug breaking VPNs

We have confirmed that the latest code in the BT FTTC modems appears to have a serious bug that is affecting almost anyone running any sort of VPN over FTTC.

Existing modems seem to be upgrading, presumably due to a roll out of new code at BT. An older modem that has not been online for a while is fine. A re-flashed modem with non-BT firmware is fine. A working modem that had been on the line for a while suddenly stopped working, presumably because it had been upgraded.

The bug appears to be that the modem manages to "blacklist" some UDP packets after a PPP restart.

If we send a number of UDP packets, using various UDP ports, then cause PPP to drop and reconnect, we find that around 254 combinations of UDP IP address and port are now blacklisted. That is, they are no longer sent on the line. Other packets are fine.

If we send 500 different packets, around 254 of them will not work again after the PPP restart. They are not simply the first or last 254 packets (some are from the middle), but it does seem to be 254 combinations. They work as often as you like before the PPP restart, and then never work after it.

We can send a batch of packets, wait 5 minutes, restart PPP, and still find that packets are now blacklisted. We have tried a wide range of ports, high and low, different source and destination ports, and so on - they are all affected.
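The post does not include the test scripts themselves, but the shape of the test is clear: send one datagram per UDP combination, note which arrive, restart PPP, then send the same batch again. Below is a minimal Python sketch of that kind of test, assuming the combinations are varied by source port against a single listener. The names `send_batch`, `collect`, `missing_ports` and the `probe` marker are hypothetical, and for a real line test the sender and listener would run on different hosts (one behind the modem, one on the far side); they share one process here only to keep the sketch self-contained.

```python
import socket

TAG = b"probe"  # hypothetical marker so the receiver can ignore stray traffic


def send_batch(dest, src_ports):
    """Send one UDP datagram to `dest` from each source port in `src_ports`.

    Each (source port, destination) pair is a distinct UDP combination. The
    source port is also written into the payload, so the receiver can still
    identify the combination if something rewrites the source address.
    """
    for port in src_ports:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.bind(("", port))
            s.sendto(TAG + port.to_bytes(2, "big"), dest)


def collect(listener, timeout=2.0):
    """Return the set of tagged source ports received until `listener` goes quiet."""
    listener.settimeout(timeout)
    seen = set()
    while True:
        try:
            data, _addr = listener.recvfrom(2048)
        except socket.timeout:
            return seen
        if data.startswith(TAG) and len(data) == len(TAG) + 2:
            seen.add(int.from_bytes(data[len(TAG):], "big"))


def missing_ports(dest, listener, src_ports, timeout=2.0):
    """Send a batch and return the source ports whose datagram never arrived."""
    src_ports = list(src_ports)
    send_batch(dest, src_ports)
    return set(src_ports) - collect(listener, timeout)
```

Usage would be to call `missing_ports` with, say, 500 ports (expecting an empty result), restart PPP, then call it again with the same ports; any ports returned the second time are candidates for blacklisted combinations. Tagging the payload rather than trusting the observed source port is a design choice so that the test still works if NAT sits somewhere in the path.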

The only way to "fix" it is to disconnect the Ethernet port on the modem and reconnect it. The disconnection does not even have to be long enough to drop PPP. After that it is fine until the next PPP restart. And yes, we have been running a load of scripts to systematically test this and reproduce the fault.

The problem is that a lot of VPNs use UDP and use the same set of ports for all of the packets, so if that combination is blacklisted by the modem the VPN stops after a PPP restart. The only way to fix it is manual intervention.

The modem is meant to be an Ethernet bridge. It should not know anything about PPP restarting or UDP packets and ports. It makes no sense that it would do this. We have tested swapping working and broken modems back and forth. We have tested with a variety of different equipment doing PPPoE and IP behind the modem.

BT are working on this, but it is a serious concern that this is being rolled out.

Paul has been doing the testing and I expect we will do more today. Testing UDP over IPv6 is one thing I want to try, as is retesting TCP, since we have had reports of TCP being affected, although our initial tests suggested not.

Just had a PPP session restart followed by no internet access on devices in the house. A quick look on the Firebrick (which could ping the internet) showed a whole load of UDP port 53 DNS sessions going unanswered. I tried some netcats to servers to confirm that no UDP/53 traffic could be sent. Fortunately I'd only just read your blog post an hour earlier, and bouncing the Ethernet port in and out almost instantly resolved the problem.
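A netcat test like the one described can also be scripted. Here is a minimal Python sketch, assuming an IPv4 resolver and an A-record query for `example.com`; the function names, query name and 2-second timeout are hypothetical choices. It builds a bare DNS query by hand (12-byte header plus one question, as laid out in RFC 1035) and reports whether any reply with a matching transaction ID comes back over UDP port 53.

```python
import secrets
import socket
import struct


def build_dns_query(name, qtype=1, txid=None):
    """Build a minimal DNS query: 12-byte header followed by one question."""
    if txid is None:
        txid = secrets.randbelow(0x10000)
    # Header: ID, flags (only RD, "recursion desired", set), QDCOUNT=1,
    # ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME as length-prefixed labels, terminated by a zero-length label.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A) and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def udp53_answers(server, name="example.com", timeout=2.0):
    """Return True if `server` replies to a DNS query sent over UDP port 53."""
    txid = secrets.randbelow(0x10000)
    query = build_dns_query(name, txid=txid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(query, (server, 53))
        try:
            reply, _addr = s.recvfrom(4096)
        except socket.timeout:
            return False
    # A genuine reply echoes the transaction ID from the query.
    return len(reply) >= 2 and struct.unpack("!H", reply[:2])[0] == txid
```

A single timeout cannot tell a dropped query from a resolver that is simply down, so checking a few different servers, and comparing with a path that is known to work, gives a clearer picture.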

This will, I guess, be another of those 'modems' that is actually a router pretending to be a modem by running in 'bridge mode', and that, regardless of mode, runs packets through layer 3/4 processing of some sort.

We assume so, as it is at the modem level. It is going to be difficult to test, and I was going to ask you about this. We do not normally do any PPPoE over the GEA access, so it would mean setting that up for testing on such a circuit.

I'm on TalkTalk using their provided D-Link DSL-3680 (Firmware Version: v1.06t, Hardware Version: A1), which seems to suffer the same problem. Using SSH over an OpenVPN UDP connection is fine until I attempt an scp/sftp or git pull. On my end I can see packets being retransmitted, and I can confirm they are not reaching the VPN endpoint. So I am not yet sure whether it is the volume or the size of the packets; I need to perform further testing.

Either this is a very strange coincidence or this modem is also suffering from the same issue. It would be good if someone else could reproduce the issue on a DSL-3680.

(Switching to another ADSL modem with different firmware does work fine)
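One way to separate the size question from the volume question is to send single UDP datagrams of increasing size and see which ones never arrive. Below is a minimal Python sketch of such a size sweep; the function name and the choice of sizes are assumptions, and as with any line test the listener would really sit at the far end (for example on the VPN endpoint). Sizes around 1400 to 1500 bytes are worth including, because that is where payloads start to exceed a typical 1492-byte PPPoE MTU and the sending host has to fragment them.

```python
import socket


def size_sweep(dest, listener, sizes, timeout=2.0):
    """Send one UDP datagram of each payload size to `dest` and return the
    sizes that never reached `listener`, in ascending order."""
    sizes = list(sizes)
    if any(size < 4 for size in sizes):
        raise ValueError("each payload must be at least 4 bytes")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        for size in sizes:
            # The first 4 bytes record the intended size; the rest is zero padding.
            s.sendto(size.to_bytes(4, "big") + bytes(size - 4), dest)
    listener.settimeout(timeout)
    arrived = set()
    while True:
        try:
            data, _addr = listener.recvfrom(65535)
        except socket.timeout:
            break
        arrived.add(int.from_bytes(data[:4], "big"))
    return sorted(set(sizes) - arrived)
```

If only the larger sizes go missing, size (or fragmentation) is the likelier trigger; if every size gets through one at a time but bulk transfers still stall, volume becomes the more interesting lead.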