Could one of the helpful experts here have a look at my stats on mydslwebstats (same ID as I use here)? Since getting a TP-Link TD-W9970 to replace a Home Hub, I've seen it display some strange figures, in particular a download SNR of just 1 dB. Something seems to happen which sends it into a downward spiral, and as you will see, yesterday up until about 2pm it was around 5 dB, then something happened which knocked not only SNR but also error seconds and FEC, which is now over 3 million.

I should stress that the end user experience is unchanged - the line is stable and a speedtest gives me the same figure that I have had for months (just over 41Mb/s). What seems to happen is that eventually the line resets and immediately SNR goes back to 6 and error seconds/FEC drop to normal.

I can't help wondering if this is a bug in the TP-Link firmware, because if it was real surely DLM would have reset it overnight? I'm on an ECI cab, BTW.

For correctness, I have made two changes to the subject line of this thread . . . the first being to change "SNR" to "SNRM" and the second to change "download" to "downstream".

Looking at your circuit's statistics, via MDWS, we can see that there was some sort of severe event which resulted in DS CRCs being accumulated at a rate of around 320 per minute. As a consequence, a significant number of ES have been recorded -- when I checked it was 7629 for the downstream -- and DS FECs were of the order of 3,300,500 per minute.
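For anyone wanting to reproduce per-minute figures like these from their own logs, here is a minimal sketch. It assumes the modem reports cumulative counters (CRC, FEC, ES) that only ever increase until a retrain zeroes them; the function name, the sample timestamps and the sample values are all made up for illustration and are not taken from MDWS or from this circuit.

```python
from datetime import datetime

def per_minute_rates(samples):
    """Turn (timestamp, cumulative_count) samples into per-minute rates.

    samples must be in time order. A counter that goes backwards is
    treated as a reset (for example after a retrain), so the new value
    is counted from zero rather than producing a negative rate.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        minutes = (t1 - t0).total_seconds() / 60.0
        if minutes <= 0:
            continue  # skip duplicate or out-of-order samples
        delta = c1 - c0
        if delta < 0:
            delta = c1  # assumption: counter was reset to zero in this interval
        rates.append((t1, delta / minutes))
    return rates

# Made-up samples: a CRC counter climbing by 320 a minute, then a retrain.
samples = [
    (datetime(2018, 1, 1, 10, 58), 1000),
    (datetime(2018, 1, 1, 10, 59), 1320),
    (datetime(2018, 1, 1, 11, 0), 1640),
    (datetime(2018, 1, 1, 11, 1), 0),  # counter zeroed by the retrain
    (datetime(2018, 1, 1, 11, 2), 0),
]
crc_rates = [rate for _, rate in per_minute_rates(samples)]
print(crc_rates)
```

The reset handling matters because a naive subtraction across a retrain would show a large negative spike that is not a real error burst.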

That abnormal state was cleared by the re-train which occurred at approximately 1100 hours this morning. Since that time the circuit appears to have been operating normally . . . however the damage has been done. The "traffic lights" show red, green and green for your circuit's profile and so we will expect the DLM to take some action within the forthcoming 24 hours.

I wonder if it is an occurrence of an event similar to that which causes problems for Kitz' circuit?

Personally, I think the sample size is not big enough (it has not been monitored for enough days) to distinguish anything in particular. What I can say is that it seems real enough; in other words, I do not think it is a bug with the firmware (or at least not with the “registering” of errors, though if low SNRM persists with large amounts of errors then there could be a bug with the chip firmware).

I do not believe there is an exact amount of time before the DLM actions any changes (I could be wrong, but a lot of this information is behind closed doors, especially since the patent infringement dispute between Openreach and ASSIA), but generally speaking it should take action after the 24-hour monitoring window.

It looks like someone already beat me to the post here as I was looking into this. I can agree with burakkucat that your errors certainly weren’t looking good. It looks like FECs are still taking a hit, with thousands of them per minute (unless it truly is a bug). At least the CRCs are not through the roof (at the time of checking, 0/min), which will prevent you accumulating a massive amount of ES (unless you experience a low SNRM again), and it is ES that ultimately dictate DLM changes.

It may be worth monitoring the graphs over the next few days to see if you continue to experience a dropping SNRM as there may be a fault with your new hardware if you did not experience this before. If you had this problem before with the previous router, the DLM should have already made substantial changes.

Thanks for taking a look at this. The reason I posted this yesterday was to ask about a different issue from the one that we happened to catch, and that is a slowly decreasing DS SNRM. What seems to happen is that after a retrain I get around 6 dB as I should, but then over a period of days if not weeks, that value slowly drops until it reaches 1 or 2 and then I get a resync which puts it back to 6 and it starts again. I just wondered if a bad line is something that could genuinely cause this, or whether it was an issue with the way the stats are generated from the router. Given that from an end-user perspective nothing changes, with the line remaining stable and the speed steady, it just seems slightly odd to have this slow erosion.
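One way to put a number on this kind of slow erosion is to fit a straight line to the SNRM readings taken since the last retrain. The sketch below is only an illustration: the daily readings are invented, the helper names are hypothetical, and the 1 dB resync level is an assumption based on the "1 or 2" mentioned above rather than a known modem threshold.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def days_until(threshold_db, slope, intercept):
    """Day (counted from the first sample) at which the fitted line
    reaches threshold_db, or None if the margin is not falling."""
    if slope >= 0:
        return None
    return (threshold_db - intercept) / slope

# Invented readings: one SNRM value per day since the last retrain.
days = [0, 1, 2, 3, 4]
snrm_db = [6.0, 5.5, 5.0, 4.5, 4.0]

slope, intercept = fit_line(days, snrm_db)
resync_day = days_until(1.0, slope, intercept)  # 1.0 dB is an assumed resync level
print(f"decay: {slope:.2f} dB/day, estimated resync around day {resync_day:.0f}")
```

A steady negative slope that repeats after every retrain would point towards the sawtooth pattern described here, whereas a sudden step down would look more like a one-off event.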

With regard to yesterday, I've never been aware of having a catastrophic event like that before, but then I haven't been monitoring it closely up until I got my Pi set up, so it could have happened before. However, I'm not aware of any DLM action over the past six months or so. In fact, the last time was when BT "upgraded" us Infinity 1 users to 76Mb and I had a full DLM reset. At the time I had been stuck on a banded profile and that caused it to sync higher at around 47. After 10 days of stable running it dropped to what it is now, about 41Mb/s.

I don't think my line is in a very good state generally. I'm around 300 yards from the cab but an engineer told me over 100 yards is aluminium, which is not good. Way back when FTTC came to my area and I was one of the first in the cab, I was getting mid-50s but crosstalk has had an impact, the last time about a year ago when my next door neighbour changed from Virgin to Sky and I took a 5Mb hit as a result. The annoying thing is my guaranteed download speed is 40Mb/s and I get 41, so BT won't send an engineer out. I don't think my line is ever going to improve even with Gfast so I may have to switch to Virgin, but they have their own set of problems (not least of which is an over-subscription issue in my area that took 2 YEARS to fix...). I'm just hoping my line gets worse - then I may be in with a shot of an engineer visit.

The latest version improves the Anti-interference performance of TD-W9970(EU)1.0.

It seems rather ambiguous, but I imagine the improvement is related to the WiFi and not the DSL modem part of the router. I had a quick skim through the TP-Link emulator but could not see anything in reference to the anti-interference (unless it is under a different name or if it is an internal reference).

I do not know what else to add at this point. I personally would use the original router and observe if this sort of gradual SNRM drop occurs on it over the next few days. With faults, regardless of whether it is the line or customer hardware, good ol' process of elimination works a charm and it shows you've done your homework before escalating it.

I have checked your MDWS today, and the SNRM drop is gradually happening again it seems. As for errors, the router is still reporting a lot of FECs but it looks like that is keeping your CRCs and ES down really low - interleaving/G.INP is enabled on your line, and if it was not for this then you would be experiencing large amounts of errors in the form of CRCs which would degrade your experience.

I have the V2 and am running the 19/09 firmware. I think I'll do what you say and try the old router but I may just give this a week or so more so I can get a good run of stats logged. As you say, it's a process of elimination.

A gradually decaying SNRM, until it reaches a significantly low value such that the modem retrains, is rather indicative of a bit-swapping problem with the modem.

It was relatively recently -- I'm guessing somewhere in the December 2017 to January 2018 time-frame -- that we had another discussion on the subject. At the moment I can't lay a paw on the appropriate thread but will mention that Kitz was a contributor. A forum search, using the string "bit-swapping", may be beneficial.

You know what, burakkucat? I was going to suggest it was something with bit-swapping! But I wasn't sure so I kept my lips sealed.

I did check renzz's MDWS, and it looks like the modem is bit-swapping. It does look to be very consistent (much more consistent than other lines, but with more FECs in the process), so perhaps it is something with the Bit Allocation Table?

Quote

I did check renzz's MDWS, and it looks like the modem is bit-swapping. It does look to be very consistent (much more consistent than other lines, but with more FECs in the process), so perhaps it is something with the Bit Allocation Table?

Bit-swapping is the process that makes adjustments to the bit allocation table. I suppose you could check that all the relevant capabilities (bit-swapping, monitored tones) are enabled from the output of the xdslctl/xdslcmd or whatever command.

Quote

You know what, burakkucat? I was going to suggest it was something with bit-swapping! But I wasn't sure so I kept my lips sealed.

If you have a suspicion, please, always "air" it. We often get to understand events by everyone contributing facts, theories, observations and suggestions . . . after all, you are not a "newcomer" but a regular contributing member.

Quote

I did check renzz's MDWS, and it looks like the modem is bit-swapping. It does look to be very consistent (much more consistent than other lines, but with more FECs in the process), so perhaps it is something with the Bit Allocation Table?

Haven't looked at the stats, but this caught my eye and caused me to chip in.

Quote

A gradually decaying SNRM, until it reaches a significantly low value such that the modem retrains, is rather indicative of a bit-swapping problem with the modem.

Indeed. Whilst not the only cause, it is one of the most likely culprits.

---

I beta tested the TD-W9980 before it was released for sale in the UK. One of the things I immediately highlighted to them was degradation of SNRM due to bit swap. Bearing in mind that at that time my line was capable of well in excess of 80Mbps, this is an excerpt from my initial feedback.

Quote

Unfortunately I’ve had to take the TD-W9980 off my line as the SNR Margin continued to drop without making any recovery. This eventually causes the line to lose sync.

Sync: 79999 (max) SNRMargin: 8.3 dB

Sync: 80000 (max) SNRMargin: 6.6 dB

Sync: 80000 (max) SNRMargin: 5.8 dB

I would have liked to investigate what's causing the apparent deterioration on my line, but the lack of access to any VDSL stats via telnet means that it's impossible to see what's going on.

I'm taking a wild guess here but the gradual loss of SNRm could perhaps be bitswap not working correctly?

It also caused DLM to take action on my line for the first time ever during this testing period and apply INP.

I was then sent an unlocked version of their f/w and a special Lantiq tool which ran with it that allowed me to collect full stats from the modem. Using those logs they were indeed able to see that it was a bit-swap issue. New f/w was provisioned for the TD-W9980 before it went on sale which fixed the problem with bit swapping.

---

ETA: Just a note on this. It is not always a f/w issue. It is also a possibility that the line is generating sufficient low-level errors so that bitswap does eat into the SNRM over time. Bitswap can and does mark individual tones unavailable for use, as that's how it is designed to work. Sometimes the modem can adjust power to give back a wee bit of true SNR, but over time it's still possible to see a decline in the SNRM. Performing a retrain allows a new BAT to be allocated and puts all available tones for that line back in use.
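To make that mechanism a little more concrete, here is a deliberately crude toy model. It is not how any real modem or chipset implements bitswap: it assumes every tone has the same SNR, that each bit loaded on a tone costs roughly 3 dB of margin, that the reported SNRM is simply the average margin over tones in use, and that the disturbance which retired a tone has gone by the time of the retrain. All names and numbers are hypothetical.

```python
DB_PER_BIT = 3.0  # assumption: each extra bit on a tone costs about 3 dB of margin

def tone_margin(snr_db, bits, i):
    """Margin left on tone i after carrying bits[i] bits (toy formula)."""
    return snr_db[i] - DB_PER_BIT * bits[i]

def mean_margin(snr_db, bits, in_use):
    """Toy stand-in for SNRM: average margin over the tones still in use."""
    margins = [tone_margin(snr_db, bits, i) for i in range(len(bits)) if in_use[i]]
    return sum(margins) / len(margins)

def retire_tone(snr_db, bits, in_use, tone):
    """Mark a tone unavailable and move its bits, one at a time, to
    whichever remaining tone currently has the most margin."""
    in_use[tone] = False
    while bits[tone] > 0:
        best = max((i for i in range(len(bits)) if in_use[i]),
                   key=lambda i: tone_margin(snr_db, bits, i))
        bits[tone] -= 1
        bits[best] += 1

def retrain(snr_db, total_bits):
    """Fresh bit allocation table: every tone back in use, bits handed
    out one at a time to the tone with the most margin."""
    bits = [0] * len(snr_db)
    in_use = [True] * len(snr_db)
    for _ in range(total_bits):
        best = max(range(len(bits)), key=lambda i: tone_margin(snr_db, bits, i))
        bits[best] += 1
    return bits, in_use

snr_db = [36.0] * 16   # made-up, identical SNR on 16 tones
TOTAL_BITS = 160       # stands in for the sync rate, which never changes here

bits, in_use = retrain(snr_db, TOTAL_BITS)
fresh = mean_margin(snr_db, bits, in_use)       # 6.0 dB straight after sync

retire_tone(snr_db, bits, in_use, 15)
after_one = mean_margin(snr_db, bits, in_use)   # 4.0 dB with one tone retired

retire_tone(snr_db, bits, in_use, 0)
after_two = mean_margin(snr_db, bits, in_use)   # lower again

bits, in_use = retrain(snr_db, TOTAL_BITS)
recovered = mean_margin(snr_db, bits, in_use)   # back to 6.0 dB
print(fresh, after_one, round(after_two, 2), recovered)
```

Note that the total number of bits stays at 160 throughout, which is the toy equivalent of the sync speed and speedtest result staying the same while the margin drifts down.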