Vectoring on Lantiq VRX200 / VR9 - missing callback for sending error samples

Yes, these are in there because I initially included all available counters; I need to revisit whether corrupted is worth keeping in. EDIT: Done, corrupted is indeed just the sum of two other counters...

But my understanding was that corrupted counts all DTUs detected as not "defect", but that also includes empty DTUs for which no retransmission will be attempted. I guess I need to look in the driver to see whether corrupted is actually synthesized from the other two or not.

Scrap that, I just looked at the code I posted...

      /* RxCorrupted */
      pCounters->nRxCorruptedTotal = pCounters->nRxCorrected + pCounters->nRxUncorrectedProtected;

Yeah, that will go then :wink: Thanks for pointing that out.
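
Since the driver builds the total from the other two counters, a graphing script could recompute it on the fly instead of storing a third series. A minimal sketch of that idea; the function and field names are made up for illustration and are not the driver's or collectd's API:

```javascript
// Hypothetical helper: recompute "corrupted" from the two counters that the
// driver snippet above adds together.
function rxCorruptedTotal(counters) {
  return counters.rxCorrected + counters.rxUncorrectedProtected;
}

// Made-up sample values, only to show the call.
console.log(rxCorruptedTotal({ rxCorrected: 120, rxUncorrectedProtected: 3 })); // 123
```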

Okay, cleaned up now:

    var retx_counters = {
        title: "%H: DSL G.INP(retx) retransmission counters",
        vlabel: "DTUs (per 30 sec.)",
        y_min: -0.1,
        y_max: 0.1,
        alt_autoscale: true,
        data: {
            instances: {
                errors: ["far_rtx_tx", "near_rtx_c", "near_rtx_ucp", "near_rtx_tx", "far_rtx_c", "far_rtx_ucp"]
            },
            options: {
                errors_near_rtx_tx: {
                    title:         "ReTx tx-retransmitted (far, accounted as near)",
                    transform_rpn: "30,*",
                    color:         "ff00ff",
                    overlay:       false,
                    flip:          true,
                    noarea:        false
                },
                errors_near_rtx_c: {
                    title:         "ReTx corrected (near)",
                    transform_rpn: "30,*",
                    color:         "00ff00",
                    overlay:       true,
                    noarea:        true
                },
                errors_near_rtx_ucp: {
                    title:         "ReTx uncorrected protected (near)",
                    transform_rpn: "30,*",
                    color:         "ff0000",
                    overlay:       true,
                    noarea:        false
                },
                errors_far_rtx_tx: {
                    title:         "ReTx tx-retransmitted (near, accounted as far)",
                    transform_rpn: "30,*",
                    color:         "af00af",
                    overlay:       false,
                    noarea:        false
                },
                errors_far_rtx_c: {
                    title:         "ReTx corrected (far)",
                    transform_rpn: "30,*",
                    color:         "00af00",
                    flip:          true,
                    overlay:       true,
                    noarea:        true
                },
                errors_far_rtx_ucp: {
                    title:         "ReTx uncorrected protected (far)",
                    transform_rpn: "30,*",
                    color:         "af0000",
                    flip:          true,
                    overlay:       true,
                    noarea:        false
                },
            }
        }
    };
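
For anyone wondering what the "30,*" transform does: it multiplies every sample by 30. Assuming the stored samples are per-second rates (which is how RRD keeps DERIVE/COUNTER data, but I have not checked this particular setup), that turns them into counts per 30-second interval. A rough sketch of the arithmetic, with a made-up rate and a hypothetical helper name:

```javascript
// Convert a per-second rate into an event count per window, which is what an
// RPN transform such as "30,*" does to each sample.
function eventsPerWindow(ratePerSecond, windowSeconds) {
  return ratePerSecond * windowSeconds;
}

const rate = 0.5; // hypothetical: 0.5 retransmitted DTUs per second
console.log(eventsPerWindow(rate, 30));    // 15 per 30-second interval
console.log(eventsPerWindow(rate, 3600));  // 1800 per hour
console.log(eventsPerWindow(rate, 86400)); // 43200 per day
```

The same rate gives very different-looking numbers depending on the window, so it is worth keeping the vlabel in step with the multiplier.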

Stray observations after one full day of data:

  • SNR remains pretty much rock solid, dips by ~0.1 dB during daytime hours, recovers at night.
  • The number of error reports seems to be completely unrelated to the current quality of the line: while the amount varies slightly between 30-second intervals, overall it amounts to basically a flat line of ~400 reports per minute. Are error reports sent all the time, i.e. also for "no errors to report, chief"?
  • It seems like rtx-uc are the new CRC: my ES, at pretty much the same level as I previously observed with the Zyxel/Broadcom modem (~3 to 4 per day), are now mirrored in rtx-uc instead of CRCs, which remain firmly at zero.

+1, I see the same. It could be any of:
a) lower night temperatures (and hence lower electrical noise), not sure that effect is large enough
b) less active traffic and hence less interference
c) maybe some folks shut their router off completely so again less cross-talk interference

My intuition is that the VCE requests these at more or less fixed intervals, and only the VCE knows what it intended to send, so I guess the CPE has no real idea about the magnitude of the deviation between intended and received... so having these at fixed intervals independent of load makes a lot of sense... As an observation though: without Jan's error sample patches, so without giving feedback to the VCE, it often took multiple days for the signal to noticeably degrade (visible as loss of the vectoring gain in the SNR spectra), indicating that the error sample rate is a bit on the high side... :wink:

Same here, even though this seems to disagree with what ITU-T G.998.4 recommends reporting as CRCs when G.INP is used.

(Argh, disregard my blabbering: I scale per day (*86400), not per hour, so of course it can go beyond 3600. Seems like I have far fewer FECs than I thought I did.)

Could these result from collectd not actually reporting a hard count but trying to convert these somehow? I switched to scaling by *1, so events per update interval, to get reasonably interpretable numbers, after figuring out that I mainly stare at the 1-hour plot, if at all.

Vectoring error reports are not related to any visible errors. They provide a low-level measure of how much the received downstream signal is distorted (see my previous post for a short description). These error samples are calculated based on sync frames, which are regularly transmitted after every 256 data frames, and don't contain any actual data.

Obviously, reports are not sent for every sync frame (there are ~15.5 per second), but only when requested by the vectoring engine. And as you can see, that typically happens at roughly regular intervals.
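
To put the numbers from this thread side by side, here is a back-of-the-envelope sketch. It assumes a nominal rate of 4000 DMT symbols per second (with the default cyclic extension); that figure is my assumption and is not stated above. The 256 data frames per sync frame and the ~400 reports per minute are the values quoted in this thread.

```javascript
// Assumption: nominal 4000 DMT symbols per second (not stated in the thread).
const SYMBOLS_PER_SECOND = 4000;
// One sync frame follows every 256 data frames, so a superframe has 257 frames.
const FRAMES_PER_SUPERFRAME = 256 + 1;

const syncFramesPerSecond = SYMBOLS_PER_SECOND / FRAMES_PER_SUPERFRAME;
const syncFramesPerMinute = syncFramesPerSecond * 60;

// ~400 reports per minute is the flat rate observed earlier in the thread.
const reportsPerMinute = 400;
const reportedShare = reportsPerMinute / syncFramesPerMinute;

console.log(syncFramesPerSecond.toFixed(2));  // 15.56
console.log(Math.round(syncFramesPerMinute)); // 934
console.log(reportedShare.toFixed(2));        // 0.43
```

Under those assumptions, somewhat less than half of all sync frames would result in an error report on that particular line.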

Thanks. So it doesn't actually make sense to plot them: either they happen at all (which is a one-time check) or they don't.

Alright, I'm very late but still one more voice to confirm: The patch works, impressively so:

(Ignore the two stripes, those are not reconnects but me fiddling with the collectd script.)

I am sure I have never before seen my line values anywhere near that stable with an OpenWrt-Lantiq modem. Not even little signs of degradation after 8+ days uptime.

What I think would be important now: Cherry-pick/backport the patch to the 22.03 branch. RC1 is about to be tagged any time now, and I feel that this is one of the most important improvements the Lantiq target has had in years. It should absolutely go into 22.03.

I second a backport to 22.03.
My line has been up for three weeks and it has never been that stable.
Not even with the original AVM firmware (FRITZ!Box 3370).

I have, but only in the days when the link was synchronized to ~50/10 without vectoring and G.INP... After vectoring was activated, sync increased to ~100/40, and G.INP was activated in both directions, stability took a severe hit... I do think that my older builds did not enable the second CPU sibling, and so probably did not trigger the SMP-locking issue back then, but my memory is hazy.

That said, getting this into the imminent stable release seems very desirable as this patch-set puts the "stable" back into the xrx200 dsl part :wink:

+1!

This master build is working just as well for me as an earlier build from Jan's repo:

Hostname OpenWrt
Model TP-LINK TD-W8980
Architecture xRX200 rev 1.2
Target Platform lantiq/xrx200
Firmware Version OpenWrt SNAPSHOT r19192-e1de25b68a / LuCI Master git-22.083.68981-15bbe69
Kernel Version 5.10.107
Local Time 2022-03-31 11:00:48
Uptime 4d 8h 44m 5s
Load Average 0.01, 0.30, 0.39

DSL Status

Line State: Showtime with TC-Layer sync
Line Mode: G.993.2 (VDSL2, Profile 17a, with down- and upstream vectoring)
Line Uptime: 3d 22h 46m 36s
Annex: B
Data Rate: 43.527 Mb/s / 8.051 Mb/s
Max. Attainable Data Rate (ATTNDR): 44.132 Mb/s / 7.960 Mb/s
Latency: 0.32 ms / 0.00 ms
Line Attenuation (LATN): 26.9 dB / 29.7 dB
Signal Attenuation (SATN): 24.5 dB / 29.7 dB
Noise Margin (SNR): 6.4 dB / 6.5 dB
Aggregate Transmit Power (ACTATP): 14.4 dB / 6.3 dB
Forward Error Correction Seconds (FECS): 0 / 21020
Errored seconds (ES): 1 / 718
Severely Errored Seconds (SES): 0 / 348
Loss of Signal Seconds (LOSS): 0 / 18
Unavailable Seconds (UAS): 348 / 348
Header Error Code Errors (HEC): 0 / 0
Non Pre-emptive CRC errors (CRC_P): 0 / 0
Pre-emptive CRC errors (CRCP_P): 0 / 0
ATU-C System Vendor ID: Broadcom 177.197
Power Management Mode: L0 - Synchronized

I'm using the DM200 VDSL BLOB (v5.7.B.5.0.7). There was a full re-sync a few hours into the connection, but the NBN connection I'm on is weirdly affecting the negotiation of SRA (I believe this is an issue in the Lantiq xDSL BLOB running in NBN's Broadcom environment: despite defaulting to dynamic SRA in both directions, it's losing downstream SRA). There were some very minor SRA adjustments a few hours after the re-sync, but since then it's been rock stable. Neither the re-sync nor the SRA adjustments appear to be related to Jan's vectoring changes: a Fritz!box 7490 with Fritz!OS 7.29 behaves exactly the same way in this regard on this line (though the OpenWrt builds get slightly better sync rates).

Why does it say G.993.2 everywhere (like with my Fritzbox 3370)? Shouldn't it say G.993.5?
OpenWrt 19.07 still reported G.993.5, but 21.02 reports G.993.2.

That is just a cosmetic change which is caused by the switch from the lantiq_dsl.sh script to ubus for reporting DSL metrics.

Ah, I see, but what is the correct value? Are we using .5 or .2? Or is it ambiguous?

G.993.5 ("Vectoring") is an extension of, and builds upon, G.993.2 ("VDSL2"). A Vectoring-enabled line is "G.993.2 with G.993.5", so to say.

I'm also on the NBN, using the Fritz blob. Very happy with these results; without this vectoring addition the speed would slowly deteriorate. Using the latest with xdarklight's tree merged.

Hostname
Model BT Home Hub 5A
Architecture xRX200 rev 1.2
Target Platform lantiq/xrx200
Firmware Version OpenWrt SNAPSHOT r19421+22-3aa96efa24 / LuCI Master git-22.089.43958-7110635
Kernel Version 5.15.33
Local Time 2022-04-19 00:40:32

DSL Status

Line State: Showtime with TC-Layer sync
Line Mode: G.993.2 (VDSL2, Profile 17a, with down- and upstream vectoring)
Line Uptime: 7d 7h 21m 47s
Annex: B
Data Rate: 103.574 Mb/s / 44.199 Mb/s
Max. Attainable Data Rate (ATTNDR): 110.756 Mb/s / 47.446 Mb/s
Latency: 0.13 ms / 0.00 ms
Line Attenuation (LATN): 11.7 dB / 14.8 dB
Signal Attenuation (SATN): 11.7 dB / 14.6 dB
Noise Margin (SNR): 7.9 dB / 9.5 dB
Aggregate Transmit Power (ACTATP): 6.9 dB / 13.0 dB
Forward Error Correction Seconds (FECS): 0 / 168360
Errored seconds (ES): 0 / 724
Severely Errored Seconds (SES): 0 / 103
Loss of Signal Seconds (LOSS): 0 / 0
Unavailable Seconds (UAS): 212 / 212
Header Error Code Errors (HEC): 0 / 0
Non Pre-emptive CRC errors (CRC_P): 0 / 0
Pre-emptive CRC errors (CRCP_P): 0 / 0
ATU-C System Vendor ID: Broadcom 177.197
Power Management Mode: L0 - Synchronized

dsl_cpe_pipe.sh dsmstatg
nReturn=0 n_processed=738531 n_fw_dropped_size=0 n_mei_dropped_size=5 n_mei_dropped_no_pp_cb=0 n_pp_dropped=0
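
If you want to track these numbers over time, the key=value line is easy to parse. A small sketch, assuming the output always has this shape; the field names are copied verbatim from the output above, but treating every field with "dropped" in its name as a drop counter is my own reading, not something documented here:

```javascript
// Parse a "key=value key=value ..." line into an object of numbers.
function parseCounters(line) {
  const counters = {};
  for (const pair of line.trim().split(/\s+/)) {
    const [key, value] = pair.split("=");
    counters[key] = Number(value);
  }
  return counters;
}

const line =
  "nReturn=0 n_processed=738531 n_fw_dropped_size=0 " +
  "n_mei_dropped_size=5 n_mei_dropped_no_pp_cb=0 n_pp_dropped=0";
const stats = parseCounters(line);

// Sum every counter whose name contains "dropped" (my interpretation).
const dropped = Object.keys(stats)
  .filter((name) => name.includes("dropped"))
  .reduce((sum, name) => sum + stats[name], 0);

console.log(stats.n_processed); // 738531
console.log(dropped);           // 5
```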

And 28 days later ... my line uptime is 28 days:

No changes whatsoever in line stability, SNR wavers only minimally by +/- 0.1 dB over the day:

Not even a particularly bad Tuesday with FECs in the range of 40 per minute managed to destabilize the line:

(FECs is "FEC seconds", so 40 out of 60 seconds in a minute were afflicted, the graph very clearly shows the deviation, and it also shows in the DTU counters)

What is very curious though: I thought CRCs were not reported anymore, but that isn't true, my line just didn't show any. After a month of zero CRC errors, three appeared. The CRC counter may not be hugely relevant anymore, it might even be "broken", but it isn't completely abandoned:

So to reiterate: truly impressive improvements in stability. I gave up on Lantiq modems with OpenWrt before, this brings them back into "very much usable" status. Even if, like me, you would only be using the Lantiq device purely as a DSL modem: If they are this stable, they make a great alternative to much more expensive DSL modems with gigabit ports.

And to re-reiterate: @janh, please bring this into 22.03. RC1 was tagged yesterday, but I believe it's not too late to cherry-pick the patches for inclusion with 22.03 proper.

Personally, I am fine with the snapshot running the modem for the rest of its earthly existence, but other users will want to use their Lantiq as a real all-in-one router, and they would greatly benefit from the stability patches in a proper release version.

Also I fiddled a lot with my collectd script and the collectd/rrdgraph output. I wonder if that's of any interest to others, I would get my github game in order then.

Well, FECs are not really errors... One way of looking at FEC events is: if FECs are zero and you are not already syncing at the maximum, you should be able to gain more throughput by accepting a few more FEC events... (That said, obviously once you are at full sync, the fewer FECs you see, the more stability reserve your link has.)

I see similarly low numbers (~60/40 IIRC after > 60 days uptime), but with G.INP the CRC reporting seems somewhat screwed up.

Yes, please!

I am aware that FEC and DTU retransmissions are correctable events and part of regular operation. But at least in my mind they are also indicators of the general well-being of the line, and 40 FEC-afflicted seconds per minute, where there's otherwise 1 or 2, indicate that something happened that particular Tuesday: maybe a carrier had problems, maybe there was some construction in my street, maybe a neighbor hooked up their heating blanket to the telephone wiring, I don't know. Point is: it worked through it without any consequences, not even triggering any SRA events.

Yes, FECs are successful corrections, which as far as I can tell do not even increase the latency, since the calculations seem identical whether the RS code is merely checked or used to correct bits. Retransmissions do have observable mild side effects though: to avoid introducing re-ordering, the modem will hold all DTUs that logically came after a corrupted DTU until that DTU is successfully retransmitted or dropped (after taking too many attempts/too much time at retransmitting), at which point the queued-up DTUs (or rather the packets contained therein) get released rather burstily.

Tl;dr: FECs are much milder than retransmissions, but both indicate presence of some undesired interferences.
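
To illustrate the hold-and-release behaviour described above, here is a toy model. It is only a sketch of in-order delivery in general, not the modem firmware's actual logic; the class and method names are made up.

```javascript
// Toy model of in-order delivery with retransmission (all names hypothetical).
class InOrderReleaser {
  constructor() {
    this.nextSeq = 0;      // next sequence number allowed to leave
    this.held = new Map(); // seq -> payload, waiting for a gap to close
  }

  // A DTU arrived intact (on the first try or after a retransmission).
  receive(seq, payload) {
    if (seq < this.nextSeq) return []; // already delivered or given up on
    this.held.set(seq, payload);
    return this.drain();
  }

  // The receiver gives up on a DTU (too many attempts or too much time).
  giveUp(seq) {
    if (seq < this.nextSeq) return [];
    this.held.set(seq, null); // null marks a hole that gets skipped
    return this.drain();
  }

  // Release everything contiguous from nextSeq; returns delivered payloads.
  drain() {
    const released = [];
    while (this.held.has(this.nextSeq)) {
      const payload = this.held.get(this.nextSeq);
      this.held.delete(this.nextSeq);
      if (payload !== null) released.push(payload);
      this.nextSeq += 1;
    }
    return released;
  }
}

const rx = new InOrderReleaser();
console.log(rx.receive(0, "a")); // [ 'a' ]
console.log(rx.receive(2, "c")); // [] (DTU 1 is missing, so "c" is held)
console.log(rx.receive(3, "d")); // []
console.log(rx.receive(1, "b")); // [ 'b', 'c', 'd' ] released as one burst
```

The last call shows the burst: everything queued behind the missing DTU leaves at once as soon as the gap closes. Calling giveUp(1) instead would release "c" and "d" without "b".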

I often see retransmission counters go up when FECs do, indicating that the interference overwhelms the FEC capability at least at the configured low interleaving level, similar to what your plots above indicate.