Vectoring on Lantiq VRX200 / VR9 - missing callback for sending error samples

I see your point, I'm just having a hard time figuring out how to have them both in the graph and not have it completely muddled. I'm thinking about plotting rtx-tx from the other side as an overlay line. But as rtx-uc, the most important count after all, are so rare compared to rtx-c they get almost drowned out in the graph anyway, and an additional line isn't helping there. That's all cosmetics though, a result of one side's graph having to fit into 80 pixels or less, it will never be ideal. For the time being I think I'm okay with sweeping the potential difference between rtx-c and rtx-tx under the rug. In the end we are still talking about "non-errors".


Simple, we know that normally TX > C > UC, so simply stack them such that TX is in the background behind C and UC...
UC are the most severe and likely the smallest, so keep them in the front...
C will be the most common, and
TX really only matter if they are >> C and then they are visible even if plotted all the way to the back

That works, or as an area all the way in the background...

Well, if they are that rare, you can safely ignore them (they will be visible in the counters) and if they become noticeable in the plots you know you have a severe condition at hand.... plotting them in RED all the way in the front will make them visible if the sustained TX or C rates are not too high (I rarely get more than 10 TX in 30 seconds, and most often 0)....

Yes and no: if TX >> C you will have noticeable latency effects, including increased burstiness (to avoid re-ordering of packets, the receiver will refrain from releasing DTUs into the OS as long as these are "behind" a DTU that is still in the retransmission process). So if TX is close to C I agree the difference does not matter much, but if TX >> C it can become diagnostic....
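The rule of thumb above could be turned into a small check. This is only a rough sketch, assuming per-interval counts for rtx-tx, rtx-c and rtx-uc are already at hand; the name `classifyRetx` and the factor of 10 standing in for ">>" are placeholders of mine, not anything taken from the driver or the collectd script:

```javascript
// Hypothetical helper: decide whether the retransmission counts of one
// interval look diagnostic. `factor` is an arbitrary stand-in for "TX >> C".
function classifyRetx(tx, c, uc, factor = 10) {
    if (uc > 0) {
        // Uncorrected DTUs are the severe case, regardless of the rest.
        return "uncorrected";
    }
    if (tx > 0 && tx >= factor * Math.max(c, 1)) {
        // Far more retransmissions than corrected DTUs:
        // expect latency and burstiness effects.
        return "tx-dominant";
    }
    return "normal";
}

console.log(classifyRetx(12, 10, 0));  // normal
console.log(classifyRetx(500, 10, 0)); // tx-dominant
console.log(classifyRetx(12, 10, 1));  // uncorrected
```

The threshold is a judgment call; on a line where TX normally tracks C closely, a much smaller factor might already be worth a look.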

Here is an example screen shot:

I think this works, yes:

Plotting "corrupted" is rather meaningless, it is just the sum of corrected and uncorrected (which you label "corrupted protected", should be "uncorrected protected")


Yes, these are in there because initially I included all counters available; I need to revisit whether corrupted is worth keeping in. EDIT: Done, corrupted is indeed just a sum of two other counters...

But my understanding was, corrupted counts all DTU detected as not "defect", but that also includes empty DTUs for which no retransmission will be attempted. I guess I need to look in the driver whether corrupted are actually synthesized from the other two or not.

Scratch that, I just looked at the code I posted....

      /* RxCorrupted */
      pCounters->nRxCorruptedTotal = pCounters->nRxCorrected + pCounters->nRxUncorrectedProtected;

Yeah, that will go then :wink: Thanks for pointing that out

Okay, cleaned up now:

    var retx_counters = {
        title: "%H: DSL G.INP(retx) retransmission counters",
        vlabel: "DTUs (per 30 sec.)",
        y_min: -0.1,
        y_max: 0.1,
        alt_autoscale: true,
        data: {
            instances: {
                errors: ["far_rtx_tx", "near_rtx_c", "near_rtx_ucp", "near_rtx_tx", "far_rtx_c", "far_rtx_ucp"]
            },
            options: {
                errors_near_rtx_tx: {
                    title:         "ReTx tx-retransmitted (far, accounted as near)",
                    transform_rpn: "30,*",
                    color:         "ff00ff",
                    overlay:       false,
                    flip:          true,
                    noarea:        false
                },
                errors_near_rtx_c: {
                    title:         "ReTx corrected (near)",
                    transform_rpn: "30,*",
                    color:         "00ff00",
                    overlay:       true,
                    noarea:        true
                },
                errors_near_rtx_ucp: {
                    title:         "ReTx uncorrected protected (near)",
                    transform_rpn: "30,*",
                    color:         "ff0000",
                    overlay:       true,
                    noarea:        false
                },
                errors_far_rtx_tx: {
                    title:         "ReTx tx-retransmitted (near, accounted as far)",
                    transform_rpn: "30,*",
                    color:         "af00af",
                    overlay:       false,
                    noarea:        false
                },
                errors_far_rtx_c: {
                    title:         "ReTx corrected (far)",
                    transform_rpn: "30,*",
                    color:         "00af00",
                    flip:          true,
                    overlay:       true,
                    noarea:        true
                },
                errors_far_rtx_ucp: {
                    title:         "ReTx uncorrected protected (far)",
                    transform_rpn: "30,*",
                    color:         "af0000",
                    flip:          true,
                    overlay:       true,
                    noarea:        false
                },
            }
        }
    };

Stray observations after one full day of data:

  • SNR remains pretty much rock solid, dips by ~0.1 dB during daytime hours, recovers at night.
  • The number of error reports seems to be completely unrelated to the current quality of the line: while the amount varies slightly between 30-second intervals, overall it amounts to basically a flat line of ~400 reports per minute. Are error reports sent all the time, i.e. also for "no errors to report, chief"?
  • It seems like rtx-uc are the new CRC: my ES, at pretty much the same level as I previously observed with the Zyxel/Broadcom modem (~ 3 to 4 per day), are now mirrored in rtx-uc instead of CRC which remain firmly at zero.

+1 I see the same, could be either of:
a) lower night temperatures (and hence lower electrical noise), not sure that effect is large enough
b) less active traffic and hence less interference
c) maybe some folks shut their router off completely so again less cross-talk interference

My intuition is that the VCE requests these at more or less fixed intervals, and only the VCE knows what it intended to send, so I guess the CPE has no real idea about the magnitude of the deviation between intended and received... so having these at fixed intervals independent of load makes a lot of sense... As an observation, though: without Jan's error sample patches, so without giving feedback to the VCE, it often took multiple days for the signal to noticeably degrade (visible as loss of the vectoring gain in the SNR spectra), indicating that the error sample rate is a bit on the high side.... :wink:

Same here, even though this seems to disagree with ITU-T G.998.4's recommendation about what to report as CRCs when G.INP is used.

(Argh, disregard my blabbering, I scale per day (*86400), not per hour, so of course it can go beyond 3600. Seems like I have far fewer FECs than I thought I did.)


Could these result from collectd not actually reporting a hard count but trying to convert these somehow? I switched to scale by *1 so events/update interval to get reasonably interpretable numbers, after figuring out that I mainly stare at the 1 hour plot, if at all.
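To make the scaling confusion concrete, here is a minimal sketch. It assumes the counters end up stored as per-second rates (which is how RRDtool's COUNTER and DERIVE data source types behave) and uses a made-up rate of 0.25 events per second; `scaleRate` is just an illustrative name:

```javascript
// A per-second rate multiplied by a window length in seconds gives the
// expected number of events in that window.
function scaleRate(perSecond, windowSeconds) {
    return perSecond * windowSeconds;
}

const rate = 0.25; // hypothetical value: 0.25 events per second

console.log(scaleRate(rate, 30));    // 7.5   (per 30 s, like transform_rpn "30,*")
console.log(scaleRate(rate, 3600));  // 900   (per hour)
console.log(scaleRate(rate, 86400)); // 21600 (per day, well beyond 3600)
```

For a "seconds" counter such as FECS the rate cannot exceed 1, so the per-hour value is capped at 3600 while the per-day value can reach 86400.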


Vectoring error reports are not related to any visible errors. They provide a low-level measure of how much the received downstream signal is distorted (see my previous post for a short description). These error samples are calculated based on sync frames, which are regularly transmitted after every 256 data frames, and don't contain any actual data.

Obviously, reports are not sent for every sync frame (there are ~15.5 per second), but only when requested by the vectoring engine. And as you can see, that typically happens at roughly regular intervals.
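The ~15.5 figure can be reproduced with a bit of arithmetic. A minimal sketch, assuming a rate of 4000 DMT symbols per second (the usual VDSL2 value for 4.3125 kHz tone spacing, not stated in this thread) and taking the ~400 reports per minute observed earlier in the thread as input:

```javascript
// Assumption: 4000 DMT symbols per second; this number is not from the thread.
const SYMBOLS_PER_SECOND = 4000;
// One sync frame follows every 256 data frames.
const DATA_FRAMES_PER_SYNC = 256;

const superframeLength = DATA_FRAMES_PER_SYNC + 1;
const syncFramesPerSecond = SYMBOLS_PER_SECOND / superframeLength;
console.log(syncFramesPerSecond.toFixed(2)); // 15.56

// ~400 error reports per minute, as observed earlier in the thread.
const reportsPerSecond = 400 / 60;
const reportedFraction = reportsPerSecond / syncFramesPerSecond;
console.log(reportedFraction.toFixed(2)); // 0.43
```

Under these assumptions roughly four out of ten sync frames would result in a report on that particular line; other lines and vectoring engines may well request samples at a different rate.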


Thanks. So it doesn't actually make sense to plot them, either they happen at all (which is a one-time check) or they don't.

Alright, I'm very late but still one more voice to confirm: The patch works, impressively so:

(Ignore the two stripes, those are not reconnects but me fiddling with the collectd script.)

I am sure I have never before seen my line values anywhere near that stable with an OpenWrt-Lantiq modem. Not even little signs of degradation after 8+ days uptime.

What I think would be important now: Cherry-pick/backport the patch to the 22.03 branch. RC1 is about to be tagged any time now, and I feel that this is one of the most important improvements the Lantiq target has had in years. It should absolutely go into 22.03.


I second a backport to 22.03.
My line has been up for three weeks and it was never this stable before.
Not even with the original AVM firmware (FRITZ!Box 3370).


I have, but only in the days when the link was synchronized to ~50/10 without vectoring and G.INP... After vectoring was activated, sync increased to ~100/40, and G.INP was activated in both directions, stability took a severe hit... I do think that my older builds did not enable the second CPU sibling, so they probably did not trigger the SMP-locking issue back then, but my memory is hazy.

That said, getting this into the imminent stable release seems very desirable as this patch-set puts the "stable" back into the xrx200 dsl part :wink:


+1!

This master build is working just as well for me as an earlier build from Jan's repo:

Hostname OpenWrt
Model TP-LINK TD-W8980
Architecture xRX200 rev 1.2
Target Platform lantiq/xrx200
Firmware Version OpenWrt SNAPSHOT r19192-e1de25b68a / LuCI Master git-22.083.68981-15bbe69
Kernel Version 5.10.107
Local Time 2022-03-31 11:00:48
Uptime 4d 8h 44m 5s
Load Average 0.01, 0.30, 0.39

DSL Status

Line State: Showtime with TC-Layer sync
Line Mode: G.993.2 (VDSL2, Profile 17a, with down- and upstream vectoring)
Line Uptime: 3d 22h 46m 36s
Annex: B
Data Rate: 43.527 Mb/s / 8.051 Mb/s
Max. Attainable Data Rate (ATTNDR): 44.132 Mb/s / 7.960 Mb/s
Latency: 0.32 ms / 0.00 ms
Line Attenuation (LATN): 26.9 dB / 29.7 dB
Signal Attenuation (SATN): 24.5 dB / 29.7 dB
Noise Margin (SNR): 6.4 dB / 6.5 dB
Aggregate Transmit Power (ACTATP): 14.4 dB / 6.3 dB
Forward Error Correction Seconds (FECS): 0 / 21020
Errored seconds (ES): 1 / 718
Severely Errored Seconds (SES): 0 / 348
Loss of Signal Seconds (LOSS): 0 / 18
Unavailable Seconds (UAS): 348 / 348
Header Error Code Errors (HEC): 0 / 0
Non Pre-emptive CRC errors (CRC_P): 0 / 0
Pre-emptive CRC errors (CRCP_P): 0 / 0
ATU-C System Vendor ID: Broadcom 177.197
Power Management Mode: L0 - Synchronized

I'm using the DM200 VDSL BLOB (v5.7.B.5.0.7). There was a full re-sync a few hours into the connection, but the NBN connection I'm on is weirdly affecting the negotiation of SRA (I believe this is an issue in the Lantiq xDSL BLOB running in NBN's Broadcom environment: despite defaulting to dynamic SRA in both directions it's losing downstream SRA); there were some very minor SRA adjustments a few hours after the re-sync but since then it's been rock stable. Neither the re-sync nor the SRA adjustments appear to be related to Jan's vectoring changes: a Fritz!box 7490 with Fritz!OS 7.29 behaves exactly the same way in this regard on this line (though the OpenWrt builds get slightly better sync rates).


Why does it say G.993.2 everywhere (like with my fritzbox 3370)? Shouldn't it say G.993.5?
OpenWrt 19.07 still reported G.993.5, but 21.02 reports G.993.2.

That is just a cosmetic change which is caused by the switch from the lantiq_dsl.sh script to ubus for reporting DSL metrics.

Ah, I see, but what is the correct value? Are we using .5 or .2? Or is it ambiguous?

G.993.5 ("Vectoring") is an extension of, and builds upon, G.993.2 ("VDSL2"). A Vectoring-enabled line is "G.993.2 with G.993.5", so to say.

I'm also on the NBN, using fritz blob. Very happy with these results and without this vectoring addition the speed would slowly deteriorate. Using the latest with xdarklight's tree merged.

Hostname
Model BT Home Hub 5A
Architecture xRX200 rev 1.2
Target Platform lantiq/xrx200
Firmware Version OpenWrt SNAPSHOT r19421+22-3aa96efa24 / LuCI Master git-22.089.43958-7110635
Kernel Version 5.15.33
Local Time 2022-04-19 00:40:32

DSL Status

Line State: Showtime with TC-Layer sync
Line Mode: G.993.2 (VDSL2, Profile 17a, with down- and upstream vectoring)
Line Uptime: 7d 7h 21m 47s
Annex: B
Data Rate: 103.574 Mb/s / 44.199 Mb/s
Max. Attainable Data Rate (ATTNDR): 110.756 Mb/s / 47.446 Mb/s
Latency: 0.13 ms / 0.00 ms
Line Attenuation (LATN): 11.7 dB / 14.8 dB
Signal Attenuation (SATN): 11.7 dB / 14.6 dB
Noise Margin (SNR): 7.9 dB / 9.5 dB
Aggregate Transmit Power (ACTATP): 6.9 dB / 13.0 dB
Forward Error Correction Seconds (FECS): 0 / 168360
Errored seconds (ES): 0 / 724
Severely Errored Seconds (SES): 0 / 103
Loss of Signal Seconds (LOSS): 0 / 0
Unavailable Seconds (UAS): 212 / 212
Header Error Code Errors (HEC): 0 / 0
Non Pre-emptive CRC errors (CRC_P): 0 / 0
Pre-emptive CRC errors (CRCP_P): 0 / 0
ATU-C System Vendor ID: Broadcom 177.197
Power Management Mode: L0 - Synchronized

dsl_cpe_pipe.sh dsmstatg
nReturn=0 n_processed=738531 n_fw_dropped_size=0 n_mei_dropped_size=5 n_mei_dropped_no_pp_cb=0 n_pp_dropped=0
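
For anyone wanting to feed this into a collectd script: the output is a single line of key=value pairs, so it is easy to parse. A minimal sketch; `parseKeyValueLine` is a name I made up, and treating the four `*_dropped_*` fields as drop counters that can be summed is an assumption based purely on their names:

```javascript
// Parse a line of space-separated key=value pairs into an object.
// Numeric values become numbers, everything else stays a string.
function parseKeyValueLine(line) {
    const result = {};
    for (const token of line.trim().split(/\s+/)) {
        const eq = token.indexOf("=");
        if (eq <= 0) continue; // skip tokens that are not key=value
        const key = token.slice(0, eq);
        const raw = token.slice(eq + 1);
        const value = Number(raw);
        result[key] = Number.isNaN(value) ? raw : value;
    }
    return result;
}

const stats = parseKeyValueLine(
    "nReturn=0 n_processed=738531 n_fw_dropped_size=0 " +
    "n_mei_dropped_size=5 n_mei_dropped_no_pp_cb=0 n_pp_dropped=0"
);

// Assumption: these four fields all count dropped error reports.
const dropped = stats.n_fw_dropped_size + stats.n_mei_dropped_size +
    stats.n_mei_dropped_no_pp_cb + stats.n_pp_dropped;

console.log(stats.n_processed, dropped); // 738531 5
```

On this line that works out to 5 dropped against 738531 processed.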

And 28 days later ... my line uptime is 28 days:

No changes whatsoever in line stability, SNR wavers only minimally by +/- 0.1 dB over the day:

Not even a particularly bad Tuesday with FECs in the range of 40 per minute managed to destabilize the line:


(FECs is "FEC seconds", so 40 out of 60 seconds in a minute were afflicted, the graph very clearly shows the deviation, and it also shows in the DTU counters)

What is very curious though: I thought CRCs were not reported anymore, but that isn't true, my line just didn't show any. After a month of zero CRC errors, three appeared. The CRC counter may not be hugely relevant anymore, it might even be "broken", but it isn't completely abandoned:

So to reiterate: truly impressive improvements in stability. I gave up on Lantiq modems with OpenWrt before, this brings them back into "very much usable" status. Even if, like me, you would only be using the Lantiq device purely as a DSL modem: If they are this stable, they make a great alternative to much more expensive DSL modems with gigabit ports.

And to re-reiterate: @janh, please bring this into 22.03. RC1 was tagged yesterday, but I believe it's not too late to cherry-pick the patches for inclusion with 22.03 proper.

Personally, I am fine with the snapshot running the modem for the rest of its earthly existence, but other users will want to use their Lantiq as a real all-in-one router, and they would greatly benefit from the stability patches in a proper release version.

Also I fiddled a lot with my collectd script and the collectd/rrdgraph output. I wonder if that's of any interest to others, I would get my github game in order then.
