Vectoring on Lantiq VRX200 / VR9 - missing callback for sending error samples

I'll try and remember to do this.

I don't think so. I've just checked my DM200 with OEM v1.0.0.66, and dynamic SRA (RA_MODE=3) is enabled for both upstream and downstream, which should be sufficient for NBN compliance based on DrayTek's advice to affected Vigor owners - see here.

"for NBN adjustment" appears immediately before the lfcs command lines shown above. Not very illuminating, I know, but I'm pretty sure this relates to fixing NBN compatibility of the original firmware release v1.0.0.34, as I've seen it in another version too (1.0.0.52, I think).

An unused or second-hand Telstra Technicolor DJA0231 is the best bet, and it is also easy to bridge; I'd avoid the Arcadyan LH1000, which has had far more firmware issues despite nearly identical hardware.

NB: sorry to the others for the OT comment above; I can't see an option to reply privately. :frowning_face:


IMHO not a big issue (but then this is not my thread, so not my call to make). Apparent tangents occasionally circle back to the main thread and add real value, which is much harder to track once a sub-thread has been fully split off; but where to draw the line is clearly a matter of subjective taste and preference.

If you click on the avatar image or name at the top of a post, an information overlay pops up with a blue "message" button; pressing it should open the editor for a private message.

root@OpenWrt:~# uptime ; dsl_cpe_pipe.sh DSM_STATisticsGet
 10:47:03 up 5 days, 23:25,  load average: 0.24, 0.15, 0.06
nReturn=0 n_processed=3736603 n_fw_dropped_size=0 n_mei_dropped_size=0 n_mei_dropped_no_pp_cb=0 n_pp_dropped=0

So after 5 days the first resync, but not a spontaneous one caused by the Lantiq modem: it was a resync cycle the DLM/DSM system used to increase the upload sync limit from 32 to 37 Mbps, so if anything it is a sign that the patches actually work as intended and increase the link's stability.
Given that a higher sync leaves less safety reserve, it will be interesting to see how long the link stays up with the new sync limits. But so far things are looking really bright... this might return the Lantiq to the "boring", just-works level I enjoyed before my DSLAM was upgraded to mandatory vectoring and bidirectional G.INP.

@takimata it still is a bit early to tell for sure, but this last set of patches seems to make a big difference.


So, roughly one week of uptime and one ISP-initiated resync (with a concurrent sync limit increase, indicating that the DLM was happy with the link's stability and that all error and proto-error counts stayed low enough).

root@OpenWrt:~# uptime ; dsl_cpe_pipe.sh DSM_STATisticsGet
 12:45:20 up 7 days,  1:24,  load average: 0.44, 0.11, 0.03
nReturn=0 n_processed=4430871 n_fw_dropped_size=0 n_mei_dropped_size=0 n_mei_dropped_no_pp_cb=0 n_pp_dropped=0

Let's see how long this will last (and whether 37 Mbps is the upper limit the DSLAM will allow me to sync at).

UPDATE 2022 02 10:
So, one week since the last resync:

 09:03:53 up 12 days, 21:42,  load average: 0.47, 0.28, 0.10
nReturn=0 n_processed=8162228 n_fw_dropped_size=0 n_mei_dropped_size=0 n_mei_dropped_no_pp_cb=0 n_pp_dropped=0

and

DSL Status
Line State: Showtime with TC-Layer sync
Line Mode: G.993.2 (VDSL2, Profile 17a, with down- and upstream vectoring)
Line Uptime: 6d 23h 6m 19s
Annex: B
Data Rate: 116.797 Mb/s / 36.998 Mb/s
Max. Attainable Data Rate (ATTNDR): 139.739 Mb/s / 40.237 Mb/s
Latency: 0.14 ms / 0.00 ms
Line Attenuation (LATN): 9.7 dB / 8.2 dB
Signal Attenuation (SATN): 9.7 dB / 8.1 dB
Noise Margin (SNR): 12.4 dB / 8.6 dB
Aggregate Transmit Power (ACTATP): -3.6 dB / 14.5 dB
Forward Error Correction Seconds (FECS): 0 / 18989
Errored seconds (ES): 11 / 109
Severely Errored Seconds (SES): 1 / 64
Loss of Signal Seconds (LOSS): 2 / 0
Unavailable Seconds (UAS): 255 / 255
Header Error Code Errors (HEC): 0 / 0
Non Pre-emptive CRC errors (CRC_P): 41 / 0
Pre-emptive CRC errors (CRCP_P): 0 / 0
ATU-C System Vendor ID: Broadcom 194.127
Power Management Mode: L0 - Synchronized




The plots show how the link behaved with the new code: one resync, resulting in a higher upload rate (and hence a lower upload SNR margin) and also a higher FECS rate, which is to be expected. I have not seen this link this stable with an OpenWrt Lantiq modem since vectoring and G.INP were enabled.

@janh, this confirms that your patches really improve the stability and robustness of the lantiq driver on links with vectoring and G.INP. What can/should we do to get these changes into OpenWrt's main branch?


I can also confirm that @janh's patches work for me!

I have had issues with my Home Hub 5A ever since Telekom changed to a new Broadcom 194.127 DSLAM. Before that, my HH had VDSL uptimes of several weeks with the previous Infineon DSLAM, but after they changed to Broadcom I only got the fallback line with 16 Mbit/s down and 1 Mbit/s up.
Then I tried every modem firmware I could find. While only firmwares >5.8.* were able to establish a 50 Mbit/s VDSL connection, none of them was stable; most disconnected after 15 minutes to 4 hours. The only exception was 5.8.1.5.0.7, which lasted up to 2 days during my tests.

But then I found this thread and tested @janh's fork. Since then I've been using the (latest?) firmware 5.9.1.4.0.7 and everything is stable again. I only noticed two reconnects within several days, but both happened at 03:00, so they were probably ISP-initiated.
The line has now been up for one week.

Thank you for the fixes, @janh; I am also all for merging these patches into OpenWrt :+1:


This indicates that your link might have been only 50/10 before, on a DSLAM without active vectoring. The fallback profile essentially just uses VDSL2 up to 2.2 MHz, resulting in sync speeds around 16/1, and it is only used if a VDSL2 modem does not support vectoring. Before the Nahbereichsausbau (Telekom's vectoring roll-out for lines close to the exchange), my link was the same and was also rock-solid for weeks with the HH5A; once vectoring was activated, I needed to switch to a vectoring-enabled firmware blob, and the instability started.

Yes, apparently the issue is a race condition caused by locking that is not SMP-safe: if the xrx200 SoC is configured to use both of its CPU "core-lets", as it is in OpenWrt, the two can stumble over each other, which Jan fixed with better locking. Whether, and how quickly, the condition gets triggered varies from link to link...

Probably: ~03:00-05:00 is Telekom's maintenance window, in which their DLM/DSM system (based on ASSIA's DSL Expresse) is allowed to cause retrains/resyncs. One of that DLM's hallmarks is restricting the maximum sync rate a DSLAM/MSAN allows to a nice round number. OpenWrt does not show that limit, but your download sync is so close to 60 Mbps that I would bet the DSLAM limits the sync to 60 Mbps (the maximum for the 50/10 profile is ~63/12, so your upload is already at the maximum, but the download might still increase, assuming the DLM considers your link sufficiently stable).

+1!


OK, thank you for the clarification. Your assumptions make perfect sense. This also explains why I only had 40/10 earlier with Telekom and only 30/10 with Vodafone before that. I always blamed them for the slow connection (I ordered 50 Mbit/s), but it was probably my HH's fault if full vectoring wasn't working :sweat_smile:

Yeah, it looks like your DSLAM was simply not using vectoring. I would not blame your poor HH5A for that, though; it simply took Telekom a while to convert/modernize your DSLAM.

BTW, IIRC the fall-back range contains the lowest 512 subcarriers (roughly the lowest 2.2 MHz of bandwidth), which, with 15 bit/carrier being the maximum for VDSL2 and a 4 kHz DSL symbol clock, results in a maximum gross total of:
512 * 15 = 7680 bit/symbol
7680 bit/symbol * 4000 symbol/s = 30720000 bit/s = 30.720 Mbps
But that is the aggregate total for up- and downstream combined, assuming all 512 subcarriers are in use (which is not true) and all are loaded with the theoretical maximum of 15 bits (which is unlikely), so neither 30/10 nor 40/10 would be possible within the fall-back profile.

It's too difficult for an inexperienced user to compile an OpenWrt image from scratch. It would take days or weeks of playing with Docker and/or a Linux VM, downloading all the dependencies and fixing the system at the same time...

Is there a way to get this as a patch or package that can be applied to a current OpenWrt installation?
Are there any plans for these patches to go into the main OpenWrt branch?

You can get janh's changes at https://github.com/openwrt/openwrt/compare/master...janh:ltq-vectoring.diff. I have found they apply cleanly on top of the current HEAD (918d4ab as of today).

Then it's only a matter of running ./scripts/feeds update -a, ./scripts/feeds install -a, make menuconfig and make as per https://openwrt.org/docs/guide-developer/toolchain/use-buildsystem.


It's only a matter of running ./scripts/feeds update -a , ./scripts/feeds install -a

Only if you have your build system tested, up and running.
And setting up a working build system is not a one-liner, but a long, hard job.

That was my initial assessment as well, but once I actually tried it, it turned out to be relatively easy; I think I followed guidance by @hnyman on a physical Ubuntu host.

Have a look at https://openwrt.org/docs/guide-developer/toolchain/install-buildsystem
It offers copy-and-paste lines for common Linux distributions showing how to install the prerequisites....


I have a Docker environment for this and occasionally build/run the container on a remote host, which avoids tying up my laptop. It's just a case of installing a few OS packages.

But I digress. The changes in @janh's vectoring branch have significantly improved observed stability of my connection after I built a Netgear DM200 image. It's been up for a day now, whereas I was previously getting disconnects every 6-8 hours.

I'd be keen to see these patches make their way into the main repo.

One thing to note in the screenshot above is that the max attainable/actual values are incorrect: reportedly, I'm getting a higher downstream rate than is attainable. This isn't related to the vectoring changes, however.


You're using the DM200 xDSL firmware in your build?

Coincidentally this afternoon I got my DM200 build running (using the DM200 OEM xDSL firmware), and it appears that the vectoring error samples are being processed:

root@OpenWrt:~# dsl_cpe_pipe.sh dsmstatg
nReturn=0 n_processed=2332 n_fw_dropped_size=0 n_mei_dropped_size=0 n_mei_dropped_no_pp_cb=0 n_pp_dropped=0

Unfortunately I think my build has firewall-related problems, as no traffic is flowing WAN<->LAN :-(, though WAN<->device traffic (e.g. ping) crosses the link OK. My line is also a lot longer than yours, as I only get sync rates of 44 Mbps down and 8 Mbps up.

It is curious that an ostensibly Annex A xDSL firmware is reporting an Annex B connection... Does anyone know how to identify the tone set in use?

The DM200 5.7.B.5.0.7 firmware has the same defaults so why Netgear added these explicit commands is a mystery.

BTW: my suspicion is that the -1 for b20BitSupport doesn't indicate "disabled" but rather "use compiled in default".

Given how little public documentation seems to be available, that is entirely possible :wink:
But maybe 20-bit support (whatever that is) is simply not available on VDSL?

   /**
   20 bit constellation config/status value.
   - DSL_FALSE: 20 bit support is disabled (not indicated in G.Hs)
   - DSL_TRUE: 20 bit support is enabled (indicated in G.Hs)

   \note The configuration and status of this feature is only available for
         ADSL only platforms and for downstream direction. */
   DSL_CFG DSL_FeatureSupport_t b20BitSupport;

I'm using the Annex B firmware listed here (sha1 3c97762499817686c688b71724ab6a329daa81f8).

It does appear vectoring samples are processed:

nReturn=0 n_processed=860278 n_fw_dropped_size=0 n_mei_dropped_size=0 n_mei_dropped_no_pp_cb=0 n_pp_dropped=0

I did have some initial routing issues due to what look like changes in device/interface naming (eth0/lan) in the master branch compared to the latest stable release (the upgrade does warn that the old config is not compatible), but this just required rebuilding my configuration (it's being run as a dumb modem: native WAN plus management on a tagged VLAN).

The line has been up and stable for about 3 days now. The attached plot makes it quite clear when the change was made.


In which case the -1 could equally well mean "not applicable" or "don't care" :-). Either way, "0" seems to be how "explicitly disabled" is spelled in such cases...

Mmmh, looking deeper into drv_dsl_cpe_api.h reveals the following:

typedef enum
{
   /**
   Feature is not available respectively not applicable */
   DSL_FEATURE_NA = -1,
   /**
   Feature is enabled. */
   DSL_FEATURE_DISABLED = 0,
   /**
   Feature is disabled. */
   DSL_FEATURE_ENABLED = 1,
   /*
   Delimiter only */
   DSL_FEATURE_LAST
} DSL_FeatureSupport_t;

That should settle this side question for good. What it does not answer is what 20BitSupport actually is, but since it seems to be ADSL-only, I am happy to leave that mystery unsolved :wink:
