Vectoring on Lantiq VRX200 / VR9 - missing callback for sending error samples

I haven't counted packets, but just from watching the tcpdump output it does look like the behaviour you're describing.

That's probably an artifact of my scrape interval and how counter increases are visualized here.

Here's a 6h zoom:

Meanwhile, I've been using these patches for over a week now. The results are rather clear:

The steep recurring line deterioration is gone. There are still SNR fluctuations, but those are probably normal (the street is being worked on and we have the builders in the house too).

I'd say this can be thrown in a PR :slight_smile:

I made a clean build from this branch and the Ethernet issue is gone now. I ran the patch for a couple of days and saw that the line values are more stable, but I'm getting daily resyncs, just as I did when I tried one of the 5.9 Fritzbox firmwares without the error callback.

I tried an older 5.8 blob with the error callback, but it does not sync at all. So for now I went back to 5.8 on master, as it gives me uptimes of multiple weeks.

One last thing I could try is to extract the Belgian blob from a Fritzbox, as there might be a difference due to firmware whitelisting applied by the ISP.

Hm, that doesn't sound so good.

I'm using a BT HH5A, and the 5.9 fw range was never stable for me on this device.
So I intentionally didn't change the fw to test these patches. It's still 5.8.1.5.0.7 with sha256sum
a6b841aaa27f75d5709486c1fa52d1254399c478503be7c790271f7356642571 which I've been using for a long time now.

This doesn't look normal to me. If your line didn't start at such a high SNRM (the target SNRM is probably 6 dB), these drops would definitely cause interruptions. It is interesting that this only happens in the downstream direction. This suggests that either vectoring still doesn't work properly, or there is some other issue (like strong external interference near your end of the line).

Could you check if n_processed actually increases at a constant rate over a longer time? For that, you would need to work around the Moiré effect in your graph somehow (maybe just plot the total value instead of packets/second for a quick check?).
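A quick way to do that check without a graph, as a rough sketch: it assumes you already log samples as "unix_timestamp counter_value" lines, one per scrape. That log format and the function name rate_spread are made up for this example, they are not something the driver provides.

```shell
#!/bin/sh
# Hypothetical helper: reads "unix_timestamp counter_value" pairs on stdin
# and prints the smallest and largest rate seen between consecutive samples.
# If min and max are close together, the counter grows at a constant rate.
rate_spread() {
    awk '
        NR > 1 && $1 > prev_t {
            rate = ($2 - prev_v) / ($1 - prev_t)
            if (n == 0 || rate < min) min = rate
            if (n == 0 || rate > max) max = rate
            n++
        }
        { prev_t = $1; prev_v = $2 }
        END {
            if (n == 0) { print "not enough samples"; exit 1 }
            printf "min=%.3f max=%.3f per second over %d intervals\n", min, max, n
        }
    '
}
```

Usage would be something like `rate_spread < samples.log`, where samples.log is whatever file you collect the samples in. Because it works on the raw totals over each interval, the scrape interval no longer distorts the picture the way a packets/second plot does.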

Otherwise, an SNR spectrum (g997sansg 0 and g997sansg 1, or g997dsnrg 0 1 and g997dsnrg 1 1) from when the SNRM is high and when it is low could be helpful.

But isn't that what you would expect if the error samples are not sent back to the DSLAM's vectoring unit? The signal pre-distortion that was ideal around the time the unit synced simply gets less ideal over time as other CPEs change their sending behavior (or come online and go offline).
This only affects the downstream, since in the upstream direction signal pre-distortion is not possible (all CPEs would need to coordinate and know what the others are sending). At least that is my take on the presented data.

If the same firmware doesn't sync with the only difference being the changes in my branch, that doesn't sound good. Can you check if this is related to setting the MAC address (I added this in commit "ltq-vdsl-app: set MAC address for vectoring error reports")? The error callback only comes into play after sync, when the first error report has been requested, so if there is an issue with that, the line should still reach showtime (if only for a short time).

I just checked with the Annex B firmware from the OpenWrt packages (5.7.9.9.0.6) which doesn't support vectoring, and even that syncs without any issue for me.

You are right, I misinterpreted the post. Obviously, if the last half of the graph is with the vectoring changes and the first half without, everything is fine and exactly as expected. @dhewg, you can probably ignore my previous response.

Good to know, I have a similar issue, which might be related to the 5.9 firmware blob.

Mmh, is this for a Netgear DM200, as described on https://xdarklight.github.io/lantiq-xdsl-firmware-info/?

I'm using a HH5 with vr9-B-dsl-5.9.1.4.0.7.bin. It's been stable for me, but I'm only on a 40/10 line with no vectoring (ECI cabinet).

Never thought to check what the new OpenWrt release uses as firmware. I just threw my firmware on there and told it to use that.

Yes, sorry that wasn't clear. The 1st ~half is without the patchset, the 2nd with. I tried to show the difference but could have used more words :wink: While at it: the upward peaks in the 1st half (so without your patches) have been router reboots due to updates.

I believe so, it's been a while. The approach was something along the lines of: the highest version below 5.9 with vectoring that is not a prerelease.

Just tried 5.8.1.5.0.7 again, but only get a silent -> handshake -> full init -> silent loop...

This seems to be your version:

bash-3.2$ shasum -a 256 -b ./dsl_vr9_firmware_xdsl-05.08.01.05.00.07_05.08.00.09.00.01.bin 
a6b841aaa27f75d5709486c1fa52d1254399c478503be7c790271f7356642571 *./dsl_vr9_firmware_xdsl-05.08.01.05.00.07_05.08.00.09.00.01.bin

but my DSLAM does not seem to be amused....

@moeller0 exactly, I checked again one hour ago with 5.8.1.8.1.6, which worked well before. Indeed, same symptoms: silent => handshake => full init => silent. Shasum is 1f50976644d7c6ed5adb0f6417a246b3e2016424c656ac955c554535b8eaf4b2
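To rule out a corrupted or mislabeled blob when comparing against the hashes posted in this thread, the comparison can be scripted. A minimal sketch, assuming sha256sum is available (on macOS, shasum -a 256 as used above prints the same leading hash); check_blob is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: compare a file's SHA-256 with an expected hex digest.
# $1: path to the firmware file, $2: expected SHA-256 (lowercase hex)
check_blob() {
    # sha256sum prints "<hash>  <file>"; keep only the hash field
    actual=$(sha256sum "$1" | cut -d ' ' -f 1)
    if [ "$actual" = "$2" ]; then
        echo "match"
    else
        echo "MISMATCH: got $actual"
        return 1
    fi
}
```

For the 5.8.1.5.0.7 blob mentioned earlier that would be `check_blob ./dsl_vr9_firmware_xdsl-05.08.01.05.00.07_05.08.00.09.00.01.bin a6b841aaa27f75d5709486c1fa52d1254399c478503be7c790271f7356642571`. A match only tells you that both of you run the same file, not why the DSLAM refuses it.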

I have not yet tried to remove the MAC address commit. I'll try again at a later time, as one more interruption might cause some friction.

Hm, for the record, the MAC I see in /tmp/dsl.scr matches the MAC set for the dsl0 interface.

I don't think I configured anything special or unusual, but for comparison purposes:

config device
    option name 'dsl0'
    option macaddr 'xx'

config device
    option name 'dsl0.7'
    option type '8021q'
    option ifname 'dsl0'
    option vid '7'
    option macaddr 'xx'

The DSL settings look like this:

config vdsl 'dsl'
    option annex 'b'
    option tone 'bv'
    option xfer_mode 'ptm'
    option line_mode 'vdsl'

and

config interface 'wan'
    option device 'dsl0.7'
    option proto 'pppoe'
    option peerdns '0'
    option ipv6 '0'

And I'm using the testing 5.10 kernel.

This is the default Annex A firmware that is included in OpenWrt and it doesn't support vectoring (but this shouldn't prevent a sync as long as the DSLAM has a fallback mode without vectoring).

Here both 5.8.1.8.1.6 and 5.8.1.5.0.7 seem to sync correctly when connected to an ALL126AM2 "DSLAM". However, these Annex A firmwares only work when the other end is configured to use carrier set A43, whereas Annex B firmwares only work with carrier set B43. I haven't tested on a real line with vectoring though.

Has anyone tried whether Annex B DSL firmwares work on the Home Hub 5A, which all of you seem to be using?

My guess is that you guys don't have the macaddr config entries, which you only get if you use LuCI's network/interfaces/devices/configure button.

If that's the case, devstatus dsl0|jsonfilter -e @.macaddr should work instead of config_foreach get_macaddr device macaddr in /etc/init.d/dsl_control.

No, that wouldn't work. The MAC address has to be set very early (the driver only writes it to the modem directly after the DSL firmware is loaded), and at this point the dsl0 interface doesn't exist yet.

Also, the MAC address should be set in /etc/config/interfaces by default. And even if it isn't, 00:00:00:00:00:00 will be used instead, which is the default anyway (and in my tests it seemed to work without the correct MAC address).

I assume you meant /etc/config/network? I didn't try, but in /etc/rc.d network is linked as S20 and dsl_control as S97. Doesn't that ensure dsl0 is configured by then?
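For reference, the start order can be read straight from the symlink names. A small sketch (start_prio is a hypothetical helper; it only assumes the usual S<NN><name> naming in /etc/rc.d). Keep in mind that a lower number only means the script is started earlier; it may not by itself tell you whether dsl0 already exists at that point:

```shell
#!/bin/sh
# Hypothetical helper: print the numeric start priority of an init script.
# $1: rc.d directory, $2: service name (e.g. network, dsl_control)
start_prio() {
    for link in "$1"/S[0-9][0-9]"$2"; do
        # An unmatched glob stays literal, so skip anything that does not exist
        [ -e "$link" ] || [ -L "$link" ] || continue
        name=${link##*/}      # strip the directory part
        name=${name#S}        # strip the leading "S"
        echo "${name%"$2"}"   # strip the service name, leaving the number
        return 0
    done
    return 1
}
```

With the links described above, `start_prio /etc/rc.d network` and `start_prio /etc/rc.d dsl_control` should print 20 and 97 respectively.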