This is a callback function that would be defined in another kernel module which is currently missing from OpenWrt. The purpose of this callback is to send error samples to the VCE (Vectoring Control Entity).
Without those error samples, vectoring obviously can't work properly, which would explain the reports about instability on vectoring lines.
It looks like the callback in the driver just sends the data on ptm0 (this would be dsl0 on OpenWrt).
Looking at the vectoring specification (ITU G.993.5), this implements the L2 Ethernet encapsulation of the backchannel. Alternatively, the error samples could also be transmitted using the eoc (embedded operation channel). I don't know where that is handled, but I suspect it is done entirely within the DSL firmware. Anyway, the actual encapsulation to be used is selected by the VCE, so both methods need to be supported.
If anyone here is using a VR9 device with OpenWrt on a vectoring line, the output of dsl_cpe_pipe.sh dsmstatg would be interesting (and also dsl_cpe_pipe.sh dsmsg to verify that vectoring is enabled). If the value n_mei_dropped_no_pp_cb is non-zero, this means that error samples were dropped due to the missing callback.
I will probably test this myself soon, but I will have to use my actual DSL connection for that, and want to limit unnecessary interruptions of the line (I am not using an OpenWrt modem normally).
Same here: I'm not using an OpenWrt VR9 internal modem anymore because I was tired of the constant line deterioration. As much as I would like to help out, I really don't want to rock the boat at the moment, since I'm getting excellent and reliable line quality and stability from my external Broadcom-based Zyxel VMG1312. However, I have previously seen this slow but steady deterioration with VR9 modems on non-vectoring (ADSL!) lines as well. Is this backchannel also used in other profiles?
I haven't actually used an OpenWrt modem myself so far (well, the stock firmware on my current Broadcom modem is OpenWrt-based, but that doesn't count). But I recently got a supported device for another reason and happened to stumble on this patch. Since I had read about these issues before, I decided to look into it a bit more.
I don't think this is used for anything else (this is basically using the normal data channel for management traffic). This code in the driver is definitely specific to vectoring error samples.
Thanks! As expected, this shows that no error samples were successfully sent (n_processed is 0), but some were not sent due to the missing callback.
The value n_fw_dropped_size (number of error vectors that were dropped in the firmware) is very high, but that could be a result of the missing callback (the callback sets the first 4 bytes of the buffer to 0 to signal the firmware that the data was processed).
So, that's a partial success: The callback is called, but it does not successfully send the data. After enabling error output in the driver with echo enable err > /proc/vectoring, dmesg shows the following: target-mips_24kc_musl/linux-lantiq_xrx200/ltq-vectoring/ifxmips_vectoring.c:149:mei_dsm_cb_func: g_ptm_net_dev == NULL
Now the question is why the driver does not detect the network device.
Edit: I think I found the issue. Linux 3.11 changed how the netdev can be accessed in the event handler. The updated version is on GitHub, but still untested, as I need to interrupt the connection for that.
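For context, a sketch of the kind of change involved (this is not the actual driver code; the handler name is made up). Before Linux 3.11, the ptr argument passed to a netdev notifier was the struct net_device itself; since 3.11 it is a struct netdev_notifier_info, and the device has to be extracted with netdev_notifier_info_to_dev():

```c
#include <linux/netdevice.h>

/* g_ptm_net_dev is the global the driver complains about when NULL. */
extern struct net_device *g_ptm_net_dev;

/* Hypothetical notifier callback illustrating the Linux 3.11 change. */
static int vectoring_netdev_event(struct notifier_block *nb,
                                  unsigned long event, void *ptr)
{
    /* Before 3.11 this was simply: struct net_device *dev = ptr; */
    struct net_device *dev = netdev_notifier_info_to_dev(ptr);

    if (event == NETDEV_REGISTER && strcmp(dev->name, "dsl0") == 0)
        g_ptm_net_dev = dev;   /* remember the PTM device */

    return NOTIFY_DONE;
}
```

A driver still casting ptr directly to a net_device on a post-3.11 kernel would read garbage and never match the interface name, which fits the observed g_ptm_net_dev == NULL error.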
Of course, this doesn't tell if the error samples are actually received at the other end. The only realistic way to test this is probably to do a long-term test and monitor stability.
However, my line is not well suited for that, as it is very short and stability issues are likely to be hidden by that (current downstream SNR margin is 16 dB). Also, the line normally uses profile 35b and with the VR9 modem it can only run in fallback mode with significantly reduced data rate.
If anyone wants to try this out, the current code is available from the GitHub repository linked above.
This is very unlikely to decrease stability. But if your line is already entirely stable, it won't improve anything, either. So it is understandable if you don't want to try it.
It would make the most sense to try this on a vectored line where a VR9-based modem running OpenWrt is currently unstable (where unstable could also mean a resync just every few days), while stock firmware or other modems are working fine.
I guess my link would qualify: a VDSL2 100/40 link with known stability issues ever since the switch from plain VDSL2 50/10 to VDSL2-Vectoring+G.INP 100/40. I am just not sure when I can inflict this on my user base/family (my holiday is just over) ...
As a note to anyone who wants to try this: There seems to be a bug in the DSA switch driver, so that packets larger than 1496 bytes are dropped when VLAN is used (FS#3990). This can be worked around by decreasing the MTU by 4 bytes (or patching the driver).
I also noticed that after a few hours (between 8 and 24 so far), a small fraction (<5%) of the packets transmitted over the DSL interface get delayed by up to 4 seconds. I am currently trying to find out what causes this, and whether it could be related to my changes.
The latency issue is actually related to the vectoring driver. The problem seems to be that the vectoring driver directly calls the ndo_start_xmit method of the PTM driver to send the data. I think the reason this works on non-OpenWrt firmware is that it uses the PPA driver instead of the PTM driver, and there the calculation of the tx descriptor is protected by a lock.
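To make the difference concrete (this is only a sketch of my understanding, not code from either driver): calling the transmit op directly bypasses the qdisc and the per-queue tx lock, while handing the skb to dev_queue_xmit() lets the stack serialize it like any other packet:

```c
/* Sketch only; not taken from either driver. */

/* What the vectoring driver effectively does: it calls the PTM driver's
 * transmit routine directly, without any tx lock, so it can race with
 * normal traffic being sent on the same queue. */
g_ptm_net_dev->netdev_ops->ndo_start_xmit(skb, g_ptm_net_dev);

/* One way to avoid the race: hand the skb to the stack, which takes the
 * tx queue lock and goes through the qdisc like any other packet. */
skb->dev = g_ptm_net_dev;
dev_queue_xmit(skb);
```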
I have now been using that driver for more than 24 hours and it seems to work. The only thing I noticed is that the n_processed parameter increases faster while the upstream is saturated. I think this is because some error reports get dropped due to missing prioritization, and as a result the VCE requests more to be sent.
As I have done tests for some time now both with and without the vectoring driver, I think that it actually fixes downstream vectoring. During 3 days without the vectoring driver, the downstream SNR margin dropped by 1.3 dB (from 15.9 dB). While this is not that much, it has always stayed within 16-16.4 dB with the vectoring driver so far (except for the first 5 minutes after synchronization, where it is often a bit lower). Also, most of the decrease happened one night between 2 and 3 AM, which is when ASSIA/DLM usually reconfigures lines. And the connection/disconnection of other lines is exactly when error reports are needed the most.
I really hope this works out. The proof is in a longer uptime, though. I distinctly remember that, when I had my vectoring line on a FritzBox 3370 (VR9), my line wasn't completely craptastic; it sometimes took more than two or three weeks to deteriorate significantly. The real difference is that it never recovered from a downgrade such as the one you describe, and ended up slowly but steadily creeping down to ~6 dB SNR.
Truth be told, I will probably never switch back to an XRX200 system again, if only because I have grown quite fond of AC wireless and the increased SoC capability of my current MT7621 device (the VR9 can only "almost" handle a 100 Mbit/s line). But I would actually consider replacing my external modem with a bridged VR9 device, because my current modem only has 100 Mbit/s Ethernet, and my downstream currently provides ~115 Mbit/s.
I think it is useful to monitor the change over time of the following parameters: DSM statistics (dsmstatg), downstream SNR margin (g997lsg 1 1) and downstream net data rate if the line uses SRA (g997csg 0 1).
I would strongly recommend using the ltq-vectoring-avm branch. The current version in that branch also sets a priority for the error reports.
I think using AVM's version of the vectoring driver makes more sense than trying to get the original version from Lantiq working without concurrency issues. (I am wondering if the current PTM driver is even correct without the vectoring driver, as the PPA driver uses locking in lots of places while the PTM doesn't use any locks at all.)
(cough) I might have written something that could help you there, but it's been a while since I wrote it. I'm not entirely sure it will still work with 21.02 or the current snapshot. I have been loosely following development on the Lantiq VDSL stuff, and some things might have changed (I believe VDSL stats are now available via ubus or something; also, I'm not entirely sure the rrdtool definitions still use Lua).
Both branches are now updated to build with the testing kernel.
I only had such severe problems with the version that was briefly in the branch last night. But it's also possible that the concurrency issue just leads to different kinds of issues.
For monitoring line stability this should be enough.
Personally, I monitor the output of all useful dsl_cpe_pipe.sh commands every half hour. Currently, I am also monitoring dsmstatg every minute to check if there are any irregularities.
By the way, an interesting effect of AVM's driver is that you can capture the error reports using tcpdump (SNAP OUI 0x0019a7 and protocol ID 0x0003, this filter works: llc and ether[14:4]=0xaaaa0300 and ether[18:4]=0x19a70003). One thing that still needs fixing is the source MAC address, which is currently all-zeroes. It seems to work regardless, but the spec says to use the proper device MAC address.