How can we make the lantiq xrx200 devices faster?

You can compile a dhrystone benchmark. There should be one in busybox. I think even running a long shell loop should be OK for benchmarking (measure the script with the time command).
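A minimal sketch of such a shell-loop benchmark, assuming a POSIX shell such as busybox ash. The function name busy_loop and the default iteration count are made up for illustration; it only exercises shell arithmetic, so it is a rough relative measure, not a dhrystone replacement.

```shell
#!/bin/sh
# Crude CPU benchmark: count to N using only shell arithmetic.
# busy_loop is a hypothetical name for this sketch, not a busybox tool.
busy_loop() {
    n="$1"
    i=0
    while [ "$i" -lt "$n" ]; do
        i=$((i + 1))
    done
    echo "$i"   # print the final counter so the run can be sanity-checked
}

# Iteration count comes from the first argument, with an arbitrary default.
busy_loop "${1:-100000}"
```

Saved as, say, bench.sh, it can be run as "time sh bench.sh 1000000" before and after a change, comparing the reported times on the same device.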

I've managed to backport the ethernet driver from vanilla kernel v5 (it seems to be better than the current openwrt one) and I've added the switch and phy code from the old one. This seems to increase the ethernet throughput (server, tplink xrx200

test upload:
                nc -u -l -p 4321 | pv > /dev/null
                cat /dev/zero | nc -u 4321
        speed = 18,5MiB/s
test download:
                nc -u -l -p 4321 > /dev/null
                cat /dev/zero | pv | nc -u 4321
        speed = ~100 MiB/s (varies)

If you run the same commands against the old ethernet driver, you will see that the new driver reaches about 200% of its throughput.

My second patch is for the DMA burst size. Both patches were tested on openwrt git snapshot 6e104c63d678518f93425e5e34f6caf75228024c; they have since been updated for a3ccac6b1d693527befa73532a6cf5abda7134c0, where they have not been tested yet. I don't know the proper way to add patches to openwrt, but it should be fine to put them into target/linux/lantiq/patches-4.14 or to patch the kernel directly at build_dir/target-mips_24kc_musl/linux-lantiq_xrx200/linux-4.14.96 .


More speedup patches are in this thread.

edit: fixed the patches, there was a surplus "./" in the paths; they now work from target/linux/lantiq/patches-4.14
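For anyone unsure how to drop the patches in, here is a small sketch. The function name install_patches, its two arguments and the 0644 mode are assumptions made for illustration; the only thing taken from the post is that the files go into target/linux/lantiq/patches-4.14.

```shell
#!/bin/sh
# install_patches SRC_DIR PATCH_DIR
# Copy every *.patch from SRC_DIR into PATCH_DIR (for example
# target/linux/lantiq/patches-4.14). Hypothetical helper, not part of OpenWrt.
install_patches() {
    src="$1"
    dst="$2"
    [ -d "$src" ] && [ -d "$dst" ] || return 1
    for p in "$src"/*.patch; do
        [ -f "$p" ] || continue          # no match: the glob stays literal
        cp "$p" "$dst"/ || return 1
        # Mode is an assumption; anything the build user can read should do.
        chmod 0644 "$dst/$(basename "$p")"
    done
}

# Afterwards, from the top of the OpenWrt tree, the kernel can be re-prepared
# so that the patch directory is applied again, for example:
#   make target/linux/clean
#   make target/linux/prepare V=s
```

With V=s the prepare step prints each patch as it is applied, which makes it easier to see whether 0904 and 0905 were picked up.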

I've managed to fix the bugs in the ethernet driver where the phy didn't start, and I've implemented basic skb frags (DMA scatter-gather) functionality. Try testing these patches: [0904-backport-vanilla-eth-driver.patch] [0905-increase-dma-descriptors.patch]. The speedup against the original openwrt driver seems to be pretty high. A test script is here: change the IP, run iperf3 -s on the lantiq device and the script on the host. Don't forget to remove the old patches.

I put all of your patches in openwrt/target/linux/lantiq/patches-4.14/ (also the ones from the other thread) but I don't think they have been applied at all. I don't see any change in throughput, although I have a 100mbit port on my laptop, not a 1gbit one. I am seeing 91 mbit/s for upload and 44.5 mbit/s for download through iperf. I don't see any changes in cat /proc/interrupts either. I think for some reason your patches are not being applied. How can I see if they are being applied?

Edit: I can see the patches being applied but it also says patch unexpectedly ends in the middle of the line for all of your patches and after that it says Hunk # applied successfully. and it continues.

Edit 2: I carefully copied the patches and applied 0666 permissions, as was the case for the other ones, then compiled again, and there you go, it works. Since my laptop can only handle 100mbit/s I can actually download and upload at 92mbit/s with my TD-W8980. Also irq balance seems to work and it increases the wifi throughput by about 10%, as you suggested before. With the OpenWrt ethernet driver the upload to the router was 92mbit/s and the download was 44mbit/s; with this driver both directions get 92mbit/s.

Hi there same result for patches 901 / 902 / 903
But this works with the files of post #1 of this thread.
See not working: Xrx200 IRQ balancing between VPEs post 19
See working: Xrx200 IRQ balancing between VPEs post 22

Patch 904 / 905: I cannot test it on the Easybox 904xDSL because 904 is incompatible with 4027-NET-MIPS-lantiq-support-fixed-link.patch
I will test it on the O2-Box 6431, but the question is: in combination with patches 901 / 902 / 903, with the raw files from post #1, or without any of the earlier changes?

I think pastebin deletes the last empty lines in a post. But the patch applies OK
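The "patch unexpectedly ends in the middle of the line" message is usually what patch prints when the last line of the file has no terminating newline, which fits the pastebin explanation. A small sketch for checking and repairing that; both function names are made up for illustration.

```shell
#!/bin/sh
# ends_with_newline FILE
# Succeeds if FILE is non-empty and its last byte is a newline.
# Command substitution strips a trailing newline, so an empty result from
# `tail -c 1` means the last byte was "\n".
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# fix_newline FILE
# Append the missing final newline, if there is none yet.
fix_newline() {
    ends_with_newline "$1" || printf '\n' >> "$1"
}
```

Running fix_newline over each downloaded patch before copying it into the patches directory should make the warning go away; the hunks themselves are not touched.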

So an upgrade, yay, congrats :)
Do all ports work?

These are just normal patches, same as 4027-NET-MIPS-lantiq-support-fixed-link.patch; the openwrt build will use them automatically. So the only problem is the compatibility with 4027. You should be able to fix this merge problem manually, it is just a few lines in a noncritical section (you can see the differences if you make two patched versions, 4027 only and 904 only).

The DMA burst patch is not required. The ICU patches (dts, irq.c and smp-mt) should be useful, but the system should build without them. I used all of them.

Well yes and no. When you asked this I tried all the ports and then on first try Port 1 gave this:

[ 2505.807876] lantiq,xrx200-net 1e108000.eth eth0: not enough TX ring space for frags

After I tried to do iperf test again it gave this kernel panic on Port 1:

root@OpenWrt:/# iperf -c
[ 2573.313664] ------------[ cut here ]------------
[ 2573.316914] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x21c/0x3f8
[ 2573.325178] NETDEV WATCHDOG: eth0 (lantiq,xrx200-net): transmit queue 0 timed out
[ 2573.332629] Modules linked in: ath9k ath9k_common ath9k_hw ath pppoe nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD pppox ppp_async owl_loader nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack ltq_deu_vr9 iptable_mangle iptable_filter ip_tables crc_ccitt compat drv_dsl_cpe_api ledtrig_usbport drv_mei_cpe ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm drv_ifxos dwc2 gpio_button_hotplug
[ 2573.402320] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.98 #0
[ 2573.408244] Stack : 00000000 00010000 00000001 8007c528 80650000 80600320 00000000 00000000
[ 2573.416598]         805c6ce4 83811dac 838340ac 8063a687 805c1c10 00000001 83811d50 74ab3c7f
[ 2573.424952]         00000000 00000000 807a0000 00010000 00000000 00000000 00000007 00000000
[ 2573.433308]         00000117 55000000 00000116 00000000 00000000 80650000 00000000 803c9af8
[ 2573.441665]         00000009 00000140 80637f94 00010000 00000003 00000000 00000004 80790004
[ 2573.450019]         ...
[ 2573.452451] Call Trace:
[ 2573.454919] [<800115e4>] show_stack+0x58/0x100
[ 2573.459371] [<804b58f4>] dump_stack+0xe4/0x120
[ 2573.463817] [<80033b00>] __warn+0xe0/0x114
[ 2573.467892] [<80033b64>] warn_slowpath_fmt+0x30/0x3c
[ 2573.472854] [<803c9af8>] dev_watchdog+0x21c/0x3f8
[ 2573.477583] [<80095648>] call_timer_fn.isra.3+0x24/0x84
[ 2573.482780] [<800958f8>] run_timer_softirq+0x250/0x31c
[ 2573.487928] [<804d4d90>] __do_softirq+0x128/0x2ec
[ 2573.492618] [<80038b20>] irq_exit+0xac/0xc8
[ 2573.496797] [<80274d0c>] plat_irq_dispatch+0xfc/0x138
[ 2573.501849] [<8000bb08>] except_vec_vi_end+0xb8/0xc4
[ 2573.506799] [<8000d4d0>] r4k_wait_irqoff+0x1c/0x24
[ 2573.511592] [<80071d8c>] do_idle+0xd4/0x154
[ 2573.515763] [<80072004>] cpu_startup_entry+0x24/0x30
[ 2573.520726] [<80038a58>] irq_enter+0x58/0x74
[ 2573.525025] ---[ end trace 23ebd19a3a343560 ]---
[ 2573.529608] eth0: transmit timed out!
connect failed: Host is unreachable

So it seems there is a bug possibly and it can be solved, right? On my 2nd try everything works. Even on Port 1 and I think I should test it while attached to my HH5A on 1gbit/s link. Maybe it will shed some more light on what is going on.

Edit: Also it seems the tx ring is getting full for some reason. Maybe it's too small.

root@OpenWrt:/# iperf -c -t 60
Client connecting to, TCP port 5001
TCP window size: 43.8 KByte (default)
[  3] local port 40052 connected with port 5001
[ 3365.915243] tx ring full after send
[ 3365.944203] tx ring full after send
[ 3367.033160] tx ring full after send
[ 3367.102508] tx ring full after send
[ 3368.671618] tx ring full after send
[ 3368.734038] tx ring full after send
[ 3368.904054] tx ring full after send
[ 3369.025805] tx ring full after send
[ 3369.223867] tx ring full after send
[ 3369.451556] tx ring full after send
[ 3369.518576] tx ring full after send
[ 3370.098506] tx ring full after send
[ 3370.136027] tx ring full after send
[ 3370.187110] tx ring full after send
[ 3370.210587] tx ring full after send
[ 3370.278975] tx ring full after send
[ 3370.301863] tx ring full after send
[ 3370.376010] tx ring full after send
[ 3370.380904] tx ring full after send
[ 3381.090977] lantiq,xrx200-net 1e108000.eth eth0: not enough TX ring space for frags

[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-73.8 sec   455 MBytes  51.7 Mbits/sec
root@OpenWrt:/# [ 3425.233724] eth0: transmit timed out!

I will run some more tests tomorrow to see how it behaves but so far no load on router even when going with full 100mbit/s throughput.
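To compare runs it may help to count the three TX messages instead of eyeballing the log. A sketch, with the message strings copied from the logs above; the function name count_tx_events is made up for illustration.

```shell
#!/bin/sh
# count_tx_events: read kernel log text on stdin and print how many lines
# report each of the three TX problems seen in this thread.
# Hypothetical helper; only the matched strings come from the logs.
count_tx_events() {
    awk '
        /tx ring full after send/            { full++ }
        /not enough TX ring space for frags/ { frags++ }
        /transmit timed out/                 { timeout++ }
        END {
            printf "ring_full=%d no_frag_space=%d timeouts=%d\n", full, frags, timeout
        }
    '
}
```

On the router this would be used as "dmesg | count_tx_events" right after an iperf run.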

Yeah, that's the problem with my 1Gbit ports too. I haven't figured out why yet, but either I don't understand how to throttle the tx xmit function when the ring is filling up, or the system is too slow to deallocate already transmitted packets in the tx housekeeping function.

The error is just a warning that the kernel network stack timed out; unless the kernel is set to panic on every warning, it will just continue.

But thanks for testing.

Do you think something like this is possible for lantiq based devices? I think those are available for mips24kc but only for ar71xx.

I think probably not. If there are general optimisations of the network stack, they will probably end up in the vanilla kernel, and the hardware offloading things in the SoC will most likely be implemented in a different way. I don't even know if there is such hardware (for example NAT offloading). There is no public manual for xrx200.

I did some more tests with wifi and it seems the throughput has actually improved. Note that the device is not routing any traffic, though; it's just connected through LAN from the main router.

Tests before the patches:

DL: 35-40 mbit/s
UL: 40-45 mbit/s
CPU Load: ~1.8

After the patches:

DL: 50-105mbit/s
UL: 50-105mbit/s
Avg: 60-75mbit/s roughly
CPU Load: ~1.0

If testing both ways simultaneously the DL and UL is around 25mbit/s and 35mbit/s with load around 1.8 and more.

But I think there's still room for more. If you look the current /proc/interrupts:

root@OpenWrt:~# cat /proc/interrupts
           CPU0       CPU1
  7:    1252052    1190227      MIPS   7  timer
  8:      18828      39540      MIPS   0  IPI call
  9:     355149     113465      MIPS   1  IPI resched
 22:          1     129928       icu  22  spi_rx
 23:      79176          0       icu  23  spi_tx
 24:          0          0       icu  24  spi_err
 62:          0          0       icu  62  1e101000.usb, dwc2_hsotg:usb1
 63:          0      45560       icu  63  mei_cpe
 72:      18111          0       icu  72  vrx200_rx
 73:          0      16712       icu  73  vrx200_tx
 91:          0          0       icu  91  1e106000.usb, dwc2_hsotg:usb2
112:          0        215       icu 112  asc_tx
113:          0          0       icu 113  asc_rx
114:          0          0       icu 114  asc_err
126:          0          0       icu 126  gptu
127:          0          0       icu 127  gptu
128:          0          0       icu 128  gptu
129:          0          0       icu 129  gptu
130:          0          0       icu 130  gptu
131:          0          0       icu 131  gptu
144:      23871    1752135       icu 144  ath9k
161:          0          0       icu 161  ifx_pcie_rc0
ERR:          1

ath9k is the WiFi chip and it's utilizing more of the 2nd core than the first one. I also managed to see the CPU activity through htop and the 2nd CPU was being utilized more, although I think the average utilization was around 30-40%. If the Wi-Fi can utilize more CPU cycles it can actually provide more than 100mbit/s DL and UL. I am not sure how it would do that, but theoretically it should be possible.
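The per-core split is easier to read as a percentage. A sketch that assumes exactly two CPU columns, as on these devices; the function name irq_share is made up for illustration.

```shell
#!/bin/sh
# irq_share: read /proc/interrupts-style text (two CPU columns) on stdin and
# print, for every IRQ line with activity, the share handled by CPU1.
# Hypothetical helper; the name shown is only the last word of the IRQ name.
irq_share() {
    awk '
        # Only lines of the form "NN: count0 count1 ..." are considered,
        # so the header and the ERR line are skipped.
        $1 ~ /^[0-9]+:$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            total = $2 + $3
            if (total == 0) next
            printf "%s %s cpu1=%.0f%%\n", $1, $NF, 100 * $3 / total
        }
    '
}
```

Used as "irq_share < /proc/interrupts". For the listing above, the ath9k line works out to roughly 99% on CPU1.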

Are you using irqbalance ?

Yes that would be right. I put all of your patches in the 4.14-patches folder.

It's maybe a bit off topic, but does anyone know how stable it is now on vdsl2 and what performance I can expect if I build my own modem-only (or bridge mode) image? Last time I tried, a year ago with my 3370, it was kind of disappointing. I was not able to keep the vdsl2 line synced for more than a few minutes to hours with the proprietary vr9 driver. And the performance on the vdsl2 line wasn't that great either.

Any suggestions are welcome :)

Well this is pretty much off topic but it actually depends on your line. If your line is in bad condition it may not work as you expect it to so if you experience any problems I'd suggest to use a separate modem in bridge mode.

I tested the working patch created from the files from post #1 + 904 and 905 on the O2-Box 6431.
This box has only four 100Mbit/s ports plus one 100Mbit/s port behind the gray DSL port.

The result is the same:
I had two devices connected to it and ran a speedtest:
They start fast and then the connection breaks down; after a while it comes back up.
It works if you do not transfer too much data.

root@OpenWrt:~# cat /proc/interrupts
           CPU0       CPU1       
  0:       6126      29766      MIPS   0  IPI_resched
  1:       2135       1633      MIPS   1  IPI_call
  7:     915250     923515      MIPS   7  timer
  8:          0          0      MIPS   0  IPI call
  9:          0          0      MIPS   1  IPI resched
 62:          0          0       icu  62  1e101000.usb, dwc2_hsotg:usb1
 63:     183466          0       icu  63  mei_cpe
 72:          0      23343       icu  72  vrx200_rx
 73:          0      35973       icu  73  vrx200_tx
 96:      60424          0       icu  96  ptm_mailbox_isr
112:          0        263       icu 112  asc_tx
113:          0          0       icu 113  asc_rx
114:          0          0       icu 114  asc_err
126:          0          0       icu 126  gptu
127:          0          0       icu 127  gptu
128:          0          0       icu 128  gptu
129:          0          0       icu 129  gptu
130:          0          0       icu 130  gptu
131:          0          0       icu 131  gptu
ERR:          0

(Hmm, no vrx200_tx_2 inside /proc/interrupts)


@ahmar16 with regard to fastpath: it is called "PPE" (Protocol Processor Engine);
read post #4.
I hope it is possible to build a kernel module from the stock firmware source and use it,
but I have not looked into it any further.

So, I use a BT Home Hub 5A on a Telekom vdsl2 50/10 line resold by Telefonica/O2. I routinely see sync uptimes of multiple months (basically no unenforced resyncs). This is as a bridged modem only, as the box was overtaxed when NAT, traffic shaping and wifi processing were added on top of the vdsl-modem duty....
One caveat is that my line is still not using vectoring, so I have no information about stability in that increasingly likely scenario....

                        Upstream    Downstream
Current Rate (Kbps)     2239        24125
Max Rate (Kbps)         39997       58892
SNR Margin (dB)         26.6        20
Line Attenuation (dB)   17.3        15.1
Errors (Pkts)           0           215

(I hope the formatting is OK) ... this is the log from a TD-W9980B(DE) with stock firmware. Stability is fine, the connection (22/2 Mbps) works all day. I used openwrt firmware for about a day and it seems to be stable (I was testing different firmwares and some were very unstable, but the one directly from openwrt was OK).

Hmm, irqbalance should (in theory) use the most optimal distribution. BTW even if the interrupt distribution is 1:1, it doesn't mean the load will be 1:1 too. The RX interrupt causes a much different load than the TX interrupt (at least in ethernet).

Yeah, "ring is full" and "tx timeout": I'm working on it right now (most lags are already fixed). It seems the SoC/kernel is too slow to deallocate transmitted DMA descriptors in time, so new packets have to wait for them and eventually time out. It would be interesting to try increasing the system tick rate (openwrt is using 250 Hz, which is IMO pretty low for 1Gbit ethernet). BTW the "ring full" messages are just informative; you can make them less frequent if you put the RX and TX interrupts on different cores.
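A sketch of how the RX and TX interrupts could be pinned to different cores by hand. It assumes the interrupt controller driver accepts affinity changes (which is what the ICU patches are for) and that irqbalance is not rewriting the masks at the same time; the function names are made up, and the IRQ numbers in the comments are the ones from the /proc/interrupts listings in this thread.

```shell
#!/bin/sh
# cpu_mask CPU
# Print the hexadecimal smp_affinity mask selecting a single CPU
# (CPU 0 -> 1, CPU 1 -> 2). Hypothetical helper for this sketch.
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# pin_irq IRQ CPU
# Write the mask to /proc/irq/IRQ/smp_affinity. Needs root, and only has an
# effect if the interrupt controller driver supports changing affinity.
pin_irq() {
    cpu_mask "$2" > "/proc/irq/$1/smp_affinity"
}

# Example with the numbers from the listings (72 = vrx200_rx, 73 = vrx200_tx):
#   pin_irq 72 0
#   pin_irq 73 1
```

Whether RX on CPU0 and TX on CPU1 or the reverse works better is something to measure; the CPU columns in /proc/interrupts should show the counters moving to the chosen core afterwards.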

[ 2780.742945] lantiq,xrx200-net 1e108000.eth eth0: port 5 got link
[ 2782.790849] lantiq,xrx200-net 1e108000.eth eth0: port 5 lost link
This is weird, I didn't change the parts of the driver which control link. That must be something else.

It wasn't used on my modem at all. In a discussion on the mailing list, there are plans to use it in the future (it is reserved for direct DSL pipe right now, but it could be potentially used for 2 TX queues for 2 VPEs).