How can we make the lantiq xrx200 devices faster


#21

It is possible to change these settings in uboot for the Easybox 904xDSL, but I don't think that is the problem: commercial products such as Speedports or AVM devices reach speeds of up to 100 Mb/s, with VMMC, without changing them.


#22

Yes, the values can be changed; this is well known on the ar71xx target. But the problem is that no one yet knows which values need to be changed for xrx200 devices. Overclocking is always fun, and you can squeeze a little more out of your router.

Yes, those devices have proper driver support to enable those kinds of speeds. If @pc2005 can achieve the rates mentioned, as explained above, then I think there may be no need to enable flow offloading for these devices anymore, because they can reach their full capacity without it. It is just a matter of time to see how things go from here.


#23

OK, but how can I see or test it when I set a higher speed in uboot?
Or rather, how can I see that it is working?


#24

Right now I don't have the overclocked TP-Link MR3420 v2 with me, but maybe you can search this forum for it. The overclocked speed shows up in uboot when you turn on the device. It always prints the CPU clock as xxx MHz and the RAM clock as xxx, so if you implement the overclock correctly it should show an increased CPU clock. My MR3420 showed 700 MHz while in uboot.

Edit: As for testing, you can always use iperf to see what transfer speeds you get with the overclocked CPU, plus any stress test to check that the system doesn't crash. This trial and error will help you arrive at a safe overclock that works well. I have two xrx200 devices myself, namely the W8980 and the HomeHub 5A, but I can't use them for testing uboot because there is always a risk of bricking the device, and I have neither a flash programmer nor the skills to revive them.


#25

I'm using an OpenWrt git snapshot. I changed the kernel source code (the OpenWrt build system downloads a copy) and recompiled.

No, I was talking about upload/download speed over gigabit ethernet. I'm going to try wifi once I have a solid base (for example, USB doesn't work in the snapshot; I was thinking about putting a devel rootfs there).

It is just a raw speedup, so I think the wifi will work faster. I don't think there is much offloading in the driver, just DMA to the address of the packet.

BTW you can measure the ethernet speed yourself: compile netcat with UDP and "server" (listen) support, then run a server on one computer and a client on the other, pushing data from /dev/zero:

nc -l -p port | pv > /dev/null       #computer
cat /dev/zero | nc host port         #xrx200

If you use UDP (-u) then there won't be any handshake packets going in the opposite direction.


#26

I am trying your speedup tweaks on the v18.06.1 snapshots. These snapshots are different from the actual master snapshot images and are still in the stable category. First I'll look into IRQ balancing and after that the tweaks above. It may take a couple of days to compile everything and test these tweaks, so hopefully it will be worth the wait.

I understand the DMA technique: the CPU is not directly tied up by the data transfer, and even where it is involved the load will be far lower than it is now, so speeds should improve.


#27

These changes should be fine, just check that the file isn't too different.

Yeah, it did improve the broken driver's speed a little. But the main problem is the driver itself, not a marginal optimization of the DMA transfers.


#28

@arnysch has developed a method for the Easybox 904xDSL to load another uboot into RAM and start it from there.
see:


I do not know whether it works (with changes) on other Lantiq devices too, but it may be possible.

So I started my self-built uboot with 600 MHz CPU speed and 300 MHz RAM speed using this method,
and it shows those speeds in the boot log.
But I do not know whether the boot log tells me that it really works. How can I check?


#29

You can compile a dhrystone benchmark. There should be one in busybox. I think even running a long shell loop should be OK for benchmarking (measure the script with the time command).
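
A minimal sketch of the shell-loop idea, assuming a POSIX shell such as busybox ash. The helper names busy_loop and bench are made up for illustration, and date +%s only has one-second resolution, so the iteration count needs to be large enough for the loop to run for several seconds on the router:

```shell
#!/bin/sh
# Crude CPU benchmark: a fixed-length counting loop in pure shell.
# busy_loop and bench are hypothetical helper names, not busybox applets.

# Count from 0 up to $1 and print the final counter value.
busy_loop() {
    i=0
    while [ "$i" -lt "$1" ]; do
        i=$((i + 1))
    done
    echo "$i"
}

# Run busy_loop with $1 iterations and print the elapsed wall-clock seconds.
bench() {
    start=$(date +%s)
    busy_loop "$1" > /dev/null
    end=$(date +%s)
    echo "$1 iterations took $((end - start)) s"
}

# Increase the count if this finishes in under a few seconds on your device.
bench 100000
```

Run it with the same iteration count under each uboot clock setting and compare the times; if the higher clock is really active, the loop should finish noticeably faster. Saving the loop to a file and running it under the time command gives finer resolution than date +%s.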


#30

I've managed to backport the ethernet driver from vanilla kernel v5 (it seems to be better than the current OpenWrt one) and I've added the switch and PHY code from the old one. This seems to increase the throughput of the ethernet (server 10.0.0.1, TP-Link xrx200 10.0.0.80):

test upload:
        server
                nc -u -l -p 4321 | pv > /dev/null
        tplink
                cat /dev/zero | nc -u 10.0.0.1 4321
        speed = 18.5 MiB/s
test download:
        tplink
                nc -u -l -p 4321 > /dev/null
        server
                cat /dev/zero | pv | nc -u 10.0.0.80 4321
        speed = ~100 MiB/s (varies)

If you test the old ethernet driver with the same commands, you will see that the new one reaches about 200% of its speed.
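
pv reports MiB/s while iperf reports Mbit/s, so a small conversion helps when comparing these numbers with iperf results elsewhere in the thread. A sketch, assuming awk is available and taking 1 Mbit as 10^6 bits; mib_to_mbit is a hypothetical helper name:

```shell
#!/bin/sh
# Convert a rate in MiB/s (as printed by pv) to Mbit/s (as printed by iperf).
# mib_to_mbit is a hypothetical helper name.
# 1 MiB = 1048576 bytes, 1 byte = 8 bits, 1 Mbit = 1000000 bits.
mib_to_mbit() {
    awk -v v="$1" 'BEGIN { printf "%.1f\n", v * 1048576 * 8 / 1000000 }'
}

mib_to_mbit 18.5   # upload figure from the test above
mib_to_mbit 100    # download figure from the test above
```

So the 18.5 MiB/s upload is about 155 Mbit/s and the roughly 100 MiB/s download is about 839 Mbit/s.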

My second patch is for the DMA burst size. Both patches were tested in openwrt git snapshot 6e104c63d678518f93425e5e34f6caf75228024c, but they are updated for a3ccac6b1d693527befa73532a6cf5abda7134c0, where they have not been tested yet. I don't know the exact way to add patches to openwrt, but it should be fine to put them into target/linux/lantiq/patches-4.14 or to patch the kernel directly at build_dir/target-mips_24kc_musl/linux-lantiq_xrx200/linux-4.14.96 .

0904-backport-vanilla-eth-driver.patch
0905-increase-dma-burst-size.patch

Other speedup patches are in this thread.

edit: fixed the patches, there was a surplus "./" in the paths; they now work from target/linux/lantiq/patches-4.14


Xrx200 IRQ balancing between VPEs
#31

#32

I've managed to fix the bugs in the ethernet driver where the PHY didn't start, and I've implemented basic skb frags (DMA scatter-gather) functionality. Try testing these patches: [0904-backport-vanilla-eth-driver.patch] [0905-increase-dma-descriptors.patch]. The speedup over the original openwrt driver seems to be pretty high. Script for testing here: change the IP, run iperf3 -s on the lantiq device and the script on the host. Don't forget to remove the old patches.


#33

I put all of your patches in openwrt/target/linux/lantiq/patches-4.14/ (including those from the other thread), but I don't think they have been applied at all. I don't see any change in throughput, although I only have a 100 mbit port on my laptop, not a 1 gbit one. I am seeing 91 mbit/s upload and 44.5 mbit/s download with iperf. I don't see any changes in cat /proc/interrupts either. I think for some reason your patches are not being applied. How can I see if they are being applied?

Edit: I can see the patches being applied, but for all of your patches it also says the patch unexpectedly ends in the middle of the line, and after that it says Hunk # applied successfully and continues.

Edit 2: I carefully copied the patches again and set 0666 permissions, as on the other ones, then compiled again, and now it works. Since my laptop can only handle 100 mbit/s, I can download and upload at 92 mbit/s with my TD-W8980. IRQ balancing also seems to work, and it increases wifi throughput by about 10%, as you suggested before. With the OpenWrt ethernet driver, upload to the router was 92 mbit/s and download was 44 mbit/s; with this driver both directions reach 92 mbit/s.
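
That warning about a patch ending in the middle of a line typically means the file has lost its final newline, which fits a copy and paste problem. A sketch for checking a patch directory before building, assuming a POSIX shell with tail; ends_with_newline and fix_patches are hypothetical helper names, and appending a newline only makes sense when nothing else was cut off:

```shell
#!/bin/sh
# ends_with_newline FILE: succeed if FILE is non-empty and its last byte is a newline.
# Command substitution strips a trailing newline, so an empty result means
# the last byte was "\n".
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# fix_patches DIR: append the missing final newline to every *.patch in DIR.
fix_patches() {
    for p in "$1"/*.patch; do
        [ -f "$p" ] || continue
        if ! ends_with_newline "$p"; then
            echo "adding final newline to $p"
            echo >> "$p"
        fi
    done
}

# Example call from the top of the OpenWrt tree:
#   fix_patches target/linux/lantiq/patches-4.14
```

Patches that already end in a newline are left untouched, so it is safe to run this over the whole patches directory.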


#34

Hi there, same result for patches 901 / 902 / 903.
But it works with the files from post #1 of this thread.
See not working: Xrx200 IRQ balancing between VPEs post 19
See working: Xrx200 IRQ balancing between VPEs post 22

Patch 904 / 905: I cannot test it on the Easybox 904xDSL because 904 is incompatible with 4027-NET-MIPS-lantiq-support-fixed-link.patch.
I will test it on the O2-Box 6431, but the question is: in combination with patches 901 / 902 / 903, with the raw files from post #1, or without any earlier changes?


#35

I think pastebin deletes the last empty lines in a post. But the patch applies OK.

So an upgrade, yay, congrats :smiley:
Do all ports work?

These are just normal patches, same as 4027-NET-MIPS-lantiq-support-fixed-link.patch; the openwrt build will use them automatically. So the only problem is compatibility with 4027. You should be able to fix this merge problem manually; it is just a few lines in a noncritical section (you can see the differences if you make two patched versions, one with 4027 only and one with 904 only).

The DMA burst patch is not required. The ICU patches (dts, irq.c and smp-mt) should be useful, but the system should build without them. I used all of them.


#36

Well yes and no. When you asked this I tried all the ports and then on first try Port 1 gave this:

[ 2505.807876] lantiq,xrx200-net 1e108000.eth eth0: not enough TX ring space for frags

After I tried to do iperf test again it gave this kernel panic on Port 1:

root@OpenWrt:/# iperf -c 192.168.1.196
[ 2573.313664] ------------[ cut here ]------------
[ 2573.316914] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x21c/0x3f8
[ 2573.325178] NETDEV WATCHDOG: eth0 (lantiq,xrx200-net): transmit queue 0 timed out
[ 2573.332629] Modules linked in: ath9k ath9k_common ath9k_hw ath pppoe nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD pppox ppp_async owl_loader nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack ltq_deu_vr9 iptable_mangle iptable_filter ip_tables crc_ccitt compat drv_dsl_cpe_api ledtrig_usbport drv_mei_cpe ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm drv_ifxos dwc2 gpio_button_hotplug
[ 2573.402320] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.98 #0
[ 2573.408244] Stack : 00000000 00010000 00000001 8007c528 80650000 80600320 00000000 00000000
[ 2573.416598]         805c6ce4 83811dac 838340ac 8063a687 805c1c10 00000001 83811d50 74ab3c7f
[ 2573.424952]         00000000 00000000 807a0000 00010000 00000000 00000000 00000007 00000000
[ 2573.433308]         00000117 55000000 00000116 00000000 00000000 80650000 00000000 803c9af8
[ 2573.441665]         00000009 00000140 80637f94 00010000 00000003 00000000 00000004 80790004
[ 2573.450019]         ...
[ 2573.452451] Call Trace:
[ 2573.454919] [<800115e4>] show_stack+0x58/0x100
[ 2573.459371] [<804b58f4>] dump_stack+0xe4/0x120
[ 2573.463817] [<80033b00>] __warn+0xe0/0x114
[ 2573.467892] [<80033b64>] warn_slowpath_fmt+0x30/0x3c
[ 2573.472854] [<803c9af8>] dev_watchdog+0x21c/0x3f8
[ 2573.477583] [<80095648>] call_timer_fn.isra.3+0x24/0x84
[ 2573.482780] [<800958f8>] run_timer_softirq+0x250/0x31c
[ 2573.487928] [<804d4d90>] __do_softirq+0x128/0x2ec
[ 2573.492618] [<80038b20>] irq_exit+0xac/0xc8
[ 2573.496797] [<80274d0c>] plat_irq_dispatch+0xfc/0x138
[ 2573.501849] [<8000bb08>] except_vec_vi_end+0xb8/0xc4
[ 2573.506799] [<8000d4d0>] r4k_wait_irqoff+0x1c/0x24
[ 2573.511592] [<80071d8c>] do_idle+0xd4/0x154
[ 2573.515763] [<80072004>] cpu_startup_entry+0x24/0x30
[ 2573.520726] [<80038a58>] irq_enter+0x58/0x74
[ 2573.525025] ---[ end trace 23ebd19a3a343560 ]---
[ 2573.529608] eth0: transmit timed out!
connect failed: Host is unreachable

So it seems there may be a bug, and it can be solved, right? On my 2nd try everything works, even on Port 1. I think I should test it while attached to my HH5A on a 1 gbit/s link. Maybe that will shed some more light on what is going on.

Edit: Also it seems the tx ring is getting full for some reason. Maybe it's too small.

root@OpenWrt:/# iperf -c 192.168.1.196 -t 60
------------------------------------------------------------
Client connecting to 192.168.1.196, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.254 port 40052 connected with 192.168.1.196 port 5001
[ 3365.915243] tx ring full after send
[ 3365.944203] tx ring full after send
[ 3367.033160] tx ring full after send
[ 3367.102508] tx ring full after send
[ 3368.671618] tx ring full after send
[ 3368.734038] tx ring full after send
[ 3368.904054] tx ring full after send
[ 3369.025805] tx ring full after send
[ 3369.223867] tx ring full after send
[ 3369.451556] tx ring full after send
[ 3369.518576] tx ring full after send
[ 3370.098506] tx ring full after send
[ 3370.136027] tx ring full after send
[ 3370.187110] tx ring full after send
[ 3370.210587] tx ring full after send
[ 3370.278975] tx ring full after send
[ 3370.301863] tx ring full after send
[ 3370.376010] tx ring full after send
[ 3370.380904] tx ring full after send
[ 3381.090977] lantiq,xrx200-net 1e108000.eth eth0: not enough TX ring space for frags

[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-73.8 sec   455 MBytes  51.7 Mbits/sec
root@OpenWrt:/#
root@OpenWrt:/# [ 3425.233724] eth0: transmit timed out!

I will run some more tests tomorrow to see how it behaves but so far no load on router even when going with full 100mbit/s throughput.
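
To compare runs, it can help to tally the driver messages instead of reading the log by eye. A sketch, assuming awk is available; count_tx_events is a hypothetical helper name, and the patterns are simply the three message strings seen in the logs above:

```shell
#!/bin/sh
# Count the three TX-related kernel messages in a log read from stdin.
# count_tx_events is a hypothetical helper name.
count_tx_events() {
    awk '
        /tx ring full after send/            { full++ }
        /not enough TX ring space for frags/ { frags++ }
        /transmit timed out/                 { timeouts++ }
        END {
            printf "ring_full=%d no_frag_space=%d timeouts=%d\n", full, frags, timeouts
        }
    '
}

# On the router: dmesg | count_tx_events
# Demo on four lines copied from the log above:
count_tx_events <<'EOF'
[ 3365.915243] tx ring full after send
[ 3365.944203] tx ring full after send
[ 3381.090977] lantiq,xrx200-net 1e108000.eth eth0: not enough TX ring space for frags
[ 3425.233724] eth0: transmit timed out!
EOF
```

Running it after each iperf test on a freshly booted router gives one line per run that is easy to paste into the thread.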


#37

Yeah, that's the problem with my 1 Gbit ports too. I haven't figured out why yet: either I don't understand how to throttle the tx xmit function when the ring is filling up, or the system is too slow to free already transmitted packets in the tx housekeeping function.

The error is just a warning that the kernel network stack timed out; unless panic on every warning is enabled, it will just continue.

But thanks for testing.


#38

Do you think something like this is possible for lantiq based devices? I think those are available for mips24kc but only for ar71xx.


#39

I think probably not. If there are general optimisations of the network stack, they will probably end up in the vanilla kernel, and the hardware offloading things in the SoC will most likely be implemented in a different way. I don't even know if there is such hardware (for example NAT offloading). There is no public manual for xrx200.


#40

I did some more tests with wifi and it seems the throughput has actually improved. Note that the device is not routing any traffic, though; it's just connected via LAN to the main router.

Tests before the patches:

DL: 35-40 mbit/s
UL: 40-45 mbit/s
CPU Load: ~1.8

After the patches:

DL: 50-105mbit/s
UL: 50-105mbit/s
Avg: 60-75mbit/s roughly
CPU Load: ~1.0

If testing both ways simultaneously, DL and UL are around 25 mbit/s and 35 mbit/s, with load around 1.8 or more.

But I think there's still room for more. If you look at the current /proc/interrupts:

root@OpenWrt:~# cat /proc/interrupts
           CPU0       CPU1
  7:    1252052    1190227      MIPS   7  timer
  8:      18828      39540      MIPS   0  IPI call
  9:     355149     113465      MIPS   1  IPI resched
 22:          1     129928       icu  22  spi_rx
 23:      79176          0       icu  23  spi_tx
 24:          0          0       icu  24  spi_err
 62:          0          0       icu  62  1e101000.usb, dwc2_hsotg:usb1
 63:          0      45560       icu  63  mei_cpe
 72:      18111          0       icu  72  vrx200_rx
 73:          0      16712       icu  73  vrx200_tx
 91:          0          0       icu  91  1e106000.usb, dwc2_hsotg:usb2
112:          0        215       icu 112  asc_tx
113:          0          0       icu 113  asc_rx
114:          0          0       icu 114  asc_err
126:          0          0       icu 126  gptu
127:          0          0       icu 127  gptu
128:          0          0       icu 128  gptu
129:          0          0       icu 129  gptu
130:          0          0       icu 130  gptu
131:          0          0       icu 131  gptu
144:      23871    1752135       icu 144  ath9k
161:          0          0       icu 161  ifx_pcie_rc0
ERR:          1

ath9k is the WiFi chip, and it is using the 2nd core far more than the first one. I also watched CPU activity in htop, and the 2nd CPU was being used more, although I think average utilization was only around 30-40%. If the Wi-Fi could use more CPU cycles, it could in theory provide more than 100 mbit/s DL and UL. I am not sure how it would do that, but theoretically it should be possible.
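
The split between the two VPEs can be put into numbers straight from /proc/interrupts. A sketch, assuming awk is available and exactly two CPU columns as in the output above; irq_share is a hypothetical helper name:

```shell
#!/bin/sh
# Print how the interrupts of one device are split between CPU0 and CPU1.
# irq_share is a hypothetical helper name; it assumes exactly two CPU columns.
# $1 = name in the last column of /proc/interrupts (e.g. ath9k); input on stdin.
irq_share() {
    awk -v name="$1" '
        $NF == name {
            total = $2 + $3
            if (total > 0)
                printf "%s: CPU0 %.1f%% CPU1 %.1f%%\n", name, 100 * $2 / total, 100 * $3 / total
        }
    '
}

# On the router: irq_share ath9k < /proc/interrupts
# Demo on the ath9k line from the output above:
irq_share ath9k <<'EOF'
           CPU0       CPU1
144:      23871    1752135       icu 144  ath9k
EOF
```

For the counters above this prints about 1.3% on CPU0 and 98.7% on CPU1, so nearly all ath9k interrupts land on the second VPE.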