You can compile a Dhrystone benchmark; there should be one in busybox. I think even running a long shell loop should be OK for benchmarking (measure the script with the time command).
I've managed to backport the ethernet driver from vanilla kernel v5 (it seems to be better than the current OpenWrt one), and I've added the switch and PHY code from the old one. This seems to increase the ethernet throughput (server 10.0.0.1, TP-Link xrx200 10.0.0.80):
Upload test (TP-Link sends to the server):
server:
    nc -u -l -p 4321 | pv > /dev/null
tplink:
    cat /dev/zero | nc -u 10.0.0.1 4321
speed = 18.5 MiB/s

Download test (server sends to the TP-Link):
tplink:
    nc -u -l -p 4321 > /dev/null
server:
    cat /dev/zero | pv | nc -u 10.0.0.80 4321
speed = ~100 MiB/s (varies)
If you test the old ethernet driver with the same commands, you will see that the new driver reaches about 200% of the old throughput.
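To compare these figures with link speeds, it helps to convert pv's binary MiB/s into decimal Mbit/s. The snippet below is only that arithmetic, applied to the two rates quoted above. Note that in the download test pv runs on the sending server, so it reports what is handed to nc, not necessarily what the router receives.

```python
# Convert the MiB/s figures printed by pv into Mbit/s (10**6 bits/s),
# the unit normally used for link speeds such as 1 Gbit/s ethernet.

MIB = 1024 * 1024  # bytes in one MiB (pv uses binary prefixes)


def mib_s_to_mbit_s(mib_per_s: float) -> float:
    """Return the throughput in decimal megabits per second."""
    return mib_per_s * MIB * 8 / 1e6


if __name__ == "__main__":
    for label, rate in (("upload", 18.5), ("download", 100.0)):
        mbit = mib_s_to_mbit_s(rate)
        print(f"{label}: {rate} MiB/s = {mbit:.1f} Mbit/s "
              f"({mbit / 1000:.0%} of 1 Gbit/s)")
```

So the upload is roughly 155 Mbit/s and the download roughly 839 Mbit/s on a gigabit link.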
My second patch is for the DMA burst size. Both patches were tested on OpenWrt git snapshot 6e104c63d678518f93425e5e34f6caf75228024c, but they have been updated for a3ccac6b1d693527befa73532a6cf5abda7134c0, where they have not been tested yet. I don't know the exact way to add patches to OpenWrt, but it should be fine to put them into target/linux/lantiq/patches-4.14 or to patch the kernel directly at build_dir/target-mips_24kc_musl/linux-lantiq_xrx200/linux-4.14.96.
0904-backport-vanilla-eth-driver.patch
0905-increase-dma-burst-size.patch
Other speedup patches are in this thread.
Edit: fixed the patches; there was a surplus "./" in the path. They now work from target/linux/lantiq/patches-4.14.
I've managed to fix bugs in the ethernet driver where the PHY didn't start, and I've implemented basic skb frags (DMA scatter-gather) support. Please try testing these patches: [0904-backport-vanilla-eth-driver.patch] [0905-increase-dma-descriptors.patch]. The speedup over the original OpenWrt driver seems to be pretty high. The testing script is here: change the IP, run iperf3 -s on the Lantiq device, and run the script on the host. Don't forget to remove the old patches.
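As a rough illustration of what scatter-gather means for the TX ring: a packet with frags is a linear head plus N page fragments, and each piece needs its own DMA descriptor. The Python model below is a hypothetical sketch (the Skb class and the descriptor dict keys are invented for illustration and are not the driver's structures). It shows why a driver must find 1 + nr_frags free descriptors before queuing anything, which is presumably where a message like "not enough TX ring space for frags" comes from.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Skb:
    """Hypothetical stand-in for a socket buffer: linear head plus fragments."""
    head_len: int
    frag_lens: List[int] = field(default_factory=list)


def descriptors_needed(skb: Skb) -> int:
    # One descriptor for the linear head, plus one per page fragment.
    return 1 + len(skb.frag_lens)


def map_to_descriptors(skb: Skb, free: int) -> List[Dict[str, object]]:
    """Split one packet across DMA descriptors, all-or-nothing.

    Only the first descriptor is marked start-of-packet (sop) and only the
    last one end-of-packet (eop), so the DMA engine emits a single frame.
    """
    need = descriptors_needed(skb)
    if need > free:
        # Queuing only part of a frame would corrupt it, so refuse entirely.
        raise RuntimeError("not enough TX ring space for frags")
    lengths = [skb.head_len] + skb.frag_lens
    return [
        {"len": n, "sop": i == 0, "eop": i == need - 1}
        for i, n in enumerate(lengths)
    ]


if __name__ == "__main__":
    for desc in map_to_descriptors(Skb(head_len=66, frag_lens=[1448, 1448]), free=8):
        print(desc)
```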
I put all of your patches (also the ones from the other thread) in openwrt/target/linux/lantiq/patches-4.14/, but I am not sure they have been applied at all. I don't see any change in throughput, though my laptop only has a 100 Mbit port, not a 1 Gbit one. I am seeing 91 Mbit/s for upload and 44.5 Mbit/s for download with iperf. I also don't see any change in cat /proc/interrupts. I think for some reason your patches are not being applied. How can I check whether they are being applied?
Edit: I can see the patches being applied, but for all of your patches it also says "patch unexpectedly ends in the middle of the line", and after that it says "Hunk # applied successfully" and continues.
Edit 2: I carefully copied the patches again, set 0666 permissions as for the other ones, compiled again, and there you go, it works. My laptop can only handle 100 Mbit/s, but I can now download and upload at 92 Mbit/s with my TD-W8980. IRQ balancing also seems to work, and it increases the Wi-Fi throughput by about 10%, as you suggested before. With the OpenWrt ethernet driver, upload to the router was 92 Mbit/s and download was 44 Mbit/s; with this driver both directions reach 92 Mbit/s.
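For context on the 92 Mbit/s figure: on a 100 Mbit/s link, TCP goodput is capped by per-frame overhead. A back-of-the-envelope sketch, assuming a 1500-byte MTU, IPv4 and TCP timestamps (the Linux default), puts the ceiling at about 94 Mbit/s, so 92 Mbit/s in both directions is essentially line rate for Fast Ethernet.

```python
# Upper bound for TCP payload throughput over Fast Ethernet (assumptions above).
LINK_BPS = 100e6
MTU = 1500
ETH_OVERHEAD = 14 + 4 + 8 + 12   # Ethernet header, FCS, preamble+SFD, inter-frame gap
IP_TCP_HEADERS = 20 + 20 + 12    # IPv4 header, TCP header, TCP timestamp option


def tcp_goodput_bps(link_bps: float = LINK_BPS, mtu: int = MTU) -> float:
    payload = mtu - IP_TCP_HEADERS   # 1448 bytes of TCP payload per frame
    wire = mtu + ETH_OVERHEAD        # 1538 bytes occupied on the wire per frame
    return link_bps * payload / wire


if __name__ == "__main__":
    print(f"{tcp_goodput_bps() / 1e6:.1f} Mbit/s")
```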
Hi there, same result for patches 901 / 902 / 903.
But it works with the files from post #1 of this thread.
See not working: Xrx200 IRQ balancing between VPEs post 19
See working: Xrx200 IRQ balancing between VPEs post 22
Patches 904 / 905: I cannot test them on the Easybox 904xDSL because 904 is incompatible with 4027-NET-MIPS-lantiq-support-fixed-link.patch.
I will test them on the O2-Box 6431, but the question is: in combination with patches 901 / 902 / 903, with the raw files from post #1, or without any earlier changes?
I think pastebin deletes trailing empty lines from a paste, but the patch still applies OK.
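If a pasted patch has lost its final newline, GNU patch typically warns that the patch "unexpectedly ends in middle of line" but still applies the hunks, which matches the behaviour reported above. A minimal sketch for restoring the newline before building follows. It does not recover blank context lines that a paste service may have stripped, and the file name in the usage note is just an example.

```python
from pathlib import Path


def with_trailing_newline(data: bytes) -> bytes:
    """Return data with a final newline byte added if it is missing."""
    if data and not data.endswith(b"\n"):
        return data + b"\n"
    return data


def fix_patch_file(path: Path) -> bool:
    """Rewrite the file in place if needed; return True if it was changed."""
    data = path.read_bytes()
    fixed = with_trailing_newline(data)
    if fixed != data:
        path.write_bytes(fixed)
        return True
    return False
```

Usage: fix_patch_file(Path("target/linux/lantiq/patches-4.14/0904-backport-vanilla-eth-driver.patch")).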
So it's an upgrade. Yay, congrats!
Do all ports work?
These are just normal patches, the same as 4027-NET-MIPS-lantiq-support-fixed-link.patch; the OpenWrt build will apply them automatically. So the only problem is the compatibility with 4027. You should be able to fix this merge conflict manually; it is just a few lines in a noncritical section (you can see the differences if you make two patched versions, 4027 only and 904 only).
The DMA burst patch is not required. The ICU patches (dts, irq.c and smp-mt) should be useful, but the system should build without them. I used all of them.
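Following the 4027-only / 904-only suggestion, one way to locate the conflicting lines is to diff the same driver file from the two patched trees. This is a sketch using Python's difflib, equivalent to running `diff -u`. The tree and file paths in the usage note are hypothetical.

```python
import difflib
from pathlib import Path


def diff_text(a: str, b: str, a_name: str = "a", b_name: str = "b") -> str:
    """Unified diff of two texts, in the same format as `diff -u`."""
    return "".join(
        difflib.unified_diff(
            a.splitlines(keepends=True),
            b.splitlines(keepends=True),
            fromfile=a_name,
            tofile=b_name,
        )
    )


def diff_files(a_path: Path, b_path: Path) -> str:
    """Diff the same file taken from two differently patched trees."""
    return diff_text(a_path.read_text(), b_path.read_text(),
                     str(a_path), str(b_path))
```

Usage: print(diff_files(Path("tree-4027/drivers/net/ethernet/lantiq_xrx200.c"), Path("tree-904/drivers/net/ethernet/lantiq_xrx200.c"))), where both tree names are placeholders.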
Well, yes and no. When you asked, I tried all the ports, and on the first try Port 1 gave this:
[ 2505.807876] lantiq,xrx200-net 1e108000.eth eth0: not enough TX ring space for frags
When I ran the iperf test again, it gave this kernel panic on Port 1:
root@OpenWrt:/# iperf -c 192.168.1.196
[ 2573.313664] ------------[ cut here ]------------
[ 2573.316914] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x21c/0x3f8
[ 2573.325178] NETDEV WATCHDOG: eth0 (lantiq,xrx200-net): transmit queue 0 timed out
[ 2573.332629] Modules linked in: ath9k ath9k_common ath9k_hw ath pppoe nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD pppox ppp_async owl_loader nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack ltq_deu_vr9 iptable_mangle iptable_filter ip_tables crc_ccitt compat drv_dsl_cpe_api ledtrig_usbport drv_mei_cpe ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm drv_ifxos dwc2 gpio_button_hotplug
[ 2573.402320] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.98 #0
[ 2573.408244] Stack : 00000000 00010000 00000001 8007c528 80650000 80600320 00000000 00000000
[ 2573.416598] 805c6ce4 83811dac 838340ac 8063a687 805c1c10 00000001 83811d50 74ab3c7f
[ 2573.424952] 00000000 00000000 807a0000 00010000 00000000 00000000 00000007 00000000
[ 2573.433308] 00000117 55000000 00000116 00000000 00000000 80650000 00000000 803c9af8
[ 2573.441665] 00000009 00000140 80637f94 00010000 00000003 00000000 00000004 80790004
[ 2573.450019] ...
[ 2573.452451] Call Trace:
[ 2573.454919] [<800115e4>] show_stack+0x58/0x100
[ 2573.459371] [<804b58f4>] dump_stack+0xe4/0x120
[ 2573.463817] [<80033b00>] __warn+0xe0/0x114
[ 2573.467892] [<80033b64>] warn_slowpath_fmt+0x30/0x3c
[ 2573.472854] [<803c9af8>] dev_watchdog+0x21c/0x3f8
[ 2573.477583] [<80095648>] call_timer_fn.isra.3+0x24/0x84
[ 2573.482780] [<800958f8>] run_timer_softirq+0x250/0x31c
[ 2573.487928] [<804d4d90>] __do_softirq+0x128/0x2ec
[ 2573.492618] [<80038b20>] irq_exit+0xac/0xc8
[ 2573.496797] [<80274d0c>] plat_irq_dispatch+0xfc/0x138
[ 2573.501849] [<8000bb08>] except_vec_vi_end+0xb8/0xc4
[ 2573.506799] [<8000d4d0>] r4k_wait_irqoff+0x1c/0x24
[ 2573.511592] [<80071d8c>] do_idle+0xd4/0x154
[ 2573.515763] [<80072004>] cpu_startup_entry+0x24/0x30
[ 2573.520726] [<80038a58>] irq_enter+0x58/0x74
[ 2573.525025] ---[ end trace 23ebd19a3a343560 ]---
[ 2573.529608] eth0: transmit timed out!
connect failed: Host is unreachable
So it seems there may be a bug, and it can be solved, right? On my second try everything worked, even on Port 1. I think I should test it while attached to my HH5A on a 1 Gbit/s link; maybe that will shed more light on what is going on.
Edit: It also seems the TX ring is getting full for some reason. Maybe it's too small.
root@OpenWrt:/# iperf -c 192.168.1.196 -t 60
------------------------------------------------------------
Client connecting to 192.168.1.196, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.254 port 40052 connected with 192.168.1.196 port 5001
[ 3365.915243] tx ring full after send
[ 3365.944203] tx ring full after send
[ 3367.033160] tx ring full after send
[ 3367.102508] tx ring full after send
[ 3368.671618] tx ring full after send
[ 3368.734038] tx ring full after send
[ 3368.904054] tx ring full after send
[ 3369.025805] tx ring full after send
[ 3369.223867] tx ring full after send
[ 3369.451556] tx ring full after send
[ 3369.518576] tx ring full after send
[ 3370.098506] tx ring full after send
[ 3370.136027] tx ring full after send
[ 3370.187110] tx ring full after send
[ 3370.210587] tx ring full after send
[ 3370.278975] tx ring full after send
[ 3370.301863] tx ring full after send
[ 3370.376010] tx ring full after send
[ 3370.380904] tx ring full after send
[ 3381.090977] lantiq,xrx200-net 1e108000.eth eth0: not enough TX ring space for frags
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-73.8 sec 455 MBytes 51.7 Mbits/sec
root@OpenWrt:/#
root@OpenWrt:/# [ 3425.233724] eth0: transmit timed out!
I will run more tests tomorrow to see how it behaves, but so far there is no load on the router even at the full 100 Mbit/s throughput.
Yeah, that's the problem with my 1 Gbit ports too. I haven't figured out why yet, but either I don't understand how to throttle the TX xmit function when the ring is filling up, or the system is too slow to deallocate already-transmitted packets in the TX housekeeping function.
The error is just a warning that the kernel network stack timed out; unless the system is set to panic on every warning, it will just continue.
But thanks for testing.
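For reference, the usual Linux pattern for limiting the xmit function is to stop the queue as soon as the ring might not fit a worst-case packet, and to wake it again from the TX housekeeping (cleanup) path. If cleanup runs too rarely, the queue stays stopped and the netdev watchdog eventually reports "transmit queue 0 timed out". The toy model below sketches that logic only; it is not the xrx200 driver, the ring size is arbitrary, and MAX_SKB_FRAGS = 17 assumes 4 KiB pages.

```python
MAX_SKB_FRAGS = 17  # assumption: Linux value for 4 KiB pages


class TxRing:
    """Toy model of a TX descriptor ring with stop/wake flow control."""

    def __init__(self, size: int, max_descs_per_pkt: int = MAX_SKB_FRAGS + 1):
        self.size = size
        self.max_descs = max_descs_per_pkt  # worst case: head + every frag
        self.in_flight = 0                  # descriptors not yet cleaned up
        self.stopped = False                # models netif_stop/wake_queue()

    @property
    def free(self) -> int:
        return self.size - self.in_flight

    def xmit(self, descs: int) -> bool:
        """Queue one packet using `descs` descriptors; False means busy."""
        if self.stopped or descs > self.free:
            return False  # the stack should not call us while stopped
        self.in_flight += descs
        # Stop *before* the ring can overflow: the next packet may need
        # up to max_descs descriptors.
        if self.free < self.max_descs:
            self.stopped = True
        return True

    def housekeeping(self, completed: int) -> None:
        """Reclaim descriptors the DMA engine has finished sending."""
        self.in_flight -= min(completed, self.in_flight)
        if self.stopped and self.free >= self.max_descs:
            self.stopped = False  # wake the queue
```

With a 64-entry ring, the queue stops after 47 single-descriptor packets, because only 17 descriptors then remain, one fewer than a worst-case packet needs; it stays stopped until housekeeping frees enough of them.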
Do you think something like this is possible for Lantiq-based devices? I think those are available for mips_24kc, but only for ar71xx.
I think probably not. If there are general optimisations of the network stack, they will probably end up in the vanilla kernel, and hardware offloading in the SoC would most likely be implemented in a different way. I don't even know whether the SoC has such hardware (for example NAT offloading). There is no public manual for the xrx200.
I did some more tests with Wi-Fi, and it seems the throughput has actually improved. I should mention that the device is not routing any traffic; it is just connected via LAN to the main router.
Tests before the patches:
DL: 35-40 Mbit/s
UL: 40-45 Mbit/s
CPU load: ~1.8

After the patches:
DL: 50-105 Mbit/s
UL: 50-105 Mbit/s
Average: roughly 60-75 Mbit/s
CPU load: ~1.0
When testing both directions simultaneously, DL and UL are around 25 Mbit/s and 35 Mbit/s respectively, with a load of 1.8 or more.
But I think there's still room for more. Look at the current /proc/interrupts:
root@OpenWrt:~# cat /proc/interrupts
CPU0 CPU1
7: 1252052 1190227 MIPS 7 timer
8: 18828 39540 MIPS 0 IPI call
9: 355149 113465 MIPS 1 IPI resched
22: 1 129928 icu 22 spi_rx
23: 79176 0 icu 23 spi_tx
24: 0 0 icu 24 spi_err
62: 0 0 icu 62 1e101000.usb, dwc2_hsotg:usb1
63: 0 45560 icu 63 mei_cpe
72: 18111 0 icu 72 vrx200_rx
73: 0 16712 icu 73 vrx200_tx
91: 0 0 icu 91 1e106000.usb, dwc2_hsotg:usb2
112: 0 215 icu 112 asc_tx
113: 0 0 icu 113 asc_rx
114: 0 0 icu 114 asc_err
126: 0 0 icu 126 gptu
127: 0 0 icu 127 gptu
128: 0 0 icu 128 gptu
129: 0 0 icu 129 gptu
130: 0 0 icu 130 gptu
131: 0 0 icu 131 gptu
144: 23871 1752135 icu 144 ath9k
161: 0 0 icu 161 ifx_pcie_rc0
ERR: 1
ath9k is the Wi-Fi chip, and its interrupts are handled mostly by the second core rather than the first. I also watched the CPU activity in htop, and the second CPU was utilized more, although I think the average utilization was around 30-40%. If the Wi-Fi could use more CPU cycles, it could provide more than 100 Mbit/s DL and UL. I am not sure how it would do that, but theoretically it should be possible.
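To put numbers on the CPU split, the per-CPU columns of /proc/interrupts can be turned into shares. The sketch below parses a few rows copied from the listing above; by this measure nearly 99% of ath9k interrupts land on CPU1. Interrupt counts are not CPU time, so this only shows where the IRQs are delivered, not how much work each one causes.

```python
from typing import Dict, List, Tuple

SAMPLE = """CPU0 CPU1
72: 18111 0 icu 72 vrx200_rx
73: 0 16712 icu 73 vrx200_tx
144: 23871 1752135 icu 144 ath9k
ERR: 1
"""


def parse_interrupts(text: str) -> Dict[str, Tuple[List[int], str]]:
    """Map IRQ label -> (per-CPU counts, description) from /proc/interrupts."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: "CPU0 CPU1 ..."
    irqs = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        counts = fields[1:1 + ncpu]
        if len(counts) < ncpu or not all(c.isdigit() for c in counts):
            continue  # on SMP, rows like "ERR: 1" have a single value
        irqs[fields[0][:-1]] = ([int(c) for c in counts],
                                " ".join(fields[1 + ncpu:]))
    return irqs


def cpu_shares(counts: List[int]) -> List[float]:
    """Fraction of an IRQ's total count handled by each CPU."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    for irq, (counts, desc) in parse_interrupts(SAMPLE).items():
        shares = " ".join(f"CPU{i}={s:.1%}" for i, s in enumerate(cpu_shares(counts)))
        print(f"{irq:>4} {desc:<18} {shares}")
```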
Are you using irqbalance?
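As an alternative to irqbalance, or as a cross-check, an IRQ can be pinned by hand by writing a hexadecimal CPU bitmask to /proc/irq/<N>/smp_affinity, for example to keep the ethernet RX and TX interrupts on different cores. A minimal sketch, assuming root access, that the ICU patches make affinity settable on this SoC, and that irqbalance is stopped so it does not rewrite the mask. The IRQ numbers 72/73 come from the listing above and may differ on other builds.

```python
from pathlib import Path
from typing import Iterable


def affinity_mask(cpus: Iterable[int]) -> str:
    """Hex CPU bitmask in the format accepted by /proc/irq/<N>/smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def pin_irq(irq: int, cpus: Iterable[int], proc_irq: Path = Path("/proc/irq")) -> None:
    """Restrict one IRQ to the given CPUs (needs root; irqbalance may override)."""
    (proc_irq / str(irq) / "smp_affinity").write_text(affinity_mask(cpus) + "\n")


if __name__ == "__main__":
    # Only prints the masks; call pin_irq() on the router to apply them.
    print("vrx200_rx (72) ->", affinity_mask([0]))
    print("vrx200_tx (73) ->", affinity_mask([1]))
```

The shell equivalent for putting vrx200_tx on the second core is `echo 2 > /proc/irq/73/smp_affinity`.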
Yes, that would be right. I put all of your patches in the patches-4.14 folder.
It's maybe a bit off-topic, but does anyone know how stable it is now on VDSL2, and what performance I can expect if I build my own modem-only (or bridge mode) image? Last time I tried, a year ago with my 3370, it was kind of disappointing. I was not able to keep the VDSL2 line synced for more than a few minutes to hours with the proprietary VR9 driver, and the performance on the VDSL2 line wasn't that great either.
Any suggestions are welcome.
Well, this is pretty much off-topic, but it really depends on your line. If your line is in bad condition, it may not work as you expect, so if you experience any problems I'd suggest using a separate modem in bridge mode.
I tested the working patch created from the files in post #1, plus 904 and 905, on the O2-Box 6431.
This box has only four 100 Mbit/s ports, plus one 100 Mbit/s port behind the gray DSL port.
The result is the same: I have two devices on it and ran a speed test.
They start fast, then the connection breaks down, and after a while the connection comes back up.
It works if you do not use too much data.
root@OpenWrt:~# cat /proc/interrupts
CPU0 CPU1
0: 6126 29766 MIPS 0 IPI_resched
1: 2135 1633 MIPS 1 IPI_call
7: 915250 923515 MIPS 7 timer
8: 0 0 MIPS 0 IPI call
9: 0 0 MIPS 1 IPI resched
62: 0 0 icu 62 1e101000.usb, dwc2_hsotg:usb1
63: 183466 0 icu 63 mei_cpe
72: 0 23343 icu 72 vrx200_rx
73: 0 35973 icu 73 vrx200_tx
96: 60424 0 icu 96 ptm_mailbox_isr
112: 0 263 icu 112 asc_tx
113: 0 0 icu 113 asc_rx
114: 0 0 icu 114 asc_err
126: 0 0 icu 126 gptu
127: 0 0 icu 127 gptu
128: 0 0 icu 128 gptu
129: 0 0 icu 129 gptu
130: 0 0 icu 130 gptu
131: 0 0 icu 131 gptu
ERR: 0
(Hmm, there is no vrx200_tx_2 in /proc/interrupts.)
@ahmar16, with regard to fastpath: it is called "PPE" (Protocol Processor Engine); read post #4.
I had hoped it would be possible to create a kernel module from the stock firmware source and use it, but I stopped pursuing it.
So, I use a BT HomeHub 5A on a Telekom VDSL2 50/10 line resold by Telefonica/O2. I routinely see sync uptimes of several months (basically no unforced resyncs). This is as a bridged modem only, as the box was overtaxed when NAT, traffic shaping and Wi-Fi processing were added on top of the VDSL modem duty.
One caveat is that my line is still not using vectoring, so I have no information about stability in that increasingly likely scenario.
                          Upstream   Downstream
Current Rate (Kbps)           2239        24125
Max Rate (Kbps)              39997        58892
SNR Margin (dB)               26.6           20
Line Attenuation (dB)         17.3         15.1
Errors (Pkts)                    0          215
(I hope the formatting is OK.) This is the log from a TD-W9980B (DE) with stock firmware. Stability is fine; the connection (22/2 Mbps) works all day. I used OpenWrt firmware for about a day and it seemed stable (I was testing different firmwares and some were very, very unstable, but the one directly from OpenWrt was OK).
Hmm, irqbalance should (in theory) pick the optimal distribution. BTW, even if the interrupts are split 1:1, that doesn't mean the load will be 1:1 too: an RX interrupt carries a very different load than a TX interrupt (at least for ethernet).
Yeah, "ring is full" and "tx timeout": I'm working on it right now (most of the lags are already fixed). It seems the SoC/kernel is too slow to actually free transmitted DMA descriptors, so new packets must wait for them and eventually time out. It would be interesting to try increasing the system tick rate (OpenWrt uses 250 Hz, which is IMO pretty low for 1 Gbit ethernet). BTW, the "ring full" messages are just informative; you can make them less frequent if you put the RX and TX interrupts on different cores.
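The tick-rate idea can be put into numbers. If TX descriptor cleanup were effectively paced by timer ticks (the hypothesis in this post, not something confirmed here), the ring would have to absorb roughly one tick's worth of frames. Assuming full-size frames occupying 1538 bytes on the wire (1500-byte MTU plus Ethernet header, FCS, preamble and inter-frame gap), a 250 Hz tick at 1 Gbit/s corresponds to about 325 frames, versus about 81 at 1000 Hz.

```python
# Assumption: 1500-byte MTU + 14 header + 4 FCS + 8 preamble/SFD + 12 IFG.
WIRE_BYTES_PER_FRAME = 1538


def frames_per_tick(link_bps: float, hz: int,
                    wire_bytes: int = WIRE_BYTES_PER_FRAME) -> float:
    """Full-size frames a saturated link carries during one timer tick."""
    frames_per_s = link_bps / (wire_bytes * 8)
    return frames_per_s / hz


if __name__ == "__main__":
    for hz in (100, 250, 1000):
        print(f"HZ={hz}: tick = {1000 / hz:g} ms, "
              f"~{frames_per_tick(1e9, hz):.0f} full-size frames per tick at 1 Gbit/s")
```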
[ 2780.742945] lantiq,xrx200-net 1e108000.eth eth0: port 5 got link
[ 2782.790849] lantiq,xrx200-net 1e108000.eth eth0: port 5 lost link
This is weird; I didn't change the parts of the driver that control the link, so it must be something else.
It wasn't used on my modem at all. According to a discussion on the mailing list, there are plans to use it in the future (it is reserved for a direct DSL pipe right now, but it could potentially be used for two TX queues for the two VPEs).