Hm, that is really weird.
Can you check with ethtool -k
whether GRO is enabled?
You can also try disabling it with ethtool -K <iface> gro off
It's per port, of course.
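For reference, checking and toggling GRO per port looks roughly like this (the eth0-eth3 interface names are assumed from this board's four ports):

```shell
# Show the GRO flag for one port; look for the generic-receive-offload line
ethtool -k eth0 | grep generic-receive-offload

# Toggle GRO off on all four ports; the device name comes before the feature
for dev in eth0 eth1 eth2 eth3; do
    ethtool -K "$dev" gro off
done
```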
Let me rebuild with ethtool and get the output.
Brought the PPPoE test server up again, and for me it helps.
GRO on:
[robimarko@localhost ~]$ iperf3 -c 192.168.2.32
Connecting to host 192.168.2.32, port 5201
[ 5] local 192.168.1.218 port 55832 connected to 192.168.2.32 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 103 MBytes 863 Mbits/sec 0 776 KBytes
[ 5] 1.00-2.00 sec 101 MBytes 849 Mbits/sec 0 776 KBytes
[ 5] 2.00-3.00 sec 102 MBytes 860 Mbits/sec 0 776 KBytes
[ 5] 3.00-4.00 sec 101 MBytes 849 Mbits/sec 0 776 KBytes
[ 5] 4.00-5.00 sec 101 MBytes 849 Mbits/sec 0 776 KBytes
[ 5] 5.00-6.00 sec 102 MBytes 860 Mbits/sec 0 776 KBytes
[ 5] 6.00-7.00 sec 101 MBytes 849 Mbits/sec 0 776 KBytes
[ 5] 7.00-8.00 sec 101 MBytes 849 Mbits/sec 0 776 KBytes
[ 5] 8.00-9.00 sec 101 MBytes 849 Mbits/sec 0 776 KBytes
[ 5] 9.00-10.00 sec 101 MBytes 849 Mbits/sec 0 776 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1017 MBytes 853 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 1014 MBytes 850 Mbits/sec receiver
iperf Done.
[robimarko@localhost ~]$ iperf3 -c 192.168.2.32 -R
Connecting to host 192.168.2.32, port 5201
Reverse mode, remote host 192.168.2.32 is sending
[ 5] local 192.168.1.218 port 45302 connected to 192.168.2.32 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 83.9 MBytes 704 Mbits/sec
[ 5] 1.00-2.00 sec 83.9 MBytes 704 Mbits/sec
[ 5] 2.00-3.00 sec 84.0 MBytes 704 Mbits/sec
[ 5] 3.00-4.00 sec 83.9 MBytes 704 Mbits/sec
[ 5] 4.00-5.00 sec 83.8 MBytes 703 Mbits/sec
[ 5] 5.00-6.00 sec 83.7 MBytes 702 Mbits/sec
[ 5] 6.00-7.00 sec 83.5 MBytes 701 Mbits/sec
[ 5] 7.00-8.00 sec 83.5 MBytes 701 Mbits/sec
[ 5] 8.00-9.00 sec 83.7 MBytes 702 Mbits/sec
[ 5] 9.00-10.00 sec 82.9 MBytes 695 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 840 MBytes 704 Mbits/sec 51 sender
[ 5] 0.00-10.00 sec 837 MBytes 702 Mbits/sec receiver
iperf Done.
GRO off:
[robimarko@localhost ~]$ iperf3 -c 192.168.2.32
Connecting to host 192.168.2.32, port 5201
[ 5] local 192.168.1.218 port 38246 connected to 192.168.2.32 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 79.5 MBytes 666 Mbits/sec 0 616 KBytes
[ 5] 1.00-2.00 sec 79.2 MBytes 664 Mbits/sec 0 716 KBytes
[ 5] 2.00-3.00 sec 78.8 MBytes 661 Mbits/sec 0 716 KBytes
[ 5] 3.00-4.00 sec 78.8 MBytes 661 Mbits/sec 0 716 KBytes
[ 5] 4.00-5.00 sec 78.8 MBytes 661 Mbits/sec 0 716 KBytes
[ 5] 5.00-6.00 sec 80.0 MBytes 671 Mbits/sec 0 716 KBytes
[ 5] 6.00-7.00 sec 78.8 MBytes 661 Mbits/sec 0 716 KBytes
[ 5] 7.00-8.00 sec 78.8 MBytes 661 Mbits/sec 0 716 KBytes
[ 5] 8.00-9.00 sec 78.8 MBytes 661 Mbits/sec 0 716 KBytes
[ 5] 9.00-10.00 sec 78.8 MBytes 661 Mbits/sec 0 716 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 790 MBytes 663 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 787 MBytes 660 Mbits/sec receiver
iperf Done.
[robimarko@localhost ~]$ iperf3 -c 192.168.2.32 -R
Connecting to host 192.168.2.32, port 5201
Reverse mode, remote host 192.168.2.32 is sending
[ 5] local 192.168.1.218 port 45970 connected to 192.168.2.32 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 84.5 MBytes 709 Mbits/sec
[ 5] 1.00-2.00 sec 84.5 MBytes 709 Mbits/sec
[ 5] 2.00-3.00 sec 84.5 MBytes 709 Mbits/sec
[ 5] 3.00-4.00 sec 84.6 MBytes 709 Mbits/sec
[ 5] 4.00-5.00 sec 84.1 MBytes 706 Mbits/sec
[ 5] 5.00-6.00 sec 83.8 MBytes 703 Mbits/sec
[ 5] 6.00-7.00 sec 83.6 MBytes 702 Mbits/sec
[ 5] 7.00-8.00 sec 83.6 MBytes 701 Mbits/sec
[ 5] 8.00-9.00 sec 83.6 MBytes 702 Mbits/sec
[ 5] 9.00-10.00 sec 83.7 MBytes 702 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 843 MBytes 707 Mbits/sec 30 sender
[ 5] 0.00-10.00 sec 840 MBytes 705 Mbits/sec receiver
iperf Done.
generic-receive-offload: on
I assume this means it is engaged. Checked all 4 ports, all are the same.
Yet there is no measurable diff on speedtest; I am getting the same 500Mbits DL with Core0 at 100%.
Hm, do you maybe do VLAN tagging?
Because that is the only thing I don't do.
No VLANs are used on either the WAN side or the LAN side.
Although when I try to turn it off:
root@XAX6:~# ethtool -K gro off eth0
ethtool (-K): flag 'eth0' for parameter '(null)' is not followed by 'on' or 'off'
ethtool -K eth0 gro off
That worked. Turning it off on all ports yields the same result. Remember this is with packet steering off, flow offload off, no IRQ opts engaged.
I am trying to think of why it works for me both without and with PPPoE, with PPPoE clearly showing lower speeds but still way higher with GRO enabled.
I am also not using anything, just default config and that's it.
Can you go into feeds/nss_packages
and check with git log
that GRO commit is there?
Did you clean the nss-dp package or the whole build after updating the feed?
commit 76602b9c184fb4f7e20126d0cfc0c999fc9cbce0 (HEAD -> main, origin/main, origin/HEAD)
Author: Robert Marko <robimarko@gmail.com>
Date: Thu Jun 23 14:28:10 2022 +0200
nss-dp: edma-v1: switch to napi_gro_receive
Utilize napi_gro_receive instead of plain netif_receive_skb on EDMA v1.
It provides significant performance improvements when testing with iperf3.
Signed-off-by: Robert Marko <robimarko@gmail.com>
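The change in that commit is essentially a one-line swap in the EDMA v1 RX completion path; a hedged sketch of what such a change looks like (function and variable names here are illustrative, not the actual nss-dp source):

```c
/* Illustrative sketch only; names are made up, not real nss-dp code. */
static void edma_rx_complete(struct edma_rxdesc_ring *ring,
                             struct napi_struct *napi)
{
    struct sk_buff *skb;

    while ((skb = edma_rx_next_skb(ring)) != NULL) {
        /* Before: every packet was pushed up the stack individually. */
        /* netif_receive_skb(skb); */

        /* After: hand the SKB to NAPI's GRO engine so consecutive TCP
         * segments get coalesced and the stack processes fewer, larger
         * SKBs per NAPI poll. */
        napi_gro_receive(napi, skb);
    }
}
```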
I did what I usually do: git reset --hard to master, then delete the temp folder, make menuconfig, make clean, then make -j4.
Between which two ports are you testing? And are you sure everything is really off, like flow offload or packet steering?
It's between eth0 and eth3, so WAN and one of the LAN ports.
I have my desktop running iperf3 which is behind the WAN port and a notebook on the LAN port to make sure it has to route the traffic and pass NAT with firewalling.
AFAIK no kind of offloading is enabled by default, and I certainly didn't enable any.
The only thing I have is this, due to IPsec:
And an IP rule. But none of those affect the normal outbound LAN traffic...
MOD: re-enabling flow offload, packet steering and the IRQ mods now yields no more than 900Mbits DL...
Well, I am out of ideas.
Any 3rd party willing to do some speed testing?
I'm testing eth0/eth1, no PPPoE, speed 890-940 Mbit/s; ethtool -K eth0/eth1 gro off/on has no effect on the speed. CPU0 at 100%.
Matti
Are we sure this is not just a limitation of the client? I mean, I can see everyone is reaching gigabit speed.
Well, normally I can get 930-940Mbits with all the optimizations enabled. Without them I am limited to 480-510Mbits. Based on Robi's results there is an almost 200Mbit speed diff even with PPPoE, so I should be able to measure this quite easily. Yet, I can't. Of course I cannot tell with 100% certainty that the GPON hasn't started to congest just now and that is the reason, but between no optimizations (500Mbits) and optimizations enabled (900Mbits+) I can clearly distinguish even right now. Without optimizations + GRO I should see speeds around 700Mbits based on Robi's results, yet I only see the same 500Mbits as I get without anything enabled.
I trust Robi's results, but I cannot reproduce the same completely isolated environment, as this is my live router. But I am pretty sure that at least my DL measurements are completely fine. My uplink is limited to 330Mbits by the provider, so that is fine; I usually compare CPU loads to see if the uplink is improved, and not the speed itself.
MOD: by the way, without mods + GRO ON, even LAN-LAN is maxing out a single core with iperf3:
Both directions look like this (iperf3 is NOT running on the router): gigabit speeds are achieved, but there are retransmissions.
There are retransmissions because the shitty QCA driver has zero offloads; it's all done in SW, and only thanks to the fast CPU are speeds decent, but it is sometimes playing catch-up and packets get dropped.
Yep, but if I have everything enabled, even these traffic types are better spread across cores, no single core doing 100%.
Not sure if this is the scenario of interest, but you made me curious about GRO:
AX3600 -(1Gbit)- QSW-2104 -(10Gbit-fiber)- QSW-2104 -(1Gbit)- AX3600
- One way iperf3 throughput: ~940 Mbit without GRO, cpu0 ~95%
- One way iperf3 throughput: ~940 Mbit with GRO, cpu0 ~50%
- Duplex iperf3 throughput ~ 600 Mbit each without GRO, cpu0 ~95%
- Duplex iperf3 throughput ~ 840 Mbit each with GRO, cpu0 ~80%
GRO is my new friend
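For anyone wanting to reproduce the one-way and duplex numbers, the iperf3 invocations would look something like this (the server address is taken from the runs earlier in this thread; --bidir requires iperf3 3.7 or newer):

```shell
# One-way: client sends to server
iperf3 -c 192.168.2.32

# Reverse: server sends, client receives
iperf3 -c 192.168.2.32 -R

# Full duplex in a single run (iperf3 >= 3.7)
iperf3 -c 192.168.2.32 --bidir
```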
Thanks for testing, it's a simple change but it really improves the performance.
I have ordered a 10G AQC107 NIC so I can actually see the limit, as 1G is not enough.
I am eyeing some more cache-related improvements that won't be as drastic, but will provide a couple of % improvement.
Basically, the driver is shit; checksum offload and TSO would really help further.
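For context, "zero offloads" refers to the feature flags a driver advertises to the network stack; a driver with hardware checksum and TSO support would set something like the following at probe time (an illustrative kernel fragment, not actual nss-dp code):

```c
/* Illustrative: offload features a NIC driver advertises at probe time.
 * A driver advertising none of these forces checksumming and TCP
 * segmentation to happen in software on the CPU. */
netdev->features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM  /* HW checksums */
                 |  NETIF_F_TSO | NETIF_F_TSO6           /* TCP segmentation offload */
                 |  NETIF_F_SG;                          /* scatter-gather, required for TSO */
netdev->hw_features = netdev->features;
```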