Adding OpenWrt support for Xiaomi AX3600

Hm, that is really weird.
Can you check with ethtool -k <iface> whether GRO is enabled?
You can also try disabling it with ethtool -K <iface> gro off
It's per port, of course.

Let me rebuild with ethtool and get the output.

Brought the PPPoE test server up again, and for me it helps.
GRO on:

[robimarko@localhost ~]$ iperf3 -c 192.168.2.32
Connecting to host 192.168.2.32, port 5201
[  5] local 192.168.1.218 port 55832 connected to 192.168.2.32 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   103 MBytes   863 Mbits/sec    0    776 KBytes       
[  5]   1.00-2.00   sec   101 MBytes   849 Mbits/sec    0    776 KBytes       
[  5]   2.00-3.00   sec   102 MBytes   860 Mbits/sec    0    776 KBytes       
[  5]   3.00-4.00   sec   101 MBytes   849 Mbits/sec    0    776 KBytes       
[  5]   4.00-5.00   sec   101 MBytes   849 Mbits/sec    0    776 KBytes       
[  5]   5.00-6.00   sec   102 MBytes   860 Mbits/sec    0    776 KBytes       
[  5]   6.00-7.00   sec   101 MBytes   849 Mbits/sec    0    776 KBytes       
[  5]   7.00-8.00   sec   101 MBytes   849 Mbits/sec    0    776 KBytes       
[  5]   8.00-9.00   sec   101 MBytes   849 Mbits/sec    0    776 KBytes       
[  5]   9.00-10.00  sec   101 MBytes   849 Mbits/sec    0    776 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1017 MBytes   853 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1014 MBytes   850 Mbits/sec                  receiver

iperf Done.
[robimarko@localhost ~]$ iperf3 -c 192.168.2.32 -R
Connecting to host 192.168.2.32, port 5201
Reverse mode, remote host 192.168.2.32 is sending
[  5] local 192.168.1.218 port 45302 connected to 192.168.2.32 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  83.9 MBytes   704 Mbits/sec                  
[  5]   1.00-2.00   sec  83.9 MBytes   704 Mbits/sec                  
[  5]   2.00-3.00   sec  84.0 MBytes   704 Mbits/sec                  
[  5]   3.00-4.00   sec  83.9 MBytes   704 Mbits/sec                  
[  5]   4.00-5.00   sec  83.8 MBytes   703 Mbits/sec                  
[  5]   5.00-6.00   sec  83.7 MBytes   702 Mbits/sec                  
[  5]   6.00-7.00   sec  83.5 MBytes   701 Mbits/sec                  
[  5]   7.00-8.00   sec  83.5 MBytes   701 Mbits/sec                  
[  5]   8.00-9.00   sec  83.7 MBytes   702 Mbits/sec                  
[  5]   9.00-10.00  sec  82.9 MBytes   695 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   840 MBytes   704 Mbits/sec   51             sender
[  5]   0.00-10.00  sec   837 MBytes   702 Mbits/sec                  receiver

iperf Done.

GRO off:

[robimarko@localhost ~]$ iperf3 -c 192.168.2.32
Connecting to host 192.168.2.32, port 5201
[  5] local 192.168.1.218 port 38246 connected to 192.168.2.32 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  79.5 MBytes   666 Mbits/sec    0    616 KBytes       
[  5]   1.00-2.00   sec  79.2 MBytes   664 Mbits/sec    0    716 KBytes       
[  5]   2.00-3.00   sec  78.8 MBytes   661 Mbits/sec    0    716 KBytes       
[  5]   3.00-4.00   sec  78.8 MBytes   661 Mbits/sec    0    716 KBytes       
[  5]   4.00-5.00   sec  78.8 MBytes   661 Mbits/sec    0    716 KBytes       
[  5]   5.00-6.00   sec  80.0 MBytes   671 Mbits/sec    0    716 KBytes       
[  5]   6.00-7.00   sec  78.8 MBytes   661 Mbits/sec    0    716 KBytes       
[  5]   7.00-8.00   sec  78.8 MBytes   661 Mbits/sec    0    716 KBytes       
[  5]   8.00-9.00   sec  78.8 MBytes   661 Mbits/sec    0    716 KBytes       
[  5]   9.00-10.00  sec  78.8 MBytes   661 Mbits/sec    0    716 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   790 MBytes   663 Mbits/sec    0             sender
[  5]   0.00-10.00  sec   787 MBytes   660 Mbits/sec                  receiver

iperf Done.
[robimarko@localhost ~]$ iperf3 -c 192.168.2.32 -R
Connecting to host 192.168.2.32, port 5201
Reverse mode, remote host 192.168.2.32 is sending
[  5] local 192.168.1.218 port 45970 connected to 192.168.2.32 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  84.5 MBytes   709 Mbits/sec                  
[  5]   1.00-2.00   sec  84.5 MBytes   709 Mbits/sec                  
[  5]   2.00-3.00   sec  84.5 MBytes   709 Mbits/sec                  
[  5]   3.00-4.00   sec  84.6 MBytes   709 Mbits/sec                  
[  5]   4.00-5.00   sec  84.1 MBytes   706 Mbits/sec                  
[  5]   5.00-6.00   sec  83.8 MBytes   703 Mbits/sec                  
[  5]   6.00-7.00   sec  83.6 MBytes   702 Mbits/sec                  
[  5]   7.00-8.00   sec  83.6 MBytes   701 Mbits/sec                  
[  5]   8.00-9.00   sec  83.6 MBytes   702 Mbits/sec                  
[  5]   9.00-10.00  sec  83.7 MBytes   702 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   843 MBytes   707 Mbits/sec   30             sender
[  5]   0.00-10.00  sec   840 MBytes   705 Mbits/sec                  receiver

iperf Done.
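
For anyone who wants to compare runs like the ones above without eyeballing them, here is a minimal sketch. It pulls the receiver-side bitrate out of iperf3 client output and computes the relative change between two runs. The function names iperf3_rx_mbit and pct_change are made up for this example, and it assumes iperf3 reports the summary in Mbits/sec as it does here.

```shell
# Print the receiver-side bitrate (in Mbits/sec) from iperf3 client output on stdin.
# Assumes the summary line ends in "receiver" and uses the Mbits/sec unit.
iperf3_rx_mbit() {
    awk '$NF == "receiver" {
        for (i = 2; i <= NF; i++)
            if ($i == "Mbits/sec") print $(i - 1)
    }'
}

# Print the relative change from $1 (baseline) to $2 in percent, one decimal.
pct_change() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", 100 * (new - base) / base }'
}

# Usage (not run here):
#   iperf3 -c 192.168.2.32 | iperf3_rx_mbit
#   pct_change 660 850
```

With the receiver numbers posted above, pct_change 660 850 gives 28.8 for the forward direction (GRO off vs on), while pct_change 705 702 gives -0.4 for the reverse direction, so only the forward direction shows a difference.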

generic-receive-offload: on

I assume this means it is engaged. Checked all 4 ports, all is the same.

Yet, no measurable diff on speedtest, I am getting the same 500Mbits DL with Core0 at 100%.

Hm, do you maybe do VLAN tagging?
Because that is the only thing I don't do.

No VLANs are used on either the WAN side or the LAN side.

Although when I try to turn it off:

root@XAX6:~# ethtool -K gro off eth0
ethtool (-K): flag 'eth0' for parameter '(null)' is not followed by 'on' or 'off'

ethtool -K eth0 gro off

That worked. Turning it off on all ports yields the same result. Remember this is with packet steering off, flow offload off, no IRQ opts engaged.
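
Since GRO is a per-port setting, toggling it on every port by hand is easy to get wrong. A minimal sketch of a loop that does it and prints the resulting state follows. The port list eth0 to eth3 is an assumption based on the four ports mentioned here, and set_gro_all is just a name picked for this example.

```shell
# Ports to touch. eth0-eth3 is an assumption; adjust to the actual interfaces.
PORTS="eth0 eth1 eth2 eth3"

# Set GRO to "on" or "off" on every port, then print the state ethtool reports.
set_gro_all() {
    state="$1"
    for dev in $PORTS; do
        # Note the argument order: interface first, then feature and value.
        ethtool -K "$dev" gro "$state" || echo "failed on $dev" >&2
        ethtool -k "$dev" | grep '^generic-receive-offload:' | sed "s/^/$dev: /"
    done
}

# Usage (not run here):
#   set_gro_all off
#   set_gro_all on
```

The setting does not survive a reboot, so it has to be reapplied after each restart while testing.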

I am trying to think of why it works for me both without and with PPPoE, with PPPoE clearly showing lower speeds but still way higher with GRO enabled.

I am also not using anything, just default config and that's it.

Can you go into feeds/nss_packages and check with git log that GRO commit is there?
Did you clean the nss-dp package or the whole build after updating the feed?

commit 76602b9c184fb4f7e20126d0cfc0c999fc9cbce0 (HEAD -> main, origin/main, origin/HEAD)
Author: Robert Marko <robimarko@gmail.com>
Date:   Thu Jun 23 14:28:10 2022 +0200

    nss-dp: edma-v1: switch to napi_gro_receive

    Utilize napi_gro_receive instead of plain netif_receive_skb on EDMA v1.
    It provides significant performance improvements when testing with iperf3.

    Signed-off-by: Robert Marko <robimarko@gmail.com>

I did what I usually do: git reset hard to master, then delete the temp folder, make menuconfig, make clean, then make -j4.
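
The feed check asked for above can also be scripted instead of reading git log by eye. This is only a sketch: feed_has_commit is a made-up helper name, and the package name qca-nss-dp in the rebuild comments is an assumption about how the feed names the nss-dp package.

```shell
# Succeed if the git checkout in directory $1 has a commit whose message matches $2.
feed_has_commit() {
    git -C "$1" log --oneline --grep="$2" | grep -q .
}

# Usage, from the top of the OpenWrt build tree (not run here):
#   feed_has_commit feeds/nss_packages 'napi_gro_receive' && echo "GRO commit present"
#
# Rebuilding only the driver instead of the whole tree, assuming the package
# is called qca-nss-dp:
#   make package/qca-nss-dp/clean
#   make package/qca-nss-dp/compile
```

A full make clean as described above rebuilds the package anyway, so the per-package targets are only a faster alternative.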

Between which two ports are you testing? And are you sure everything is really off, like flow offload or packet steering (PS)?

It's between eth0 and eth3, so WAN and one of the LAN ports.
I have my desktop running iperf3 which is behind the WAN port and a notebook on the LAN port to make sure it has to route the traffic and pass NAT with firewalling.

AFAIK no kind of offloading is enabled by default, I didn't enable it for sure.

Only thing I have is this due to IPsec:

image

And an IP rule. But none of those are affecting the normal outbound LAN traffic...

MOD: re-enabling flow offload, packet steering and IRQ mods now yields no more than 900Mbits DL...

Well, I am out of ideas.

Any 3rd party willing to do some speed testing?

I'm testing eth0/eth1 without PPPoE: speed is 890-940 Mbit/s, and ethtool -K eth0/eth1 gro off/on has no effect on the speed. CPU0 is at 100%.

Matti

Are we sure this is not just a limitation of the client? I mean, I can see everyone is reaching gigabit speed.

Well, normally I can get 930-940Mbits with all the optimizations enabled. Without them I am limited to 480-510Mbits. Based on Robi's results there is an almost 200Mbit speed difference even with PPPoE, so I should be able to measure this quite easily. Yet, I can't. Of course I cannot tell with 100% certainty that GPON did not start to congest right now and that this is the reason, but I can clearly distinguish between no optimizations (500Mbits) and optimizations enabled (900Mbits+) even right now. Without optimizations + GRO I should see speeds around 700Mbits based on Robi's results, yet I only see the same 500Mbits as I get without anything enabled.

I trust Robi's results, but I cannot reproduce the same completely isolated environment as this is my live router. But I am pretty sure that at least my DL measurements are completely fine. My uplink is limited to 330Mbits by the provider, so that is fine; I usually compare CPU loads to see if the uplink is improved, not the speed itself.

MOD: by the way, without mods + GRO ON, even LAN-LAN is maxing out a single core with iperf3:

image

Both directions look like this (iperf3 is NOT running on the router), gigabit speeds are achieved, but there are retransmissions.

There are retransmissions because of the shitty QCA driver, which has 0 offloads. It's all done in SW, and speeds are decent only thanks to the fast CPU, but it is sometimes playing catch-up and packets get dropped.

Yep, but if I have everything enabled, even these traffic types are better spread across cores, no single core doing 100%.
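
To put numbers on how receive processing is spread across cores, the NET_RX row of /proc/softirqs can be summarized per CPU. A minimal sketch follows; net_rx_share is a name invented for this example, and it takes an optional file argument so a saved copy can be analyzed as well.

```shell
# Print per-CPU NET_RX softirq counts and each CPU's share of the total.
# Reads /proc/softirqs unless another file is given as $1.
net_rx_share() {
    awk '$1 == "NET_RX:" {
        total = 0
        for (i = 2; i <= NF; i++) total += $i
        for (i = 2; i <= NF; i++)
            printf "cpu%d %d %.0f%%\n", i - 2, $i, (total ? 100 * $i / total : 0)
    }' "${1:-/proc/softirqs}"
}

# Usage (not run here):
#   net_rx_share
```

The counters are cumulative since boot, so for a single test it is better to save /proc/softirqs before and after an iperf3 run and compare the two snapshots.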

Not sure if this is the scenario of interest, but you made me curious about GRO:

AX3600 -(1Gbit)- QSW-2104 -(10Gbit-fiber)- QSW-2104 -(1Gbit)- AX3600

  • One way iperf3 throughput: ~940 Mbit without GRO, cpu0 ~95%
  • One way iperf3 throughput: ~940 Mbit with GRO, cpu0 ~50%
  • Duplex iperf3 throughput ~ 600 Mbit each without GRO, cpu0 ~95%
  • Duplex iperf3 throughput ~ 840 Mbit each with GRO, cpu0 ~80%
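
One rough way to read the four results above is throughput per percentage point of cpu0 load. This is a made-up metric for illustration only: it treats the approximate figures as exact, and for duplex it assumes "each" means per direction, so the two directions are summed.

```shell
# Print Mbit/s moved per percentage point of cpu0 load, one decimal.
# $1 = throughput in Mbit/s, $2 = cpu0 load in percent.
mbit_per_cpu_pct() {
    awk -v mbit="$1" -v cpu="$2" 'BEGIN { printf "%.1f\n", mbit / cpu }'
}

# Usage with the figures above:
#   mbit_per_cpu_pct 940 95     # one way, GRO off
#   mbit_per_cpu_pct 940 50     # one way, GRO on
#   mbit_per_cpu_pct 1200 95    # duplex, GRO off (2 x 600)
#   mbit_per_cpu_pct 1680 80    # duplex, GRO on  (2 x 840)
```

That works out to 9.9 vs 18.8 one way and 12.6 vs 21.0 for duplex, so by this measure GRO roughly doubles the one-way efficiency even where the link itself is already saturated.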

GRO is my new friend :)

Thanks for testing, it's a simple change but really improves the performance.
I have ordered a 10G AQC107 NIC so I am able to actually see the limit as 1G is not enough.

I am eyeing some more cache-related improvements that won't be as drastic, but will provide a couple of percent improvement.
Basically, the driver is shit; checksum offload and TSO would really help further.
