Possible Kernel 5.10 regression issue with MT7621 and SW/HW offload enabled

OK, good news: SW and HW offloading seem to work again, though as advised by @nikito7 I performed a reboot first:

  • basic: 520 Mbit/s
  • SW: 675 Mbit/s
  • HW: 975 Mbit/s

I will leave the router for a couple of days to test the stability.

1 Like

So a reboot is required after toggling the firewall setting? Is `grep OFFLOAD /proc/net/nf_conntrack` displaying any results even before the reboot?

I didn't try both before rebooting.
I'm using a R6220 with OpenWrt SNAPSHOT r18781-8d8d26ba42 / LuCI Master git-22.025.79016-22e2bfb
WAN: 1 Gbit/s, IPv4 only

With the current snapshot you don't need a reboot.

I have just done tests with and without HW offloading. I can confirm a reboot is not needed to toggle it.
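
For reference, a rough command-line sketch of toggling offloading and checking it without a reboot (assuming the usual fw4 defaults section in /etc/config/firewall; adapt to your own setup):

# enable software flow offloading, and optionally the hardware variant
uci set firewall.@defaults[0].flow_offloading='1'
uci set firewall.@defaults[0].flow_offloading_hw='1'
uci commit firewall
/etc/init.d/firewall restart

# the flowtable should now be present
nft list flowtables

# push some routed traffic through the box, then look for offloaded flows
grep OFFLOAD /proc/net/nf_conntrack | head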

2 Likes

Hi all.
21h running.

  • no random reboot
  • SW and HW offloading are working
  • bandwidth and CPU usage results are as expected.
  • on-the-fly toggling of HW works (no reboot needed).

So far, everything is working as expected :smiley:
I'll keep it running until 24h.

3 Likes

Does HW offload work with PPPoE and/or VLAN?

1 Like

EDIT: Ignore nonsense below and see my post several posts down. Software and hardware offloading are working on my MT7621 ER-X just fine.

I'd say "No." MT7621 offloading still does not work. At least it does not do anything on my ER-X.

The following speed tests on my ER-X are, in order:

  1. with no offloading,
  2. with software offloading checked in the LuCI GUI and
  3. with hardware offloading checked.

Average CPU load over all 4 CPUs (threads, actually; it's only a dual core) was ~57% in all three tests.

Latency is nothing to write home about with SQM disabled.

If I run CAKE, I cannot get more than about ~100 Mbps download. With SQM I cannot get more than ~10 Mbps upload without latency degrading, which seems weird, because the CPU can handle 100 Mbps on the download. But it does drop latency to ~35 ms, so at least there is that.
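
For context, a minimal /etc/config/sqm sketch of the kind of cake setup being described here; the interface name and rates are illustrative placeholders, not the actual values used in these tests:

config queue 'wan'
	option enabled '1'
	option interface 'eth0'            # WAN device (placeholder)
	option download '100000'           # ingress shaping rate in kbit/s (placeholder)
	option upload '10000'              # egress shaping rate in kbit/s (placeholder)
	option qdisc 'cake'
	option script 'piece_of_cake.qos'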

 OpenWrt SNAPSHOT, r18785-8072bf3322
 -----------------------------------------------------
root@ER-X:~# speedtest-netperf.sh
2022-02-11 13:55:52 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are sequential, each with 5 simultaneous streams.
............................................................
 Download: 469.54 Mbps
  Latency: [in msec, 60 pings, 0.00% packet loss]
      Min:  25.043
    10pct:  83.626
   Median:  91.570
      Avg:  90.871
    90pct:  98.602
      Max: 118.500
 CPU Load: [in % busy (avg +/- std dev), 57 samples]
     cpu0:  76.8 +/-  5.2
     cpu1:  41.2 +/-  4.9
     cpu2:  57.8 +/-  5.9
     cpu3:  52.6 +/-  5.0
 Overhead: [in % used of total CPU available]
  netperf:  40.3
.............................................................
   Upload:  22.43 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  16.432
    10pct:  68.120
   Median: 119.730
      Avg: 121.320
    90pct: 163.118
      Max: 237.840
 CPU Load: [in % busy (avg +/- std dev), 58 samples]
     cpu0:   5.8 +/-  2.4
     cpu1:   7.6 +/-  3.7
     cpu2:   0.8 +/-  0.9
     cpu3:   6.6 +/-  3.3
 Overhead: [in % used of total CPU available]
  netperf:   0.9
root@ER-X:~# speedtest-netperf.sh
2022-02-11 13:58:43 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are sequential, each with 5 simultaneous streams.
............................................................
 Download: 473.15 Mbps
  Latency: [in msec, 60 pings, 0.00% packet loss]
      Min:  13.494
    10pct:  64.737
   Median:  78.561
      Avg:  76.163
    90pct:  84.895
      Max:  87.701
 CPU Load: [in % busy (avg +/- std dev), 57 samples]
     cpu0:  93.5 +/-  2.6
     cpu1:  77.6 +/-  3.8
     cpu2:  22.6 +/-  9.5
     cpu3:  33.4 +/-  8.3
 Overhead: [in % used of total CPU available]
  netperf:  43.8
.............................................................
   Upload:  23.11 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  24.566
    10pct:  74.462
   Median: 124.533
      Avg: 122.029
    90pct: 159.199
      Max: 202.953
 CPU Load: [in % busy (avg +/- std dev), 58 samples]
     cpu0:   8.6 +/-  2.9
     cpu1:   8.1 +/-  2.9
     cpu2:   0.2 +/-  0.5
     cpu3:   0.2 +/-  0.4
 Overhead: [in % used of total CPU available]
  netperf:   1.0
root@ER-X:~# speedtest-netperf.sh
2022-02-11 14:01:27 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are sequential, each with 5 simultaneous streams.
............................................................
 Download: 471.98 Mbps
  Latency: [in msec, 60 pings, 0.00% packet loss]
      Min:  35.867
    10pct:  83.508
   Median:  90.189
      Avg:  89.929
    90pct:  96.680
      Max: 104.288
 CPU Load: [in % busy (avg +/- std dev), 57 samples]
     cpu0:  70.8 +/-  5.6
     cpu1:  38.4 +/-  6.2
     cpu2:  56.6 +/-  5.7
     cpu3:  63.4 +/-  5.4
 Overhead: [in % used of total CPU available]
  netperf:  40.1
.............................................................
   Upload:  23.07 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  30.433
    10pct:  72.117
   Median: 113.173
      Avg: 116.564
    90pct: 159.658
      Max: 197.925
 CPU Load: [in % busy (avg +/- std dev), 58 samples]
     cpu0:   7.7 +/-  3.1
     cpu1:   6.5 +/-  3.1
     cpu2:   2.8 +/-  1.7
     cpu3:   3.7 +/-  2.1
 Overhead: [in % used of total CPU available]
  netperf:   0.9
root@ER-X:~# 

1 Like

Can you please check the output of `grep OFFLOAD /proc/net/nf_conntrack` during any of these test cases? And check that the flowtable was properly initialized: `nft list flowtables`?

Can't tell, I don't use either.

1 Like

Without any offloading checked, no results.

With software offloading enabled, here is a snippet of the grep output:

ipv4     2 tcp      6 src=10.23.40.236 dst=104.16.248.249 sport=53642 dport=443 packets=7805 bytes=920995 src=104.16.248.249 dst=172.x.x.x sport=443 dport=53642 packets=6339 bytes=1847039 [OFFLOAD] mark=0 zone=0 use=3
ipv6     10 udp      17 src=2603:6081:8e00:00a7:x:x:x:x dst=2a03:2880:f02c:010e:face:b00c:0000:0002 sport=41186 dport=443 packets=2 bytes=1413 src=2a03:2880:f02c:010e:face:b00c:0000:0002 dst=2603:6081:8e00:00a7:x:x:x:x sport=443 dport=41186 packets=15 bytes=6877 [OFFLOAD] mark=0 zone=0 use=3
ipv6     10 tcp      6 src=2603:6081:8e00:00a7:x:x:x:x dst=2a03:2880:f011:001e:face:b00c:0000:2825 sport=47030 dport=443 packets=243 bytes=43664 src=2a03:2880:f011:001e:face:b00c:0000:2825 dst=2603:6081:8e00:00a7:x:x:x:x sport=443 dport=47030 packets=316 bytes=42269 [OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=10.23.43.196 dst=52.87.247.190 sport=64091 dport=31006 packets=21 bytes=1259 src=52.87.247.190 dst=172.x.x.x sport=31006 dport=64091 packets=14 bytes=584 [OFFLOAD] mark=0 zone=0 use=3
ipv6     10 udp      17 src=2603:6081:8e00:00a7:x:x:x:x dst=2606:4700:0000:0000:0000:0000:6812:1690 sport=55471 dport=443 packets=61 bytes=9656 src=2606:4700:0000:0000:0000:0000:6812:1690 dst=2603:6081:8e00:00a7:x:x:x:x sport=443 dport=55471 packets=157 bytes=181332 [OFFLOAD] mark=0 zone=0 use=3

and after a test with hardware offloading checked, these are the last few lines of grep output:

ipv4     2 udp      17 src=10.23.43.126 dst=208.83.246.21 sport=35384 dport=53 packets=1 bytes=62 src=208.83.246.21 dst=172.x.x.x sport=53 dport=35384 packets=1 bytes=146 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv6     10 udp      17 src=2603:6081:8e00:00a7:x:x:x:x dst=2607:f8b0:4002:0c08:0000:0000:0000:005f sport=58900 dport=443 packets=2 bytes=2556 src=2607:f8b0:4002:0c08:0000:0000:0000:005f dst=2603:6081:8e00:00a7:x:x:x:x sport=443 dport=58900 packets=22 bytes=6246 [OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=10.23.40.236 dst=35.186.227.140 sport=46182 dport=443 packets=1 bytes=60 src=35.186.227.140 dst=172.x.x.x sport=443 dport=46182 packets=1 bytes=60 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=10.23.40.106 dst=34.107.221.82 sport=37906 dport=80 packets=1 bytes=60 src=34.107.221.82 dst=172.x.x.x sport=80 dport=37906 packets=1 bytes=60 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv6     10 udp      17 src=2603:6081:8e00:00a7:x:x:x:x dst=2607:f8b0:4002:0c06:0000:0000:0000:0063 sport=59571 dport=443 packets=2 bytes=1519 src=2607:f8b0:4002:0c06:0000:0000:0000:0063 dst=2603:6081:8e00:00a7:x:x:x:x sport=443 dport=59571 packets=13 bytes=10655 [OFFLOAD] mark=0 zone=0 use=3

The version of LuCI I'm using is: LuCI Master git-22.025.79016-22e2bfb

So some flows are supposedly getting offloaded, but they all look like forwarded traffic.
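
For a quick summary of how many flows carry each marker, something like this works (just a convenience sketch):

# software-offloaded flows only
grep -c '\[OFFLOAD\]' /proc/net/nf_conntrack
# hardware-offloaded flows
grep -c '\[HW_OFFLOAD\]' /proc/net/nf_conntrack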

Btw, the flowtable does not include `lo`, so I am not surprised that locally generated traffic is not offloaded at all. Your console outputs above indicate that you ran those tests on the device itself?

Correct. Otherwise I'm on WiFi to an AP connected to the device, which tops out at ~230 Mbps.

Earlier I also ran some tests watching CPU load in htop, using iperf3 with the ER-X as both client and server and running traffic between the ER-X and an AP (an EA8500), to see if changing the offload settings had any effect on CPU load. It did not.

With the ER-X as the server, throughput topped out at ~500 Mbps and CPU load was maxed out as well (on at least 1-2 threads). In the other direction, with the EA8500 as the server, throughput was higher (I don't recall exactly, something like ~800 Mbps), and traffic between two APs connected through the ER-X ran at line rate (~935 Mbps) with no CPU usage to speak of.
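
For anyone wanting to reproduce that kind of test, a rough iperf3 sketch (the address is a placeholder for whichever box runs the server side):

# on the server side (e.g. the router or an AP)
iperf3 -s

# on the client: upload towards the server, then the reverse direction
iperf3 -c 192.168.1.1 -t 30
iperf3 -c 192.168.1.1 -t 30 -R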

OK. I'm a little slow. I thought I was being lazy (I did not want to walk to a wired PC upstairs) and clever (running the speed tests on the device and iperf3 between APs). Well, it isn't the first time I've been both lazy and ignorant. I've gotten used to it :wink:

I ran a speed test from a wired PC with no offloading, software offloading and hardware offloading while monitoring ER-X CPU usage in htop. I can now confirm hardware offloading works as expected.

CPU usage is pretty near zero while downloading at ~470 Mbps with hardware offloading checked. With just software offloading checked, CPU usage is ~30%, and with no offloading at all it is ~64%.

1 Like

I just did a new build (SNAPSHOT r18792-337e942290 2022-02-11).

I can also confirm that HW Flow Offload is now working with Firewall4 and Kernel 5.10 (monitored with htop while doing a heavy download at 350 Mbps).

Over the next few days I will monitor whether the random reboots (which existed with Kernel 5.10 + Firewall3 + HW Flow Offload) are also solved.

3 Likes

After 24h, nothing to report, everything is working fine.

2 Likes

@jow I think it's not your work, but there is a pull request for it on GitHub.

It would be the best way to finish all the work on offload.

I'm starting tests now with kernel 5.10 + nftables.

Yes, it does work, but not as well as in OpenWrt 19.07.

Tested with DIR-860L B1 + 1000/300 PPPoE:

  • OpenWrt SNAPSHOT, r18785-8072bf3322 (5.10.96)
  • basic: ~350 Mbit/s download @ 66% sirq / 320 Mbit/s upload @ 88% sirq
  • SW: ~750 Mbit/s download @ 53% sirq / 320 Mbit/s upload @ 48% sirq
  • HW: wire speed (920+ Mbit/s) @ 23% sirq / 320 Mbit/s upload @ 33% sirq
  • no need to reboot after applying changes (as @nikito7 already stated)

OpenWrt 19.07 uses 0% sirq if hardware offload is enabled.
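
For reference, one simple way to watch sirq on an OpenWrt box while a transfer is running is the CPU summary line of BusyBox top (a sketch, assuming the default BusyBox top):

# sample the CPU line once per second for 30 seconds during the transfer
top -b -d 1 -n 30 | grep sirq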


`nft list flowtables`:

table inet fw4 {
	flowtable ft {
		hook ingress priority filter
		devices = { lan1, lan2, lan3, lan4 }
		flags offload
	}
}

`grep OFFLOAD /proc/net/nf_conntrack | tail`:

ipv4     2 tcp      6 src=192.168.x.y dst=185.72.16.27 sport=48234 dport=8080 packets=1 bytes=60 src=185.72.16.27 dst=87.97.93.137 sport=8080 dport=48234 packets=1 bytes=60 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=192.168..x.z dst=13.49.168.130 sport=55408 dport=8008 packets=275 bytes=22771 src=13.49.168.130 dst=87.97.93.137 sport=8008 dport=55408 packets=1779 bytes=450556 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=192.168.x.y dst=142.250.27.108 sport=51730 dport=993 packets=80 bytes=4552 src=142.250.27.108 dst=87.97.93.137 sport=993 dport=51730 packets=39 bytes=2726 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=192.168.x.y dst=185.72.16.27 sport=48230 dport=8080 packets=1 bytes=60 src=185.72.16.27 dst=87.97.93.137 sport=8080 dport=48230 packets=1 bytes=60 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=192.168.x.y dst=185.72.16.27 sport=48214 dport=8080 packets=1 bytes=60 src=185.72.16.27 dst=87.97.93.137 sport=8080 dport=48214 packets=1 bytes=60 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=192.168.x.y dst=185.72.16.27 sport=48218 dport=8080 packets=1 bytes=60 src=185.72.16.27 dst=87.97.93.137 sport=8080 dport=48218 packets=1 bytes=60 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=192.168.x.y dst=3.126.186.102 sport=34044 dport=443 packets=97 bytes=10694 src=3.126.186.102 dst=87.97.93.137 sport=443 dport=34044 packets=757 bytes=139469 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=192.168.x.y dst=139.59.210.197 sport=33300 dport=443 packets=1 bytes=60 src=139.59.210.197 dst=87.97.93.137 sport=443 dport=33300 packets=1599 bytes=852777 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=192.168..x.z dst=51.195.89.38 sport=40984 dport=12020 packets=1 bytes=60 src=51.195.89.38 dst=87.97.93.137 sport=12020 dport=40984 packets=1077 bytes=333683 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=192.168..x.z dst=62.4.9.11 sport=58402 dport=80 packets=3 bytes=164 src=62.4.9.11 dst=87.97.93.137 sport=80 dport=58402 packets=2 bytes=112 [OFFLOAD] mark=0 zone=0 use=3
1 Like

Tested with Xiaomi R3G v1 + 1000/200 IPoE:

OpenWrt SNAPSHOT r18792-337e942290 (5.10.96)

  • basic: ~555 Mbit/s download @ 60% sirq / 218 Mbit/s upload @ 22% sirq
  • SW: ~830 Mbit/s download @ 54% sirq / 218 Mbit/s upload @ 15% sirq
  • HW: ~900 Mbit/s download @ 39% sirq / 218 Mbit/s upload @ 16% sirq

I set HW and rebooted. After the reboot:

  • HW: ~920 Mbit/s download @ 0% sirq / 218 Mbit/s upload @ 0% sirq
1 Like

:thinking:

Looks like your build is newer than mine:

  • mine: r18785-8072bf3322
  • yours: r18792-337e942290
  • commits: 8072bf3322...337e942290
    • and I don't see any relevant commit in that range (one way to inspect it is sketched below)
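
A sketch of how that commit range can be inspected, assuming a local clone of openwrt.git:

# list the commits between the two builds
git log --oneline 8072bf3322..337e942290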

And yes, I rebooted and retested, and got the same non-zero sirq.


Ohh, got it!
My ISP uses "IP packets encapsulated in PPP, which is in turn encapsulated in Ethernet", a.k.a. PPPoE, and not IPoE!

So PPPoE is still not fully hardware offloaded (the way it was in 19.07).