High load average on an MT7621AT

For some strange reason, the load average of my DIR-878 running OpenWrt 23.05.3 sometimes climbs above 4.00, and I have no idea what the problem could be.
What is even stranger is that I don't experience slow speeds or high latency. It is just a very short spike, and it can go up to a load average of ~14.00.

Mem: 82020K used, 38484K free, 1204K shrd, 0K buff, 11316K cached
CPU:   1% usr   8% sys   0% nic  74% idle   0% io   0% irq  15% sirq
Load average: 7.82 4.37 4.43 3/363 8150
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
 8150 31363 root     R     1396   1%   9% top
  261     2 root     SW       0   0%   7% [napi/mtk_eth-6]
  260     2 root     SW       0   0%   3% [napi/mtk_eth-5]
28550     2 root     IW       0   0%   3% [kworker/2:0-eve]
  637     2 root     SW       0   0%   1% [mt76-tx phy0]
30258     2 root     IW       0   0%   1% [kworker/u8:175-]
...
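To put the load figures above in context, load average is usually read relative to the number of logical CPUs. What follows is a minimal sketch, assuming a POSIX shell with awk (as BusyBox provides); the helper name `load_per_cpu` is made up for illustration and is not an OpenWrt tool.

```shell
# Hypothetical helper: divide the 1-minute load average by the CPU count.
# Usage: load_per_cpu "<contents of /proc/loadavg>" <cpu_count>
load_per_cpu() {
    echo "$1" | awk -v n="$2" '{ printf "%.2f\n", $1 / n }'
}

# On the router itself (assumes /proc/cpuinfo has one "processor" line per CPU):
# load_per_cpu "$(cat /proc/loadavg)" "$(grep -c ^processor /proc/cpuinfo)"
```

A result well above 1.00 means more tasks were runnable or waiting than there were logical CPUs during that minute. It does not say which tasks they were.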

I use software + hardware offloading. For testing, I disabled the "Enable SYN-flood protection" and "Drop invalid packets" options in the firewall; they don't affect the load average. Using BCP38 doesn't affect it either, and neither does disabling miniupnpd-nftables.

I haven't touched the firewall configuration much; it is almost the default.

$ cat /etc/config/firewall 
config defaults
	option input 'DROP'
	option output 'ACCEPT'
	option forward 'DROP'
	option flow_offloading '1'
	option flow_offloading_hw '1'
	option synflood_protect '1'
	option drop_invalid '1'
...

(I have some port forwards below)

Is there a way to debug this? Is this normal? Because it doesn't seem very normal to me. Any suggestions are appreciated

You came to the right place.
Let's establish a baseline: post the output of the following with all offloads disabled.

ubus call system board

Then install htop.
Run it and press F2 (Setup); there, unhide kernel threads and enable CPU detail. Press Ctrl-S at the peak of the test to freeze the display and post a screenshot. Ctrl-Q unfreezes it.


Result: https://www.waveform.com/tools/bufferbloat?test-id=40cd8e47-0d42-4145-a3ee-66ee79c111bd

$ ubus call system board
{
	"kernel": "5.15.150",
	"hostname": "buh",
	"system": "MediaTek MT7621 ver:1 eco:3",
	"model": "D-Link DIR-878 A1",
	"board_name": "dlink,dir-878-a1",
	"rootfs_type": "squashfs",
	"release": {
		"distribution": "OpenWrt",
		"version": "23.05.3",
		"revision": "r23809-234f1a2efa",
		"target": "ramips/mt7621",
		"description": "OpenWrt 23.05.3 r23809-234f1a2efa"
	}
}

Software and Hardware Offload disabled
(By the way, before running this test with offloading disabled, I saw a kernel thread spike to 150% CPU usage, which raised the load average. Sadly, I was not able to identify which thread it was.)
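One way to catch a short-lived spike like this is to log the busiest rows of `top` in a loop and read the log afterwards. The sketch below assumes BusyBox `top` output in the column layout shown earlier in this thread (PID first, then two `N%` columns, with %CPU as the second). The name `busy_threads` and the log path are hypothetical.

```shell
# Hypothetical filter: print rows of BusyBox `top -b` output whose %CPU
# column is at or above a threshold (default 20).
busy_threads() {
    awk -v min="${1:-20}" '
        $1 ~ /^[0-9]+$/ {                      # process rows start with a numeric PID
            seen = 0
            for (i = 2; i <= NF; i++) {
                # The first "N%" field is %VSZ, the second is %CPU.
                if ($i ~ /^[0-9]+%$/ && ++seen == 2) {
                    if ($i + 0 >= min) print
                    break
                }
            }
        }'
}

# Example loop for the router (the log goes to /tmp, which is RAM-backed on OpenWrt):
# while true; do date; top -b -n 1 | busy_threads 20; sleep 1; done >> /tmp/top-spikes.log
```

Because the loop only samples about once per second, a spike shorter than that can still be missed.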

Try to get htop and the test result into a single-window screenshot.
Now enable software offload and post the link plus an htop picture (the pink softirq band should be 2-3x smaller). Or, if you use PPPoE to connect, we'll need to patch the firewall for a future version.

150% is OK; the CPUs are hyperthreaded in pairs.

Only software offload enabled: https://www.waveform.com/tools/bufferbloat?test-id=183027fe-527d-48e8-9b38-7ae94e4ccfdc
softirq seems to be the same.

if you use PPPoE to connect, we'll need to patch the firewall for a future version.

Yes, I use PPPoE, and if I enable hardware offloading I get 700/700 speeds (which is what I pay for).

Put the MSS fix in the correct place by adding this as /etc/nftables.d/mss.nft:

chain mangle_postrouting {
	type filter hook postrouting priority mangle; policy accept;
	oif $wan_devices tcp flags syn / syn,fin,rst tcp option maxseg size set rt mtu
}

Redo the software offload test. (Edit: apply it by running fw4 check and, if that reports no errors, service firewall restart.)


Firewall rule is applied

What I noticed is that there is no high load average after disabling hardware offloading, but that is not something I want to disable, since I want to make full use of my bandwidth.

Install a later version of the fw4 file to limit software offload to forwarding interfaces only:

The next kernel version in the next OpenWrt release will fix PPPoE offload, which has been broken since v22.

How do I install that file to test it?

The next kernel version in the next OpenWrt release will fix PPPoE offload, which has been broken since v22.

I guess I can build my own OpenWrt firmware using the next kernel, right? Which kernel is it? The next LTS?

You can try 23.05-SNAPSHOT built via the image builder.
To install the file: use the "raw" link at the top right to download it, then copy it to your device, replacing /usr/share/ucode/fw4.uc (it is the latest two commits in that repo). The original is in /rom/usr/share/ucode, so there is no need to back it up.

I installed the file, replacing the existing one, restarted the firewall with service firewall restart, and ran the bufferbloat test again. There are no significant changes compared to the previous tests. Did I skip a step?

https://www.waveform.com/tools/bufferbloat?test-id=c44904c6-5bc0-449f-b190-012ca644c7ce

Restart the firewall service and check in the output of nft list ruleset that the offload group contains br-lan, and not ethX or phyX-apY.

$ nft list ruleset 
table inet fw4 {
	flowtable ft {
		hook ingress priority filter
		devices = { br-lan, wan }
		counter
	}

	chain mangle_postrouting {
		type filter hook postrouting priority mangle; policy accept;
		oif "pppoe-wan" tcp flags syn / fin,syn,rst tcp option maxseg size set rt mtu
	}
...

I don't see anything related to offload. There are more rules below, but those are the default rules plus the port-forwarding ones.

Flowtable is the offload :wink:
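To check just the offload device list without reading the whole ruleset, the `devices = { ... }` line of the flowtable can be filtered out of the `nft list ruleset` output. This is a small sketch assuming the output format shown above; the function name `flowtable_devices` is hypothetical.

```shell
# Hypothetical helper: print the contents of every "devices = { ... }" line
# found on stdin (fed from `nft list ruleset`).
flowtable_devices() {
    sed -n 's/^[[:space:]]*devices = { *\(.*[^ ]\) *}.*$/\1/p'
}

# On the router:
# nft list ruleset | flowtable_devices
```

With the ruleset posted above, this would print the `br-lan, wan` pair from the `ft` flowtable.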


The flowtable should contain only pppoe-wan and br-lan. I don't understand how the WAN device ends up as wan in the flowtable while $wan_devices in the rule resolves to pppoe-wan; both come from more or less the same source.

Yeah, they use the same interface. So what do I do now? The only problem here is hardware offloading, which randomly causes CPU spikes that raise the load average.

Is updating to the latest snapshot my best bet? And if that doesn't work, should I report this issue (if there are no other existing issues about hardware offloading and load average)?

A snapshot is a good idea; just replace the package set with the one from 23.05.3 (customize packages, the small arrow) so that LuCI is included. Also back up your configuration, as it cannot be migrated back.

Do you run sqm?

No SQM. I tried it a while ago, but speeds were nowhere near the hardware offloading ones.

You can try 23.05-SNAPSHOT (not downloadable; you need the image builder). I have not verified it, but the fix is in the pipeline, if not already applied, around the end of the thread.

Another way to exclude the buggy code path is to manually edit the ruleset so that the flowtable device list contains pppoe-wan and br-lan.

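nft list ruleset > edit.nft            # dump the live ruleset to a file
vi edit.nft                            # adjust the flowtable devices list
nft -c -f edit.nft                     # dry run: check that the file parses
nft flush ruleset ; nft -f edit.nft    # replace the live ruleset with the edited one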
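The manual edit step could also be done non-interactively with sed. This is only a sketch, not a tested fix: `set_flowtable_devices` is a hypothetical helper, it rewrites every `devices = { ... }` line it finds, and it assumes interface names contain no `/` characters.

```shell
# Hypothetical helper: replace the device list of every "devices = { ... }"
# line on stdin with the list passed as the first argument.
set_flowtable_devices() {
    sed "s/devices = {[^}]*}/devices = { $1 }/"
}

# On the router, mirroring the manual steps:
# nft list ruleset | set_flowtable_devices "br-lan, pppoe-wan" > edit.nft
# nft -c -f edit.nft && { nft flush ruleset; nft -f edit.nft; }
```

Note that the firewall service regenerates the ruleset, so a change loaded this way would be lost after the next service firewall restart or reboot.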