Well then, that makes this extra weird.
I did a bit more testing, and it seems that the combination of GRO disabled and SQM enabled causes traffic whose source or destination is the router itself to be handled by the CPU instead of the NSS. I'll reiterate that this isn't a problem for LAN-side devices, just for traffic directly to/from the router.
My "test" for whether traffic is hitting the CPU or going through the NSS is to run `speedtest-netperf.sh` and compare the average CPU usage to the netperf overhead. If they are roughly equal (typically avg cpu is slightly lower), the traffic is going through the NSS. If the average CPU utilization is much higher than the netperf overhead (meaning the test is eating up more CPU time than netperf itself needs), the traffic is hitting the CPU.
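To make that rule of thumb concrete, here is a minimal Python sketch of the comparison. The function name `classify_path` and the 5-point margin are my own assumptions for illustration, not anything `speedtest-netperf.sh` provides; pick whatever margin matches your run-to-run noise.

```python
def classify_path(netperf_pct: float, avg_cpu_pct: float, margin_pct: float = 5.0) -> str:
    """Guess whether router-terminated traffic was offloaded.

    netperf_pct  -- the "netperf" line from the Overhead section
    avg_cpu_pct  -- the "avg cpu" line from the Overhead section
    margin_pct   -- hypothetical noise margin (an assumption, tune to taste)
    """
    # CPU time that netperf itself does not account for.
    excess = avg_cpu_pct - netperf_pct
    # Roughly equal (or avg cpu slightly lower) suggests the NSS did the work.
    return "CPU" if excess > margin_pct else "NSS"


if __name__ == "__main__":
    print(classify_path(netperf_pct=16.5, avg_cpu_pct=15.4))
    print(classify_path(netperf_pct=33.3, avg_cpu_pct=49.1))
```

This is only a heuristic: it says the CPU did extra work during the test, not what that work was.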
`speedtest-netperf.sh` results
GRO: yes SQM: no
Download: 885.05 Mbps
Latency: [in msec, 31 pings, 0.00% packet loss]
Min: 17.500
10pct: 17.600
Median: 20.400
Avg: 22.216
90pct: 28.400
Max: 38.000
CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 28 samples]
cpu0: 15.9 +/- 1.5 @ 2208 MHz
cpu1: 12.2 +/- 1.2 @ 2208 MHz
cpu2: 13.8 +/- 2.0 @ 2208 MHz
cpu3: 19.6 +/- 2.5 @ 2208 MHz
Overhead: [in % used of total CPU available]
netperf: 16.5
avg cpu: 15.4
GRO: yes SQM: yes
Download: 869.39 Mbps
Latency: [in msec, 31 pings, 0.00% packet loss]
Min: 17.600
10pct: 17.700
Median: 18.600
Avg: 19.435
90pct: 20.600
Max: 28.700
CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 28 samples]
cpu0: 23.1 +/- 5.3 @ 2208 MHz
cpu1: 11.5 +/- 3.6 @ 2208 MHz
cpu2: 12.2 +/- 3.9 @ 2208 MHz
cpu3: 35.4 +/- 0.0 @ 2208 MHz
Overhead: [in % used of total CPU available]
netperf: 23.5
avg cpu: 20.6
GRO: no SQM: no
Download: 897.08 Mbps
Latency: [in msec, 31 pings, 0.00% packet loss]
Min: 17.700
10pct: 18.400
Median: 22.100
Avg: 23.342
90pct: 26.700
Max: 44.200
CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 28 samples]
cpu0: 37.1 +/- 7.9 @ 2208 MHz
cpu1: 22.1 +/- 2.2 @ 2208 MHz
cpu2: 20.2 +/- 2.4 @ 2208 MHz
cpu3: 29.8 +/- 2.4 @ 2208 MHz
Overhead: [in % used of total CPU available]
netperf: 28.0
avg cpu: 27.3
GRO: no SQM: yes
Download: 835.79 Mbps
Latency: [in msec, 31 pings, 0.00% packet loss]
Min: 17.500
10pct: 17.700
Median: 18.900
Avg: 20.416
90pct: 21.600
Max: 49.100
CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 28 samples]
cpu0: 78.8 +/- 0.0 @ 2208 MHz
cpu1: 39.3 +/- 6.5 @ 2208 MHz
cpu2: 32.1 +/- 6.4 @ 2208 MHz
cpu3: 46.1 +/- 8.8 @ 2208 MHz
Overhead: [in % used of total CPU available]
netperf: 33.3
avg cpu: 49.1
In the first 3 cases, the netperf overhead accounts for all of the CPU usage. Granted, enabling SQM increases the netperf overhead by roughly 5 to 7 percentage points and disabling GRO increases it by roughly 10 to 12, but avg CPU usage increases by basically the same amount. It's only with GRO disabled and SQM enabled that the traffic itself causes CPU usage to increase (avg cpu 49.1 vs. netperf 33.3).
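For anyone who wants the four runs side by side, this short Python sketch tabulates the "netperf" and "avg cpu" values copied from the results above and prints the difference. The `RESULTS` layout and the `excess_cpu` helper are just my own illustration.

```python
# (GRO, SQM) -> (netperf overhead %, avg cpu %), copied from the results above.
RESULTS = {
    ("yes", "no"): (16.5, 15.4),
    ("yes", "yes"): (23.5, 20.6),
    ("no", "no"): (28.0, 27.3),
    ("no", "yes"): (33.3, 49.1),
}


def excess_cpu(netperf_pct: float, avg_cpu_pct: float) -> float:
    """CPU usage (in percentage points) not explained by netperf itself."""
    return round(avg_cpu_pct - netperf_pct, 1)


if __name__ == "__main__":
    for (gro, sqm), (netperf, avg_cpu) in RESULTS.items():
        print(
            f"GRO: {gro:<3} SQM: {sqm:<3} "
            f"netperf {netperf:5.1f}  avg cpu {avg_cpu:5.1f}  "
            f"excess {excess_cpu(netperf, avg_cpu):+5.1f}"
        )
```

Only the GRO: no / SQM: yes row comes out with a positive excess; in the other three the avg cpu figure is slightly below the netperf overhead.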
Note that I also see this with the much more lightweight `speedtest` (from speedtest-cpp): in the first 3 cases CPU usage is basically 0, while in the last one I see significant CPU usage spikes. It is just harder to quantify with that test.
Also note that this issue is minor enough that I could definitely live with it. I'm mostly reporting it because whatever is causing this might also be causing other undiscovered and/or hard-to-diagnose issues (e.g., losing WAN access after several hours when GRO is enabled).