I have a NanoPi R5C running the latest OpenWrt snapshot. This device has a quad-core Cortex-A55 processor, and I'm trying to balance the load across cores when running SQM. My SQM settings are: eth1, 375000/375000 kbit/s, cake, piece_of_cake.qos.
Here are the default settings for this device:
root@OpenWrt:~# grep eth /proc/interrupts
73: 0 4191499 0 0 MSI 134742016 Edge eth0
74: 0 0 4221050 0 MSI 268959744 Edge eth1
root@OpenWrt:~# cat /proc/irq/73/smp_affinity
2
root@OpenWrt:~# cat /proc/irq/74/smp_affinity
4
root@OpenWrt:~# cat /sys/class/net/eth0/queues/rx-0/rps_cpus
0
root@OpenWrt:~# cat /sys/class/net/eth1/queues/rx-0/rps_cpus
0
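Both smp_affinity and rps_cpus hold hexadecimal CPU bitmasks, where bit n stands for CPU n. Read that way, the defaults above put IRQ 73 (eth0) on CPU 1 and IRQ 74 (eth1) on CPU 2, which matches the columns where the interrupt counts pile up in /proc/interrupts. An all-zero mask selects no CPUs at all. The helper below is a minimal sketch for decoding such masks, assuming a POSIX shell such as BusyBox ash; `mask_to_cpus` is a hypothetical name, not an OpenWrt tool.

```sh
#!/bin/sh
# Hypothetical helper (not an OpenWrt command): decode a hex CPU bitmask,
# as found in smp_affinity or rps_cpus, into a list of CPU numbers.
# Bit n set => CPU n is included.
mask_to_cpus() {
    # Strip the comma separators the kernel prints for masks wider than 32 bits
    mask=$(printf '%s' "$1" | tr -d ',')
    value=$((0x$mask))
    cpu=0
    out=""
    while [ "$value" -gt 0 ]; do
        if [ $((value & 1)) -eq 1 ]; then
            out="${out:+$out }$cpu"
        fi
        value=$((value >> 1))
        cpu=$((cpu + 1))
    done
    # An all-zero mask selects no CPU
    if [ -n "$out" ]; then
        echo "$out"
    else
        echo "none"
    fi
}
```

For example, `mask_to_cpus 2` prints `1`, `mask_to_cpus 4` prints `2`, and `mask_to_cpus 0` prints `none`.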
- Default settings test
I ran the Waveform bufferbloat test while monitoring CPU usage with htop:
CPU 0: 0%
CPU 1: 65%
CPU 2: 100%
CPU 3: 0%
There is no load at all on CPUs 0 and 3. I am not sure what a value of 0 in /sys/class/net/eth1/queues/rx-0/rps_cpus means. Which CPU core does it use?
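One way to cross-check the htop numbers is to see which CPU column of /proc/interrupts holds the counts for each NIC. This is a minimal sketch, assuming the column layout shown above (IRQ number, then one count per CPU, then controller details); `irq_busiest_cpu` is a hypothetical helper, not part of OpenWrt.

```sh
#!/bin/sh
# Hypothetical helper: given one /proc/interrupts line and the number of
# CPUs, print the CPU index whose column has the highest interrupt count.
# Ties resolve to the lowest CPU number.
irq_busiest_cpu() {
    echo "$1" | awk -v n="$2" '{
        best = 0
        # Field 1 is "NN:", fields 2..n+1 are the per-CPU counts
        for (i = 2; i <= n + 1; i++)
            if ($i + 0 > $(best + 2) + 0) best = i - 2
        print best
    }'
}
```

On the router, `irq_busiest_cpu "$(grep eth1 /proc/interrupts)" "$(head -n1 /proc/interrupts | wc -w)"` uses the header line (one `CPUn` entry per core) as the CPU count; it assumes grep matches exactly one line. For the eth1 counts listed above it would report CPU 2.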
- Pin each interface's RX queue to a different core
I applied the following change:
root@OpenWrt:~# echo 1 > /sys/class/net/eth1/queues/rx-0/rps_cpus
root@OpenWrt:~# echo 8 > /sys/class/net/eth0/queues/rx-0/rps_cpus
CPU 0: 90%
CPU 1: 23%
CPU 2: 30%
CPU 3: 50%
All cores now have load, but during the download test the speed fluctuates quite a bit and sometimes dips below 200 Mbps. This did not happen with the default settings.
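The masks used in this and the following tests are the same kind of bitmap: 1 is CPU 0, 8 is CPU 3, 9 is CPUs 0 and 3, and F is all four. A tiny helper can build them from CPU numbers instead of working them out by hand. This is only a sketch; `cpus_to_mask` is a hypothetical name, and it assumes the shell's arithmetic handles the CPU numbers involved (trivially true for a 4-core device).

```sh
#!/bin/sh
# Hypothetical helper: build the hex mask for a list of CPU numbers,
# e.g. "0 3" -> 9. Each CPU n contributes bit n.
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}
```

For example, `echo "$(cpus_to_mask 0 3)" > /sys/class/net/eth0/queues/rx-0/rps_cpus` writes the same value as `echo 9 > ...`; the kernel accepts lowercase and uppercase hex alike.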
- Pin queues to cores 0 and 3
root@OpenWrt:~# echo 9 > /sys/class/net/eth0/queues/rx-0/rps_cpus
root@OpenWrt:~# echo 9 > /sys/class/net/eth1/queues/rx-0/rps_cpus
CPU 0: 85%
CPU 1: 23%
CPU 2: 30%
CPU 3: 50%
All cores have load and the download speeds are more stable.
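Since the same mask is written to both interfaces here, a small wrapper can apply it to every RX queue an interface exposes. The commands above only touch rx-0, but the glob also covers multi-queue devices. `set_rps_mask` and the `SYSFS_NET` override are hypothetical names for illustration; the override only exists so the function can be tried against a scratch directory.

```sh
#!/bin/sh
# Hypothetical helper: write the same RPS mask to every RX queue of an
# interface. SYSFS_NET defaults to the real sysfs path; overriding it is
# only meant for dry runs against a scratch directory.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

set_rps_mask() {
    dev=$1
    mask=$2
    found=0
    for q in "$SYSFS_NET/$dev"/queues/rx-*/rps_cpus; do
        # If the glob matched nothing, the interface or queue does not exist
        [ -e "$q" ] || continue
        echo "$mask" > "$q" || return 1
        found=1
    done
    # Non-zero exit status when no queue was found
    [ "$found" -eq 1 ]
}
```

Usage would be `set_rps_mask eth0 9 && set_rps_mask eth1 9`. Values written to sysfs this way do not survive a reboot.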
- Pin queues to all cores
root@OpenWrt:~# echo F > /sys/class/net/eth0/queues/rx-0/rps_cpus
root@OpenWrt:~# echo F > /sys/class/net/eth1/queues/rx-0/rps_cpus
CPU 0: 40%
CPU 1: 75%
CPU 2: 75%
CPU 3: 40%
All cores have load and the download speeds are more stable.
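htop reports overall CPU time, which mixes routing, shaping and everything else. To see on which CPUs receive-side packet processing is being triggered during a test run, the NET_RX row of /proc/softirqs can be sampled before and after. This is a minimal sketch; `net_rx_delta` is a hypothetical helper, and the numbers it prints are softirq invocation counts, not CPU time.

```sh
#!/bin/sh
# Hypothetical helper: print per-CPU NET_RX softirq deltas between two
# /proc/softirqs snapshots passed as strings. Counts, not CPU time.
net_rx_delta() {
    # Keep only the NET_RX row; the columns after the label are CPU0..CPUn
    b=$(printf '%s\n' "$1" | awk '$1 == "NET_RX:" { $1 = ""; print }')
    a=$(printf '%s\n' "$2" | awk '$1 == "NET_RX:" { $1 = ""; print }')
    printf '%s\n%s\n' "$b" "$a" | awk '
        NR == 1 { for (i = 1; i <= NF; i++) prev[i] = $i }
        NR == 2 { for (i = 1; i <= NF; i++) printf "CPU%d: %d\n", i - 1, $i - prev[i] }'
}
```

While the Waveform test runs, something like `s1=$(cat /proc/softirqs); sleep 30; s2=$(cat /proc/softirqs); net_rx_delta "$s1" "$s2"` gives one line per core.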
Based on my tests I have some questions:
- What does a value of 0 mean for /queues/rx-0/rps_cpus?
- Even though the load was more balanced in the last configuration, the latency under load was about the same as with the stock settings, where only two cores were used. Why doesn't the better balance show up as lower latency? Does the core clock go down when more cores are in use?
- I thought SQM was single-threaded, so I was surprised to see the utilization spread more evenly with the different settings.
- What would be the best configuration to use here? I am not planning on running many services on the router, so routing and SQM will account for most of the CPU load.
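On the clock-speed question, cpufreq exposes the current frequency of each core in sysfs, so it can be watched during a test. This is a sketch assuming the standard /sys/devices/system/cpu/cpuN/cpufreq/scaling_cur_freq files are present on this build; `show_cpu_freqs` and the `CPU_SYSFS` override are hypothetical names. Where several cores share one cpufreq policy, they all report the same value.

```sh
#!/bin/sh
# Hypothetical helper: print the current frequency (kHz) that cpufreq
# reports for each CPU. CPU_SYSFS is overridable only for testing.
CPU_SYSFS=${CPU_SYSFS:-/sys/devices/system/cpu}

show_cpu_freqs() {
    for f in "$CPU_SYSFS"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
        # Skip if cpufreq is absent (glob left unexpanded) or unreadable
        [ -r "$f" ] || continue
        cpu=${f#"$CPU_SYSFS"/}
        cpu=${cpu%%/*}
        echo "$cpu: $(cat "$f") kHz"
    done
}
```

Running `while true; do show_cpu_freqs; sleep 1; done` alongside the speed test shows whether the reported clock changes between configurations.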
Thanks!