digging around here https://patchwork.ozlabs.org/project/netdev/patch/1436213128.4571.14.camel@edumazet-glaptop2.roam.corp.google.com/
- Is this possible with OpenWRT?
- Would it be beneficial (cake ingress)?
No, it's probably not beneficial. If I understand correctly, this is really mostly about handling 10GigE interfaces and the like across multiple cores.
Cake needs to see all the packets and is single-threaded anyway.
It wouldn't hurt if you weren't doing QoS, but then if you're not doing QoS you probably aren't using an IFB either...
Maybe that's something for the suggestion box lol
I'm now using a QOTOM x86_64 device with a J1900 Celeron quad core...
I've tried moving the rx/tx queues around to different cores, but I'm still pegging a core at ~90% with SQM.
Yep, the J1900 gets just about to 1 Gbps with shaping before it saturates the shaper. Through the magic of Moore's law, the RPi4 is something like twice as fast and only uses 50% of one core shaping a gig.
Wow. That's pretty nice for an ARM processor. I wouldn't expect that kind of performance.
Here's a pieced-together script I currently use to pin IRQs and apply some other settings.
#!/bin/ash
# Pin each NIC's IRQs to its own core: eth1 -> CPU1 (mask 2), eth2 -> CPU2 (4), eth3 -> CPU3 (8)
for n in $(cat /proc/interrupts | awk '/eth1/ { print $1 }' | tr -d \:); do
    echo 2 > /proc/irq/$n/smp_affinity
done
for n in $(cat /proc/interrupts | awk '/eth2/ { print $1 }' | tr -d \:); do
    echo 4 > /proc/irq/$n/smp_affinity
done
for n in $(cat /proc/interrupts | awk '/eth3/ { print $1 }' | tr -d \:); do
    echo 8 > /proc/irq/$n/smp_affinity
done
# Cap byte queue limits on every queue
for f in /sys/class/net/*/queues/*/byte_queue_limits/; do
    echo 6056 > $f/limit_max
done
# Lock CPU frequency scaling to performance
for c in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > $c
done
exit 0
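The `cat /proc/interrupts | awk | tr` pipeline repeated above can be wrapped in a small helper so each loop reads clearly. A sketch (the `irqs_for` name is mine, not from the thread; it takes the interrupts text on stdin so it's easy to test):

```shell
# Hypothetical helper: print the IRQ numbers assigned to an interface.
# Reads /proc/interrupts-style text on stdin.
irqs_for() {
    awk -v dev="$1" '$0 ~ dev { sub(":", "", $1); print $1 }'
}

# On the router you would call it like:
#   for n in $(irqs_for eth1 < /proc/interrupts); do
#       echo 2 > /proc/irq/$n/smp_affinity
#   done
```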
All offloads are disabled, no built-in WiFi. I'm using cake / piece_of_cake. There's got to be more tweaking to bring the CPU usage down; I'm only on 50/10 VDSL2.
Something is very wrong if your J1900 can't shape that with ~10% of one core.
It does shape, BUT cpu3 spikes to ~90% even with my tweaks. I'll post some before-and-after screenshots tomorrow. Normally everything is pinned to cpu0 and 3 cores are maxing out. After my tweaks, only 1 core is red in htop; loads are spread out, albeit not as evenly as I hoped for.
Current settings, with 20-smp-tune left enabled:
for n in $(cat /proc/interrupts | awk '/eth1/ { print $1 }' | tr -d \:); do echo 2 > /proc/irq/$n/smp_affinity; done
for n in $(cat /proc/interrupts | awk '/eth2/ { print $1 }' | tr -d \:); do echo 4 > /proc/irq/$n/smp_affinity; done
for n in $(cat /proc/interrupts | awk '/eth3/ { print $1 }' | tr -d \:); do echo 8 > /proc/irq/$n/smp_affinity; done
for f in /sys/class/net/*/queues/*/byte_queue_limits/; do echo 6056 > $f/limit_max; done
for c in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > $c; done
for i in $(ls /sys/class/net | awk '/eth/'); do ethtool -K $i tso off gso off gro off; done
find /sys/devices -name xps_cpus | awk '/eth1/' | while read q; do echo 2 > $q; done
find /sys/devices -name xps_cpus | awk '/eth2/' | while read q; do echo 4 > $q; done
find /sys/devices -name xps_cpus | awk '/eth3/' | while read q; do echo 8 > $q; done
find /sys/devices -name rps_cpus | awk '/eth1/' | while read q; do echo 2 > $q; done
find /sys/devices -name rps_cpus | awk '/eth2/' | while read q; do echo 4 > $q; done
find /sys/devices -name rps_cpus | awk '/eth3/' | while read q; do echo 8 > $q; done
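After running these, it's worth reading the masks back to confirm they stuck. A read-only sketch (the function name and the optional base-directory argument are mine, added for testability; on the router just call it with no argument):

```shell
# Print every rps_cpus/xps_cpus mask under a sysfs-style tree.
# $1 defaults to /sys/class/net; pass a different root for testing.
dump_steering_masks() {
    find "${1:-/sys/class/net}" \( -name rps_cpus -o -name xps_cpus \) 2>/dev/null |
    sort | while read -r q; do
        printf '%s = %s\n' "$q" "$(cat "$q")"
    done
}
```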
Update:
I had tried renaming the /etc/hotplug.d/net/20-smp-tune script to something else, and that failed.
Completely removing (deleting) it and then tuning drastically reduced CPU usage to below 20% on all cores.
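That would explain why renaming failed: hotplug executes every file in the directory regardless of its name, so the script has to leave /etc/hotplug.d/net/ entirely. If you'd rather keep a copy than delete it outright, a hedged sketch (the helper name and backup path are mine):

```shell
# Move a hotplug script out of the hotplug directory instead of renaming
# it in place; a rename inside /etc/hotplug.d/net/ does not disable it.
disable_hotplug_script() {
    # $1 = script path, $2 = backup directory
    mkdir -p "$2" && mv "$1" "$2"/
}

# On OpenWrt (assumed stock path):
#   disable_hotplug_script /etc/hotplug.d/net/20-smp-tune /root/disabled-hotplug
```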
The RPS/XPS commands could be shortened a little.
No functional change here, just better readability:
find /sys/devices -name "[xr]ps_cpus" | awk '/eth1/' | while read q; do echo 2 > $q; done
find /sys/devices -name "[xr]ps_cpus" | awk '/eth2/' | while read q; do echo 4 > $q; done
find /sys/devices -name "[xr]ps_cpus" | awk '/eth3/' | while read q; do echo 8 > $q; done
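The magic numbers 2/4/8 are just CPU bitmasks (bit N set = CPU N), which is also what smp_affinity expects. A tiny sketch that derives them, so adding a fourth NIC doesn't mean hand-computing hex (the `cpu_mask` helper is mine, not from the thread):

```shell
# Print the hex affinity mask for a single CPU index, in the format the
# kernel's smp_affinity / rps_cpus / xps_cpus files expect.
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

cpu_mask 1   # eth1 -> CPU1 -> 2
cpu_mask 2   # eth2 -> CPU2 -> 4
cpu_mask 3   # eth3 -> CPU3 -> 8
```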