Use a managed switch with bonded NICs and VLANs; you can go a long way with this unless you plan to do multi-WAN with 4 or 5 gigabit WAN connections.
Currently shows "powersave". I'll give it a try when I'm measuring performance.
Pretty sure powersave keeps the clock at its lowest all the time, so for a router it's best to set it to performance. Alternatively, you could switch to powersave every night and back again in the morning using cron.
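The cron idea could look like the sketch below. This is an assumption on my part: it presumes the standard cpufreq sysfs layout and OpenWrt's /etc/crontabs/root location.

```shell
# /etc/crontabs/root (sketch)
# 07:00 - switch every core to the performance governor for the day
0 7 * * * for c in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > $c; done
# 23:00 - drop back to powersave overnight
0 23 * * * for c in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo powersave > $c; done
```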
Unless scaling is very broken it should be fine, you might run into latency issues in theory but I doubt you'll see that on x86.
ondemand scales up but I think powersave stays at low clock the whole time
Correct, ondemand should be fine in most cases.
-
rm /etc/hotplug.d/net/20-smp-tune
DO NOT leave this script in place, as it will negate any changes -
if your WAN is, say, eth0, pin the IRQ affinity of eth1-3 to the corresponding cores CPU1-3. You only need to move the eth* interrupts, as opposed to every rx/tx queue.
-
echo the smp_affinity masks to each interface's rps_cpus & xps_cpus
-
echo performance to each CPU's scaling_governor
The workload will now be spread across multiple cores, and CPU usage should stay below ~20%, as opposed to spikes of 85-90%, as viewed in htop.
That's a huge performance gain claimed.
Will certainly come back to this when I want to extract more performance from this.
I recently purchased this unit (Nov '19) and have applied the tweaks listed above, so I can definitely back this claim. The smp hotplug script is the first thing to delete, because it will override any smp settings. Maybe it helps on some systems, but I found it hurts on this unit. I'll post some before-and-after screenshots sometime this week, as I have late-night work scheduled.
Please do post specific commands. Some of the steps you posted I didn't understand.
I was planning to research and use them, but it would be helpful if you could post them in a form I can copy-paste.
Post the output of cat /proc/interrupts
Try these commands. irqbalance doesn't do a good job at all, and banirqs doesn't work; you'd need the newer code anyway. Try these and let me know, and edit them for your setup if need be. The numbers 2, 4, 8 are the affinity masks.
for n in $(awk '/eth1/ {print $1}' /proc/interrupts | tr -d :); do echo 2 > /proc/irq/$n/smp_affinity; done
for n in $(awk '/eth2/ {print $1}' /proc/interrupts | tr -d :); do echo 4 > /proc/irq/$n/smp_affinity; done
for n in $(awk '/eth3/ {print $1}' /proc/interrupts | tr -d :); do echo 8 > /proc/irq/$n/smp_affinity; done
# for f in /sys/class/net/*/queues/*/byte_queue_limits/;do echo 6056 > $f/limit_max;done
for c in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor;do echo performance > $c;done
for i in $(ls /sys/class/net | awk '/eth/');do ethtool -K $i tso off gso off gro off;done
find /sys/devices -name xps_cpus | awk '/eth1/' | while read q; do echo 2 > $q; done
find /sys/devices -name xps_cpus | awk '/eth2/' | while read q; do echo 4 > $q; done
find /sys/devices -name xps_cpus | awk '/eth3/' | while read q; do echo 8 > $q; done
find /sys/devices -name rps_cpus | awk '/eth1/' | while read q; do echo 2 > $q; done
find /sys/devices -name rps_cpus | awk '/eth2/' | while read q; do echo 4 > $q; done
find /sys/devices -name rps_cpus | awk '/eth3/' | while read q; do echo 8 > $q; done
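If your ports map to different cores than mine, the masks follow a one-bit-per-CPU pattern. A quick sketch (the `cpu_mask` helper name is my own):

```shell
#!/bin/sh
# cpu_mask: print the hex smp_affinity mask that pins work to a single
# CPU. One bit per CPU: CPU0 -> 1, CPU1 -> 2, CPU2 -> 4, CPU3 -> 8.
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

cpu_mask 1   # prints 2
cpu_mask 3   # prints 8
```

The same values go into smp_affinity, rps_cpus, and xps_cpus, since all three take the same bitmask format.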
Be sure to measure with htop before and after the changes.
A Zombie post lives!!
Not sure whether the newer poster (or the original poster) will be back, but I'm interested in seeing if better tuning is worth it for my above-mentioned CI327, which I still use.
My situation is different, in that I have only 2 instead of 4 ethernet ports in the CI327, so I'm not clear on how many things need to be different for a case like mine.
Also, I see varying things looking at cat /proc/interrupts vs using htop. With the former, I see a ton of interrupts for eth0 and eth1, both on CPU0 only. With htop, during a speedtest generating a lot of traffic, I see softirqs on the bar graphs, but fairly evenly distributed across all 4 cores. Hard IRQ vs soft IRQ? Is there a way to set up htop for clearer or more detailed info? One thing I seem to be missing for a more quantitative number is a display like in top, where there's a total % of sirq, for instance. It's hard to watch 4 changing bar graphs and guesstimate totals to see if changes make a difference.
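One way to get a quantitative softirq number without eyeballing htop is to diff /proc/softirqs over a short window. A sketch (assumes a Linux /proc/softirqs with a NET_RX row):

```shell
#!/bin/sh
# Per-CPU receive-softirq rate: sample the NET_RX row of /proc/softirqs
# twice, one second apart, and print the per-CPU deltas.
net_rx() { awk '/NET_RX/ { $1 = ""; print $0 }' /proc/softirqs; }

before=$(net_rx)
sleep 1
after=$(net_rx)

# Column i of the second sample minus column i of the first.
echo "$after" | awk -v prev="$before" '
    { n = split(prev, p)
      for (i = 1; i <= NF && i <= n; i++)
          printf "CPU%d: %d softirqs/s\n", i - 1, $i - p[i] }'
```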
Here's my cat /proc/interrupts
root@OpenWrt:~# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
0: 26 0 0 0 IO-APIC 2-edge timer
1: 4 0 0 0 IO-APIC 1-edge i8042
4: 16 0 0 0 IO-APIC 4-edge ttyS0
8: 1 0 0 0 IO-APIC 8-fasteoi rtc0
9: 0 0 0 0 IO-APIC 9-fasteoi acpi
12: 5 0 0 0 IO-APIC 12-edge i8042
39: 52 0 0 0 IO-APIC 39-fasteoi mmc0
42: 52 0 0 0 IO-APIC 42-fasteoi mmc1
120: 397 0 0 0 PCI-MSI 32768-edge i915
121: 20 0 0 0 PCI-MSI 294912-edge ahci[0000:00:12.0]
122: 7463 0 0 0 PCI-MSI 344064-edge xhci_hcd
123: 791437515 0 0 0 PCI-MSI 1048576-edge eth0
124: 786251854 0 0 0 PCI-MSI 1572864-edge eth1
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 527368966 891906441 927056053 966413305 Local timer interrupts
SPU: 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 Performance monitoring interrupts
IWI: 0 0 0 0 IRQ work interrupts
RTR: 0 0 0 0 APIC ICR read retries
RES: 18407113 20374747 13625146 14481811 Rescheduling interrupts
CAL: 32621 280873617 279814627 343650721 Function call interrupts
TLB: 24525 10166 10962 10119 TLB shootdowns
TRM: 0 0 0 0 Thermal event interrupts
THR: 0 32766 21014 18308 Threshold APIC interrupts
DFR: 0 0 0 0 Deferred Error APIC interrupts
MCE: 0 0 0 0 Machine check exceptions
MCP: 17350 54939 45607 42957 Machine check polls
HYP: 0 0 0 0 Hypervisor callback interrupts
ERR: 0
MIS: 0
PIN: 0 0 0 0 Posted-interrupt notification event
NPI: 0 0 0 0 Nested posted-interrupt event
PIW: 0 0 0 0 Posted-interrupt wakeup event
Look in /etc/hotplug.d/, rm the smp tune script, and reboot.
Then cat /proc/interrupts and check whether the interrupts are (mostly) evenly distributed. If they aren't, it's time to manually move things around, e.g. eth1 > CPU1, etc.
# Spread the IO-APIC interrupts evenly across CPU1-3
echo 2 > /proc/irq/4/smp_affinity
echo 2 > /proc/irq/8/smp_affinity
echo 4 > /proc/irq/9/smp_affinity
echo 4 > /proc/irq/12/smp_affinity
echo 8 > /proc/irq/39/smp_affinity
echo 8 > /proc/irq/42/smp_affinity
# Do the same with the MSI interrupts
echo 4 > /proc/irq/120/smp_affinity
echo 4 > /proc/irq/121/smp_affinity
echo 8 > /proc/irq/123/smp_affinity
# eth1 (IRQ 124) corresponds to CPU1, so it makes sense to pin it there.
echo 2 > /proc/irq/124/smp_affinity
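After moving things around, a quick way to confirm each IRQ landed where intended (a sketch; it just pairs the device column of /proc/interrupts with the matching /proc/irq entry):

```shell
#!/bin/sh
# Print each IRQ's current affinity mask next to its device name, so
# you can confirm the echoes above stuck.
awk 'NR > 1 && $1 ~ /:$/ { sub(":", "", $1); print $1, $NF }' /proc/interrupts |
while read -r irq dev; do
    [ -r "/proc/irq/$irq/smp_affinity" ] || continue
    printf '%-4s %-12s mask=%s\n' "$irq" "$dev" "$(cat /proc/irq/$irq/smp_affinity)"
done
```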
Sorry I haven't had a chance to try some of this out.
Going upthread a bit: I've checked the scaling governor, and it's on powersave. But I do see the CPU speeds changing up and down in htop. Hmm, do I need to change that, then? I haven't pulled out the smp script yet. Wondering if there's an issue with performance or SQM processing if the CPU speed is bouncing from 795 to 2300 MHz with varying load. Also wondering how much warmer it will run if I pin it at 2.3 GHz...
As I mentioned earlier, it looks like the load is distributed evenly across the 4 cores during my run-a-speedtest-and-watch-htop test, although the indicated CPU speeds are bouncing up and down. Is this indicating some kind of load balancing, even though it looks different in /proc/interrupts?
That's the Intel pstate powersave. There is a setting where you can write a value to a file to force lower-latency behavior.
But you need to hold the file open. I wrote a shell script that does it; I'll have to figure out where it is.
In powersave mode, as @dlakelan pointed out, the CPU enters a deep sleep state and then has to wake up to service the pending interrupts, which causes latency. You could echo performance, or see if ondemand is an option as well.
performance/ondemand are the older governors; with the intel_pstate driver there's just performance and powersave (powersave behaves much like the older ondemand).
The thing to do is write a shell script that opens /dev/cpu_dma_latency, writes the number 1000 as a 32-bit binary integer, and holds the file open between, say, 6am and 11pm. This keeps the pstate driver from dropping into low-power modes that take a long time to wake from during normal "daytime" hours.
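A minimal sketch of that idea, assuming busybox sh on OpenWrt; the `write_le32` helper name is my own. The kernel honors a /dev/cpu_dma_latency request only while the file descriptor stays open, and it expects the value as a 32-bit little-endian binary integer.

```shell
#!/bin/sh
# write_le32: emit a number as 4 little-endian bytes, using portable
# octal escapes so it works in busybox printf.
write_le32() {
    printf "$(printf '\\%03o\\%03o\\%03o\\%03o' \
        $(($1 & 255)) $(($1 >> 8 & 255)) \
        $(($1 >> 16 & 255)) $(($1 >> 24 & 255)))"
}

# To apply (as root): open the device on fd 3, write 1000 (microseconds),
# and keep the script alive -- busybox has no 'sleep infinity':
#   exec 3> /dev/cpu_dma_latency
#   write_le32 1000 >&3
#   while :; do sleep 3600; done
```

Scheduling the daytime window could then be done from cron, killing the script at night so the fd closes and the kernel returns to its default latency behavior.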