Yeaay!!! Thank you @fantom-x for steering me toward the solution!

I tried saving your script into hotplug.d/iface, but it did not run. I let it sit for a while, but cat /sys/class/net/*/queues/[rt]x-0/[rx]ps_cpus was still coming back with 2s and 0s. I modified it to this:

#!/bin/sh

# set every rx-0 RPS / tx-0 XPS mask to 3 (binary 11 = CPU0+CPU1)
for FILE in /sys/class/net/*/queues/[rt]x-0/[rx]ps_cpus; do
    [ -w "$FILE" ] && echo 3 > "$FILE" 2>/dev/null
done

and after a reboot all queues are set to 3 without me logging in to LuCI.

Thank you all for the help

So where did you end up setting it, hotplug.d or rc.local?

Following the advice from @fantom-x, I started looking at hotplug.d, where I stumbled upon net/20-smp-tune, which appears to be incorrectly setting things to 2. I haven't had much of a chance to look further yet, but the script appears to be designed to steer the queue processing to the CPU that isn't dealing with interrupts...


hotplug.d/iface

The following config in /etc/config/network disables 20-smp-tune.

config globals 'globals'
        option default_ps '0'
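
Should be equivalent to setting it from the shell with uci (untested, but it just writes the same option):

uci set network.globals.default_ps='0'
uci commit network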

Thanks - I found the same setting after trying to figure out what the 20-smp-tune script was doing; guess I should have checked back here first :smile:

Now that I've looked into things more, I'm not convinced that a setting of three for the queues is correct, although it's certainly a lot better than disabled on a wired connection, which, from my understanding, means the queue is processed on the same CPU that handles the interrupts.

As far as I can tell, in a 4.19 kernel, we can either pin the network card interrupts to a specific CPU or let irqbalance balance them. If doing the former, it would make sense to manage the queues on the other CPU, which I think is what the original script was trying to do anyway. As the wireless interrupts are stuck on CPU0 for some reason, we would always want to use CPU1 for the queues, rather than trying to balance them across both CPUs, which is what a setting of three does, although I'm not convinced it would really make that much of a difference on a wireless connection anyway...
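
For illustration, a minimal sketch of that split on a dual-core board (the IRQ number and interface here are placeholders, not from any particular device):

# pin the NIC's IRQ to CPU0 and steer its queue work to CPU1 instead
echo 1 > /proc/irq/99/smp_affinity                   # hypothetical IRQ; mask 1 = CPU0
echo 2 > /sys/class/net/eth0/queues/rx-0/rps_cpus    # RPS mask 2 = CPU1
echo 2 > /sys/class/net/eth0/queues/tx-0/xps_cpus    # XPS mask 2 = CPU1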

I've quite possibly not had enough coffee this morning, but it seems to me that unless we can balance network interrupts across CPU cores, there's no point balancing queue management across cores...

Try setting them all to two and leaving the interrupts the way they are.

my attempt to fix 20-smp-tune: https://gist.github.com/facboy/050b9290c08bcfb394d8a00937d782a1

seems to work on my R7800... well, i think it now does what it was intended to do.

I just tried it and it seems to be putting all receive queues on CPU1 and then load-balancing all transmit queues across both CPUs. Wouldn't that put more work on CPU1 in the end?

based on your comment it looks like it's not setting the receive queues properly at all on ipq40xx.

i've really only fixed the script to do what was originally intended. as far as i know it's trying to put the receive queues on the CPU that is not the one with the interrupt assigned. tbh i didn't check what it would do on a 4-core, as the R7800 only has the 2 cores.
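
as i understand it, the "opposite CPU" logic boils down to roughly this per interface (a sketch, not the script verbatim; the IRQ number is made up):

# send RPS to the core(s) NOT handling the NIC's IRQ (dual-core case)
IRQ=99                                    # hypothetical IRQ number for eth0
ALL=3                                     # mask covering both cores
AFF=$(cat /proc/irq/$IRQ/smp_affinity)    # e.g. "2" = CPU1
printf '%x' $(( ALL ^ 0x$AFF )) > /sys/class/net/eth0/queues/rx-0/rps_cpus   # -> 1 = CPU0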


My question above was about ipq806x. I have recently been trying to understand how these queues are expected to work, so I can use them more intelligently across different router models. The R7800 is my main one.
Did you see this PR?

no i didn't see that.

i don't really have an opinion on it, but i did notice that the current script is broken and just puts everything on 0 if you have it enabled.

afaik with my changes (on my R7800), most things did end up on CPU1 (or 0) because their IRQs were allocated on the other CPU. eth0 was the outlier, and was on a different CPU to all the others.

fwiw, back when i messed around with this previously, having everything set to 3 seemed to give pretty good results.

I came to a similar conclusion by experimenting with high loads and watching the throughput and CPU utilization. Now I just put all queues on CPU1 unconditionally, which gives me the best results.
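
Concretely, something along the lines of the loop earlier in the thread, just with mask 2 (binary 10 = CPU1 only):

for FILE in /sys/class/net/*/queues/[rt]x-*/[rx]ps_cpus; do
    [ -w "$FILE" ] && echo 2 > "$FILE" 2>/dev/null
done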

Mhh, I tuned eth0 and eth1 to be on different CPUs.
This is my /proc/interrupts, fairly even; too bad we can't change wifi (ath10k_pci) to use both CPUs on kernel 4.19, they are both bound to CPU0.

 16:   30321304   20699720     GIC-0  18 Edge      gp_timer
 18:    1415765          0     GIC-0  51 Edge      qcom_rpm_ack
 19:          0          0     GIC-0  53 Edge      qcom_rpm_err
 20:          0          0     GIC-0  54 Edge      qcom_rpm_wakeup
 26:          0          0     GIC-0 241 Level     ahci[29000000.sata]
 27:          0          0     GIC-0 210 Edge      tsens_interrupt
 30:     199364       5190     GIC-0 202 Level     adm_dma
 31:       1253  137167564     GIC-0 255 Level     eth0
 32:  115312687          0     GIC-0 258 Level     eth1
 33:          0          0     GIC-0 130 Level     bam_dma
 34:          0          0     GIC-0 128 Level     bam_dma
 36:          0          0   PCI-MSI   0 Edge      aerdrv
 38:          0          0   PCI-MSI 134217728 Edge      aerdrv
 39:         10          0     GIC-0 184 Level     msm_serial0
 40:          2          0   msmgpio   6 Edge      keys
 41:          2          0   msmgpio  54 Edge      keys
 42:          2          0   msmgpio  65 Edge      keys
 43:          0          0     GIC-0 142 Level     xhci-hcd:usb1
 44:          0          0     GIC-0 237 Level     xhci-hcd:usb3
 45:   58163413          0   PCI-MSI 524288 Edge      ath10k_pci
 46:   51285575          0   PCI-MSI 134742016 Edge      ath10k_pci
IPI0:          0          0  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:    2495949   12022363  Rescheduling interrupts
IPI3:   51927141   11435710  Function call interrupts
IPI4:          0          0  CPU stop interrupts
IPI5:   25804333   22006919  IRQ work interrupts
IPI6:          0          0  completion interrupts

However, I noticed uptime is 8 days... it should be way more, so either it crashed or someone accidentally unplugged it, but I doubt it...

@shelterx would it make more sense to put both eth0 and eth1 on CPU1? I think the tx/rx queues are already on CPU1, the theory being that you want to keep as much processing as possible on the same CPU to take advantage of the L2 cache, though I'm not sure how much it really matters in practice; processing interrupts and moving packets around seems to be complex.

L2 cache is shared across the 2 CPUs so... it's really just a question of how much you load a single CPU...
since CPU0 already has to handle all the load + the wifi... the best would be to load CPU1 with the ethernet traffic
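
for reference, with the IRQ numbers from the /proc/interrupts dump above (31 = eth0, 32 = eth1... they can differ per device), moving both to CPU1 would be roughly:

echo 2 > /proc/irq/31/smp_affinity   # eth0 -> CPU1 (mask 2)
echo 2 > /proc/irq/32/smp_affinity   # eth1 -> CPU1 (mask 2)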

Interesting that the L2 cache is shared. Is that true for ARM in general or this CPU in particular?

I think it's this CPU (all Krait CPUs)

@Ansuel I saw your recent commits. Does kernel 5.4 bring anything important to our devices, or is the work needed because that's where OpenWRT is headed?

5.4 should have improvements specific to the ARM arch in how memory is handled...


Can anyone here give a few hints on how to switch over to kernel 5.4?
Is it just a matter of changing the KERNEL_PATCHVER parameter in target/linux/ipq806x/Makefile?

I'm assuming there is a better way to do it, so some tips are welcome :slight_smile:
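
What I had in mind was roughly this (assuming config-5.4 and patches-5.4 already exist under target/linux/ipq806x; untested, so corrections welcome):

# point the target at the 5.4 kernel, refresh the kernel config, rebuild
sed -i 's/^KERNEL_PATCHVER:=.*/KERNEL_PATCHVER:=5.4/' target/linux/ipq806x/Makefile
make kernel_menuconfig   # confirm/regenerate the kernel config for 5.4
make -j4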