Multiqueue IFB

Digging around here: https://patchwork.ozlabs.org/project/netdev/patch/1436213128.4571.14.camel@edumazet-glaptop2.roam.corp.google.com/

  1. Is this possible with OpenWRT?
  2. Would it be beneficial (cake ingress)?

No, it's probably not beneficial. If I understand correctly, this is really mostly about handling 10GigE interfaces and the like using multiple cores.

Cake needs to see all the packets and is single-threaded anyway.

It wouldn't hurt if you weren't doing QoS but then if you're not doing QoS you probably aren't using an IFB either...
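For anyone landing here: a typical single-queue IFB ingress setup with cake looks something like the sketch below. The interface names and the 45 Mbit rate are placeholder values, not from this thread:

```shell
#!/bin/sh
# Sketch: redirect ingress traffic from eth0 to ifb0, then shape it with cake.
# eth0/ifb0 and the 45mbit rate are example values.

ip link add name ifb0 type ifb 2>/dev/null
ip link set ifb0 up

# Attach an ingress qdisc to eth0 and mirror all packets to ifb0
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol all matchall \
    action mirred egress redirect dev ifb0

# Shape the redirected (downstream) traffic on ifb0 with cake
tc qdisc add dev ifb0 root cake bandwidth 45mbit besteffort
```

This is roughly what SQM does under the hood for ingress shaping; cake then sees every redirected packet on the single ifb0 queue.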

Maybe that's something for the suggestion box lol

I'm now using a QOTOM x86_64 device with a J1900 Celeron quad core...

I've tried moving the rx/tx queues around to different cores, but one core still pegs at ~90% with SQM.

Yep, the J1900 gets to just about 1 Gbps with shaping before it saturates. Through the magic of Moore's law, the RPi4 is something like twice as fast and only uses 50% of one core shaping a gig.

Wow. That's pretty nice for an ARM processor. I wouldn't have expected that kind of performance.

Here's a pieced-together script I currently use to pin IRQs and apply some other settings.

#!/bin/ash

# Pin each NIC's interrupts to its own core
# (hex bitmasks: 2 = cpu1, 4 = cpu2, 8 = cpu3)
for n in $(cat /proc/interrupts | awk '/eth1/ { print $1 }' | tr -d :); do
  echo 2 > /proc/irq/$n/smp_affinity
done
for n in $(cat /proc/interrupts | awk '/eth2/ { print $1 }' | tr -d :); do
  echo 4 > /proc/irq/$n/smp_affinity
done
for n in $(cat /proc/interrupts | awk '/eth3/ { print $1 }' | tr -d :); do
  echo 8 > /proc/irq/$n/smp_affinity
done

# Cap byte queue limits on every queue of every interface
for f in /sys/class/net/*/queues/*/byte_queue_limits; do
  echo 6056 > $f/limit_max
done

# Lock the CPU frequency governor to performance
for c in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  echo performance > $c
done

exit 0

All offloads are disabled, and there's no built-in WiFi. I'm using cake/piece_of_cake. There's got to be more tweaking to bring the CPU usage down; I'm only on 50/10 VDSL2.

Something is very wrong if your J1900 can't shape that with ~10% of one core.

It does shape, BUT cpu3 spikes to ~90% even with my tweaks. I'll post some before-and-after screenshots tomorrow. By default everything is pinned to cpu0 and three cores max out; after my tweaks only one core is red in htop and the load is spread out, albeit not as evenly as I hoped for.

Current settings, with the stock 20-smp-tune hotplug script left enabled.

for n in $(cat /proc/interrupts | awk '/eth1/ { print $1 }' | tr -d :); do echo 2 > /proc/irq/$n/smp_affinity; done
for n in $(cat /proc/interrupts | awk '/eth2/ { print $1 }' | tr -d :); do echo 4 > /proc/irq/$n/smp_affinity; done
for n in $(cat /proc/interrupts | awk '/eth3/ { print $1 }' | tr -d :); do echo 8 > /proc/irq/$n/smp_affinity; done
for f in /sys/class/net/*/queues/*/byte_queue_limits; do echo 6056 > $f/limit_max; done
for c in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > $c; done
for i in $(ls /sys/class/net | awk '/eth/'); do ethtool -K $i tso off gso off gro off; done
find /sys/devices -name xps_cpus | awk '/eth1/' | while read q; do echo 2 > $q; done
find /sys/devices -name xps_cpus | awk '/eth2/' | while read q; do echo 4 > $q; done
find /sys/devices -name xps_cpus | awk '/eth3/' | while read q; do echo 8 > $q; done
find /sys/devices -name rps_cpus | awk '/eth1/' | while read q; do echo 2 > $q; done
find /sys/devices -name rps_cpus | awk '/eth2/' | while read q; do echo 4 > $q; done
find /sys/devices -name rps_cpus | awk '/eth3/' | while read q; do echo 8 > $q; done
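For reference, the 2/4/8 values written to smp_affinity, xps_cpus, and rps_cpus are hexadecimal CPU bitmasks: bit n set means cpu n may handle the IRQ or queue. A quick sketch of how they're derived:

```shell
#!/bin/sh
# CPU affinity masks are bitmasks: bit N set means cpu N is eligible.
# cpu1 -> 2, cpu2 -> 4, cpu3 -> 8 (the values used above).
for cpu in 1 2 3; do
  printf 'cpu%d -> mask %x\n' "$cpu" $((1 << cpu))
done
# prints:
# cpu1 -> mask 2
# cpu2 -> mask 4
# cpu3 -> mask 8
```

On a system with more than four cores the masks simply keep doubling (cpu4 would be 10 in hex), so it's worth printing them rather than guessing.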


Update:

I had tried renaming the /etc/hotplug.d/net/20-smp-tune script to something else, and that failed.
Completely removing (deleting) it and then tuning has drastically reduced CPU usage to below 20% on all cores.


The RPS/XPS commands could be shortened a little.
No functional change here, just better readability :wink:

find /sys/devices -name "[xr]ps_cpus" | awk '/eth1/' | while read q; do echo 2 > $q; done
find /sys/devices -name "[xr]ps_cpus" | awk '/eth2/' | while read q; do echo 4 > $q; done
find /sys/devices -name "[xr]ps_cpus" | awk '/eth3/' | while read q; do echo 8 > $q; done
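Going one step further, the interface number can drive the mask, collapsing all the lines into one loop. A sketch, assuming eth1..eth3 map to cpu1..cpu3 as above:

```shell
#!/bin/sh
# One loop for all interfaces: ethN gets pinned to cpuN (mask = 1 << N).
for i in 1 2 3; do
  mask=$(printf '%x' $((1 << i)))
  find /sys/devices -name '[xr]ps_cpus' | grep "eth$i" | while read -r q; do
    echo "$mask" > "$q"
  done
done
```

grep stands in for the awk filter only because the pattern is now a shell variable; awk "/eth$i/" would work just as well.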