Packet Steering question

I think that's more suited to a server environment than a router in this case. I've tested 1 vs 2 pairs with the default hash of equal 1 or weight 0 1, and the default of 2 pairs with equal 2 (which spreads the hash more evenly without a hash key) performs better. 1 pair seems to cause more CPU usage on only 2 cores instead of spreading it across all of them. I guess the jury is still out on that. My tests have been run from nperf.com and Waveform.

new driver error from that link :frowning_face:
Cannot get RX network flow hashing options: Not supported
This means it's not spreading across the cores effectively.

There is a bug tracker on the driver repository.
It would be useful to file a ticket there about the issue with the UDP RX flow hash function.

The SQM script simplest_tbf.qos gives great performance and the lowest CPU usage, +/-15% per core, for 200/20 DOCSIS.

Cool, thanks for your contributions in this thread.

Can you please confirm if you are using the stock OpenWRT igb driver or the csrutil ones?

The stock one in 22.03.3 seems to support RSS already, so I'm leery of using another driver.

One more question: are you using igb RSS 2,2 in /etc/modules.d/35-igb, or still using your script that uses ethtool to set up the NICs?

I'm currently using stock v22.03.3 combined squash as ANY EFI build has weird errors ranging from stalled reboots to loading issues.

RSS=x,x,x,x has no effect on stock. Still using my ethtool script.

RSS is enabled, but only select distributions such as Mellanox or Nvidia are actually taking advantage of the hash that assigns the queues to CPUs. Thus RSS should be assigned 1 queue per CPU and combined with RPS and RFS.

By default Linux supposedly uses the sysctl /proc/sys/net/core/netdev_rss_key, but this is not the case as the drivers don't take advantage of it either, so RSS is kinda useless if you can't guide the system on how to direct the queues.
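For the RPS/RFS part mentioned above, here is a minimal sketch of how RFS could be switched on, assuming the kernel was built with RPS/RFS support. It follows the rule of thumb from the kernel's network scaling documentation: the per-queue rps_flow_cnt is the global rps_sock_flow_entries divided by the number of RX queues. The function names rfs_flow_cnt and enable_rfs and the table size 32768 are my own placeholders, not something taken from this thread, and I have not benchmarked this on the J1900.

```shell
#!/bin/sh
# Sketch only: enable RFS on top of RPS for one interface.
# Function names and the example table size are illustrative assumptions.

# Per-queue flow count = global flow table size / number of RX queues.
rfs_flow_cnt() {
    entries=$1
    queues=$2
    echo $(( entries / queues ))
}

# Write the global and the per-queue RFS settings for one interface.
enable_rfs() {
    dev=$1
    entries=$2
    queues=$(ls -d /sys/class/net/"$dev"/queues/rx-* 2>/dev/null | wc -l)
    [ "$queues" -gt 0 ] || return 1
    echo "$entries" > /proc/sys/net/core/rps_sock_flow_entries || return 1
    per_queue=$(rfs_flow_cnt "$entries" "$queues")
    for q in /sys/class/net/"$dev"/queues/rx-*; do
        # skip queues whose rps_flow_cnt is missing or not writable
        [ -w "$q/rps_flow_cnt" ] && echo "$per_queue" > "$q/rps_flow_cnt"
    done
    return 0
}

# Example (run as root): enable_rfs eth0 32768
```

Nothing is written until enable_rfs is called, so the block can be sourced safely to try the arithmetic first.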

UPDATE

Recompiled with these options. Not everything in my configs is listed, only the main options, for reference. This might help someone with a similar platform get the most out of their unit beyond the "safe" defaults. My unit is running very stable.
This version runs ultra fast compared to the later versions. I'm guessing more fluff and kernel bloat?

NAME="OpenWrt"
VERSION="19.07.10"
ID="openwrt"
ID_LIKE="lede openwrt"
PRETTY_NAME="OpenWrt 19.07.10"
VERSION_ID="19.07.10"
HOME_URL="https://openwrt.org/"
BUG_URL="https://bugs.openwrt.org/"
SUPPORT_URL="https://forum.openwrt.org/"
BUILD_ID="r11427-9ce6aa9d8d"
OPENWRT_BOARD="x86/64"
OPENWRT_ARCH="x86_64"
OPENWRT_TAINTS=""
OPENWRT_DEVICE_MANUFACTURER="OpenWrt"
OPENWRT_DEVICE_MANUFACTURER_URL="https://openwrt.org/"
OPENWRT_DEVICE_PRODUCT="Generic"
OPENWRT_DEVICE_REVISION="v0"
OPENWRT_RELEASE="OpenWrt 19.07.10 r11427-9ce6aa9d8d"

Menu Config

CONFIG_TARGET_OPTIMIZATION="-O2 -pipe -march=silvermont"
CONFIG_TARGET_OPTIONS=y

Kernel version 
4.14.275

Kernel Config options

CONFIG_NO_HZ_IDLE=y
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_PREEMPT_VOLUNTARY=y
CONFIG_MATOM=y #This is actually the closest architecture to my J1900 with general instructions.

Simple script placed in /etc/hotplug.d/net/20-tuning. The sleep is needed because of a race condition when booting. I haven't found a better way to run this earlier in boot while still waiting until the devices are loaded.

#!/bin/sh

# this will explain the layout of the processors for the j1900 quantum & similar units
# cpu(eth0) is known to the system as eth1 hence (1)
# cpu0= (1) | cpu1= (2) | cpu2= (4) | cpu3= (8)

# each line will pin the system irq's
awk '/eth0/ { gsub(/:/,""); print $1}' /proc/interrupts | while read i; do echo "e" > /proc/irq/$i/smp_affinity;done
awk '/eth1/ { gsub(/:/,""); print $1}' /proc/interrupts | while read i; do echo "e" > /proc/irq/$i/smp_affinity;done
awk '/eth2/ { gsub(/:/,""); print $1}' /proc/interrupts | while read i; do echo "e" > /proc/irq/$i/smp_affinity;done
awk '/eth3/ { gsub(/:/,""); print $1}' /proc/interrupts | while read i; do echo "e" > /proc/irq/$i/smp_affinity;done

awk '/i915/ { gsub(/:/,""); print $1}' /proc/interrupts | while read i; do echo "e" > /proc/irq/$i/smp_affinity;done
awk '/ahci/ { gsub(/:/,""); print $1}' /proc/interrupts | while read i; do echo "e" > /proc/irq/$i/smp_affinity;done

# (the default for openwrt v19 is powersaving, which works just fine). change the scaling governor to performance instead of powersaving
#find /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor -exec sh -c 'echo performance > {}' \;

sleep 15

# configure ethtool RSS to spread among the cores
for NICS in eth0 eth1 eth2 eth3; do
        ethtool -X $NICS equal 2
done

# the default cpu is 0, which we want to exclude so as to utilize the other processors. this sets each receive packet steering (RPS) mask rps_cpus to (2+4+8), or e (the hexadecimal value for 14).
find /sys/class/net/*/queues/rx-[01]/rps_cpus -exec sh -c '[ -w {} ] && echo "e" > {} 2>/dev/null' \;
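
The "e" written to smp_affinity and rps_cpus in the script is just the per-CPU bit values from the comment at the top (cpu1=2, cpu2=4, cpu3=8) OR-ed together and printed in hex. If your unit has a different core count, a small helper like the one below can build the mask for you. The name cpus_to_mask is hypothetical, only for illustration.

```shell
#!/bin/sh
# Build a hexadecimal CPU bitmask from a list of 0-based CPU indexes.
# cpus_to_mask is an illustrative helper name, not part of any tool.
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        # set the bit that corresponds to this CPU index
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

# Example: CPUs 1, 2 and 3 (everything except cpu0)
cpus_to_mask 1 2 3
```

On a dual-core box, cpus_to_mask 1 would give the mask for the second core only.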

Are you still running with these options set? Do you have any benchmarks to show before & after performance with these set vs defaults?

Also, where did you set these values and how did you confirm they were built against your firmware image? I am struggling with testing this myself and hope you can help me with some answers :slight_smile:

I have added the following to my .config file:

user@97247c341ad8:~/openwrt$ cat .config | grep CONFIG_HZ
CONFIG_HZ=1000
CONFIG_HZ_1000=y
CONFIG_HZ_PERIODIC=y

But in testing against the resulting image, I still see HZ = 100:

root@OpenWrt:~# awk '{print$22/'$(tail -n 1 /proc/uptime|cut -d. -f1)"}" /proc/self/stat
100.107

The options I specified are used in make kernel_menuconfig, which are chosen in the gui prompt. As I understand it, there are several places that the kernel config pulls from and I honestly don’t remember which directories they are located in to manually edit those files. The .config is mainly for the system config but not necessarily the Linux kernel itself.
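
One way to check what actually got built is to read CONFIG_HZ from the kernel's own config rather than from the top-level .config. Below is a rough sketch, with some assumptions: /proc/config.gz only exists if the kernel was built with CONFIG_IKCONFIG_PROC, and the build tree path is a glob guess at the usual OpenWrt layout, not something confirmed in this thread. config_hz is a made-up helper name. Also note that, as far as I know, field 22 of /proc/self/stat is reported in USER_HZ clock ticks, which is fixed at 100 on x86, so the awk one-liner above would print roughly 100 no matter what CONFIG_HZ is set to.

```shell
#!/bin/sh
# Print the numeric CONFIG_HZ value from a kernel config read on stdin.
# config_hz is an illustrative helper name.
config_hz() {
    # match only the exact CONFIG_HZ=<number> line, not CONFIG_HZ_1000=y
    sed -n 's/^CONFIG_HZ=\([0-9][0-9]*\)$/\1/p'
}

# On the running router (assumes CONFIG_IKCONFIG_PROC is enabled):
#   zcat /proc/config.gz | config_hz
#
# In the build tree (path is an assumption, adjust to your target):
#   cat build_dir/target-*/linux-*/linux-*/.config | config_hz
```

If the build tree config still shows 100 or 250 after make kernel_menuconfig, the option was not saved into the files the kernel build reads.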