Rpi4 < $(community_build)

fyi, for any (future) users with the seeed carrier... a kmod(.ko) is currently available here (21.02.1 only):

https://github.com/sergey-brutsky/openwrt-seeed-carrier-board

while it might be a little bit of a pain to install on this build for beginners worth mentioning... ( thanks sergey! )... when an official ipk becomes available for usb-net-lan78xx i'll add it to the build (even tho' it's unlikely there will be more than 1-2 seeed users afaik at most)

Go to SYSTEM / STARTUP / LOCAL STARTUP

and add those two commands to the end of the file, save, reboot.

On the snapshot that I installed, by default, core 0 is taking the load for just about everything. I guess this just isn't an issue for users with a connection of 0.5 Gbps or less, as a single core can cope. But on faster connections CPU affinity is critical.
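One way to check which core is actually servicing the NIC is to sum the per-CPU counters in /proc/interrupts. This is only a rough sketch: it assumes the interface name is the last field on its interrupt lines, and the function name irq_load_by_cpu is made up for illustration.

```shell
# Sum the per-CPU interrupt counts for one device (hypothetical helper).
# $1 = device name as shown in the last column, $2 = file to read (optional).
irq_load_by_cpu() {
    dev="$1"
    file="${2:-/proc/interrupts}"
    awk -v dev="$dev" '
        NR == 1 { ncpu = NF; next }          # header row: one column per CPU
        $NF == dev {                         # lines belonging to the device
            for (i = 1; i <= ncpu; i++) sum[i] += $(i + 1)
        }
        END {
            for (i = 1; i <= ncpu; i++)
                printf "CPU%d=%d%s", i - 1, sum[i], (i < ncpu ? " " : "\n")
        }
    ' "$file"
}

# usage on the router:
# irq_load_by_cpu eth0
```

If nearly all of the count sits under CPU0, the interrupts have not been moved off core 0.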

I experimented a little with IRQ balance and packet steering. They work, but not consistently; there seems to be a lot of volatility in how they distribute the load.

Manually assigning both eth0 interrupts to core 1 seems to be enough most of the time. But core zero is still hitting 100% when the data collection services are running.
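For anyone following along: the smp_affinity file takes a hexadecimal CPU bitmask rather than a core number, so "core 1" is written as 2. Here is a small sketch of the conversion. core_to_mask is a made-up helper name, and IRQ_NUMBER in the usage comment is a placeholder to look up in /proc/interrupts.

```shell
# Convert a core number (0, 1, 2, ...) to the hex bitmask smp_affinity expects.
# Hypothetical helper, not part of the build.
core_to_mask() {
    printf '%x\n' $((1 << $1))
}

# usage (as root), with IRQ_NUMBER taken from /proc/interrupts:
# echo "$(core_to_mask 1)" > /proc/irq/IRQ_NUMBER/smp_affinity
```

The kernel also exposes /proc/irq/IRQ_NUMBER/smp_affinity_list, which accepts plain core numbers if the bitmask form feels error-prone.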

My lazy fix is to overclock the Pi4 to give it just a little more headroom under load, but the better solution would be to manually distribute heavy services to specific cores. And given that the RPi4 is a known quantity, and the vast majority of users will be using a USB3 dongle, it seems reasonable to set up a default profile that just assigns all of this stuff at boot.

So my next question: is it possible to start services on specific cores?

PS I still can't find a way to overclock the Pi4 using OpenWRT - I'm a newb! ¯_(ツ)_/¯

as there is no one size fits all solution, examples are provided in the build as well as hooks for using your own script as discussed above
( tried it several times already, what works for one does not necessarily work for others, and guess who ends up holding the pieces? )

you will find examples in the sample script already implemented
(nope: that was removed too search this thread for TASKSET or taskset)


more than happy to include anyone's suggested script within the build

this is currently the most practical way forward until broader user input or knowledgeable (succinct) test data is provided

That's fair, your solution of using user submitted scripts as a sort of 'selectable profile' is a very pragmatic approach that anyone can learn to adjust.

I'm working my way through your script now. I appreciate that you commented it, but it's pretty hard going for a newb like me; I'm getting there though. I saw there's a bunch of commented out code in perftweaks that would implement CPU affinity and now I'm wondering why you disabled it?

Is there an overview guide of how your changes hang together? E.g where your scripts start in the boot process and how they are strung together. If that's a big ask please ignore.

The performance of your build is huge compared to the default snapshot builds I started with, I'm trying to understand what changes you've made purely from a performance standpoint. Is that all in perftweaks?

thanks for the questions...

pretty much as above... spent around a month of my life on this... and in the end;

  • there were not enough skilled testers to provide the level of feedback / testing / future improvements
  • the support burden was massive, lots of questions about why ABC is not doing XYZ ( with little background information given )
  • as above, performance and performance optimization testing is a fickle thing... give 10 cooks a lasagne recipe and you will get 10 different things...

so I got a bit burnt out with it and said to myself: 'turn it all off except some reliable general tweaks, and allow the users to make their own changes'

it's not absolute, and I do hope one day to return to my better scripts, but for the medium term separate submitted user scripts is the most manageable way to offer best performance for everyone without a 10 fold factor in questions and problems compared to solutions and feedback offered...


it is all very basic... and i'll even fixup anything fancy if you want to submit/suggest something without the fancy variables and hooks... all tweaks boil down to around 10 simple commands...

almost everything in my build gets called out of rc.local > rc.custom > elsewhere

you can find almost everything with fgrep

fgrep -r rpi-perftweaks /etc/custom

there is one KEY element... this script is called in the background after a sleep of around 200 seconds... this is to allow for all services to startup and settle, otherwise RENICE and TASKSET may have no process to work on...
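A minimal sketch of that delayed background start, as it might look when called from rc.local. The helper name run_delayed and the script path in the usage comment are placeholders, not the actual names used in the build.

```shell
# Run a command in the background after a delay, so boot is not held up
# and the services to be tuned have time to start (hypothetical helper).
# $1 = delay in seconds, remaining arguments = command to run.
run_delayed() {
    delay="$1"
    shift
    ( sleep "$delay"; "$@" ) &
}

# usage from rc.local (path is a placeholder):
# run_delayed 200 /etc/custom/my-perftweaks.sh
```

The subshell plus `&` is what lets rc.local return immediately; without it the boot sequence would sit waiting for the whole delay.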

and because you went through the hassle of providing some quality observations... I dug up an old reference sample of the user configurable tweaks script for you if interested... note: this does not discover eth0 irq numbers like the current one, so it likely won't work without fixes... useful for the servicecpuadjust() (taskset) you were asking about...


i don't think its really that major (but admittedly I have not really compared for a looong time)... has been discussed before in this thread... will link it here when I have some time...

essentially;

  • some teency config.txt changes or whatever
  • some sysctl's (but I don't think they really do much or at least I don't have a great understanding of what they really do... just seemed sensible / worth chucking in)
  • 2-5 days looking at the governor and limiting the threshold at which it throttles back cpu power... it's very aggressive on these devices... scaling_min_freq is set to maybe half of the frequency range as part of this
  • perftweaks (cpu affinity, process affinity, process renice, packet steering) all on or off or with minor variances depending on how far you go back / which build revision you look at
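
To illustrate the "half of the frequency range" idea: the midpoint can be worked out from the limits the kernel reports in cpuinfo_min_freq and cpuinfo_max_freq. This is a sketch only. mid_freq is a hypothetical helper, and the 600000 / 1500000 kHz figures in the comment are assumed example limits that may differ on your board or firmware.

```shell
# Midpoint between two cpufreq limits given in kHz (hypothetical helper).
# $1 = minimum frequency, $2 = maximum frequency.
mid_freq() {
    echo $(( $1 + ($2 - $1) / 2 ))
}

# example with assumed limits: mid_freq 600000 1500000
# usage on the router (as root), reading the real limits:
# p=/sys/devices/system/cpu/cpufreq/policy0
# mid_freq "$(cat $p/cpuinfo_min_freq)" "$(cat $p/cpuinfo_max_freq)" > $p/scaling_min_freq
```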

Thanks Wulfy, that's an excellent breakdown and allows me to dig in further. I've already started editing a forked copy of your perftweaks script as suggested and dumbed down your comments a little so that I can understand them :slight_smile:

I fully appreciate how this would burn you out, 1800 comments in this thread alone!

And don't play down your hard work, the latest official build ran like cr@p for me and there's practically no optimisation guidance to be found.

I was about to give up and build an x86 OPNsense box when I stumbled across your work - it took me 10 minutes to get it running with gigabit throughput, whereas I wasted half a day trying and failing on the latest official build!

Lazy guide to achieving 1 Gbps throughput on this amazing community build.

This is basically a note to my future self when I forget everything :slight_smile:

  • Use a Realtek USB3 adaptor for WAN (way lower CPU usage than other chipsets I tested)
  • Disable packet steering
  • Uninstall nlbwmon (it eats CPU time)
  • Manually set affinity for eth0 to its own CPU core (or maybe try enabling IRQ steering - YMMV)
  • A 2GB Rpi4 is more than enough, extra memory won't improve anything.
  • QoS was unnecessary for me, I'm getting A+ on all of the buffer bloat tests.
  • Overclocking is entirely unnecessary with nlbwmon disabled.
  • I've enabled DoH, Adblock (with XXL lists) and Wireguard server - no issues

That's literally all I needed to do in order to get line speed with around 40% utilisation across all 4 cores! Impressive work @anon50098793 - you made this easy.

interesting findings... great advice for anything over ~650Mb/s to just trash nlbwmon &&|| luci statistics... the bursting messes up cpu utilisation at those levels, as you found...

great you don't need SQM... and the packet steering is sort of tied to that AFAIK... so users of SQM(over around 550Mb/s) would be advised to definitely use packet steering

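# pin the stats / web UI daemons to fixed cores so they stay off the cores handling network traffic
TASKSET="$(command -v taskset-aarch64)"
for thispid in $(pidof nlbwmon); do
	# redirect stdout first, then stderr, so both are silenced
	$TASKSET -apc 3 $thispid >/dev/null 2>&1
done
for thispid in $(pidof collectd); do
	$TASKSET -apc 3 $thispid >/dev/null 2>&1
done
for thispid in $(pidof uhttpd); do
	$TASKSET -apc 2 $thispid >/dev/null 2>&1
done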

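# print the IRQ numbers listed for a device in /proc/interrupts, space separated
findRUPT() {
	fgrep "${1}" /proc/interrupts  | sed 's|^ ||g' | cut -d':' -f1 | \
		tr -s '\n' ' '
}

eth0INTs="$(findRUPT eth0)"
tRU=
if [ -n "$eth0INTs" ]; then
	for tRU in $eth0INTs; do
		# smp_affinity takes a hex CPU bitmask: 1 = core 0, 2 = core 1
		coreSET=${coreSET:-1}
		echo -n ${coreSET} > /proc/irq/$tRU/smp_affinity
		coreSET=$((coreSET + 1))
	done
fi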

#would be good if you can test all 'c' and all 'f' and all '0' here also without SQM (can test with also but mostly interested without)
echo -n 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus
echo -n 2 > /sys/class/net/eth0/queues/tx-1/xps_cpus
echo -n 4 > /sys/class/net/eth0/queues/tx-2/xps_cpus
echo -n 4 > /sys/class/net/eth0/queues/tx-3/xps_cpus
echo -n 2 > /sys/class/net/eth0/queues/tx-4/xps_cpus
echo -n 7 > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo -n 7 > /sys/class/net/eth1/queues/rx-0/rps_cpus

echo -n "1100000" > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
echo -n 21 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold && sleep 2
echo -n 5 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

I tested your script and it worked as expected for setting affinity, much cleaner than my hardcoding eth0 to core 1.

I also tried setting the queues with c / f / 0s - all achieved 1Gbps except all 0s - which dropped throughput down to ~600Mbps.
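For reference, the values written to xps_cpus and rps_cpus are hexadecimal CPU bitmasks, so "c", "f" and "0" select different sets of cores. Here is a quick sketch that decodes a mask into core numbers (mask_to_cpus is just an illustrative helper name):

```shell
# Decode a hex CPU bitmask into a space separated list of core numbers.
# Hypothetical helper for reading xps_cpus / rps_cpus / smp_affinity values.
mask_to_cpus() {
    mask=$((0x$1))
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            out="${out:+$out }$cpu"     # append core number, space separated
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$out"
}

# usage:
# mask_to_cpus c
# mask_to_cpus "$(cat /sys/class/net/eth0/queues/rx-0/rps_cpus)"
```

On a 4-core Pi 4, c decodes to cores 2 and 3, f to all four cores, and 0 to no cores at all, which leaves steering for that queue switched off.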

I re-installed and enabled nlbwmon and ran all of the tests again with the same results (there's a margin of error of around 3%, which I'm attributing to the general upstream test servers).

So fixing nlbwmon to core 3 was all that was required to re-enable it with no loss of performance at 1Gbps speeds! That's a nice result.

thank you very much for your tests...

in honour of your efforts I will re-introduce/introduce a '~1Gbs(tba)' parameter that can be easily set in future builds

AND put the perftweaks script on github should anyone wish to make PR

PERFTWEAKS_Gbs=1

Glad to make a tiny contribution.

Looking forward to testing out the next build!

Perftweaks="Inhonourofsubzero" has a nice ring to it :joy:

@swanson r18436 ( master @ 'stable' || 'current' ) now contains a fix for the onboard wifi

for this reason I pushed it down to stable, if anyone else does not need the onboard wifi then no real reason to update (from r18370)...

suppose I may as well generate a newer 'release'/21.02.1 build... if anyone is on 21.02.1_1.0.10-x same thing... if you don't need the onboard wifi then no real reason to update also...

I can't access LuCI. When I log in over SSH I get this in the log:

Wed Dec 29 16:53:44 2021 mmc0: Got data interrupt 0x00000002 even though no data operation was in progress.
Wed Dec 29 16:53:44 2021 mmc0: Got data interrupt 0x00000002 even though no data operation was in progress.
Wed Dec 29 16:53:44 2021 mmc0: Got data interrupt 0x00000002 even though no data operation was in progress.

Then after a reboot I can't access SSH or LuCI either.
With version rpi-4_21.02.1_1.0.10-3_r16325_extra_release_update

bugger... i'd suggest possibly using a new/different sdcard with a factory image (if you have a backup from the updatecheck bar you could restore that ... or from a linux pc you may be able to copy all the files from /etc/config/ from one sdcard to the other)

while there is a thing or two that comes to mind recently that may be related... based on this until someone else reports the same messages... we are best to assume it's an isolated disk issue

avoiding for the future could be stuff like;

  • better sdcard
  • cooler case
  • avoiding power loss
  • double checking nothing heavy is writing to mmc
  • maybe using a powered usb-ssd-dock for the OS instead of mmc
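
For the "nothing heavy writing to mmc" check, one rough approach is to sample the sectors-written counter in /proc/diskstats twice and compare. This is a sketch that assumes the card shows up as mmcblk0; sectors_written is a made-up helper name.

```shell
# Print the cumulative sectors written for a block device (hypothetical helper).
# $1 = device name as listed in /proc/diskstats, $2 = file to read (optional).
# Field 10 of each diskstats line is the sectors-written counter (512-byte sectors).
sectors_written() {
    dev="$1"
    file="${2:-/proc/diskstats}"
    awk -v dev="$dev" '$3 == dev { print $10 }' "$file"
}

# usage on the router:
# before=$(sectors_written mmcblk0); sleep 60; after=$(sectors_written mmcblk0)
# echo "KiB written in 60s: $(( (after - before) / 2 ))"
```

A number that keeps climbing quickly while the router is idle suggests something (logging, statistics, adblock lists) is writing to the card more than it should.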

or could just be one of those random things there is no control over...

Thank you for the advice. Sadly I don't run Linux on any of my devices.

This is an old sdcard from two years ago. I used to reflash it frequently before I started using your build.

any usbstick can also be used if you have a free port and no spare mmc card

Woahh that's good. Is it just plugged into usb 3?
Found sdcard 64GB in bag lol

yup, either should work

(the only catch is depending on the boot order it will attempt the mmc(first) by default... so long as there is nothing in there or the mmc has no OS it should then try USB i think... also adds a good 10 seconds before it boots too)

for microsd's (if anyone is getting replacements) i'd recommend either of these examples (price is AUD so divide by 0.6 or something) and these sizes are around the current sweet spot... for sdcards be very wary of ebay and stuff... worth paying double even from a good store if there is no other option...

Samsung 64GB evo plus ~ approx 7$US?
Samsung 32GB PRO Endurance ~ approx 12$US?

very cheap now! think i paid at least double that a year ago...

(I hear pretty good things about the sandisk equivalents extreme class10? but too many counterfeit sandisks going around for me personally to trust getting one online - sorry sandisk)

Just flashed, and the network section won't load; it fails with this error.

error:
firewall.getZoneColorStyle is not a function

Is it an issue with the argon theme? The other themes I tried work normally.
