no problem at all... thank you for reporting back and the detailed report... it's very useful to me to get detailed bug/problem reports and I learn a lot from them...
in this case you've stumbled/wrestled with two of the 'build specific' 'must-haves'... they can be a bit of a pain at first, especially for advanced users...
fwiw for rc.local type stuff... you can... do something like...;
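( a rough sketch of the idea only... the directory and script name below are hypothetical, so check the build docs for the real hook location... )

```
# hypothetical per-device startup hook; path/name are assumptions
mkdir -p /etc/custom/rc.local.d
cat << "EOF" > /etc/custom/rc.local.d/10-mydevice.sh
#!/bin/sh
# rc.local-style commands go here, e.g. a harmless example:
logger "per-device tweaks applied"
EOF
chmod +x /etc/custom/rc.local.d/10-mydevice.sh
```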
this will give you your own per-device script that gets preserved across upgrades...
the next time you upgrade... your docker packages and kmods should re-install automatically for you... if they do not... feel free to post here and we can take a look...
In terms of efficiency, I've found that if you're not pushing the bounds of your line speed, it's more beneficial to leave everything pinned to the first CPU. My own speed is 80/25 Mbit/s, so I've also disabled all offloading for all devices/interfaces with an iface hotplug script. I was having a lot of problems with strange fluctuations where ping would skyrocket and throughput would drop dramatically, relatively frequently; some odd consequence of having SQM and a VPN running in tandem, as with only one or the other it wouldn't crop up.
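For reference, a sketch of the kind of hotplug script I mean; the filename and the exact set of offloads are illustrative, so adjust per interface:

```
#!/bin/sh
# /etc/hotplug.d/iface/99-offload (filename is arbitrary)
[ "$ACTION" = "ifup" ] || exit 0
[ -n "$DEVICE" ] || exit 0
# disable common hardware offloads that can interact badly with SQM
ethtool -K "$DEVICE" gro off gso off tso off rx off tx off 2>/dev/null
```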
Also, as an aside wulfy, which files would I plumb from the install image to get a handle on your custom general tweaks (e.g. processes pinned to CPUs, etc.)?
the current stripped-down tweaks are in the same spot... although as you mention they are stripped down to the bare minimum...
/bin/rpi-perftweaks.sh
it used to print out everything it does properly... but when I had to hack it back to something simple that got lost... I haven't had time to restore it yet and as it's a 'DUMMY' version I haven't been bothered...
one eth interrupt was moved for the last two builds due to @sammutd88's suggestion but it's been moved back for the next as he did not provide proof of benefit from this change...
the only change since the real scripts were pulled is to look up interrupt numbers dynamically...
as CM4 / enabling dwc2 etc. etc. alters numbering...
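( roughly this sort of lookup... the awk match and the cpu mask below are illustrative, not the exact code from the build... )

```
# find the eth0 IRQ at runtime instead of hardcoding the number
irq=$(awk -F: '/eth0/ {gsub(/ /,"",$1); print $1; exit}' /proc/interrupts)
# pin it to cpu0 (mask "1"); mask value is illustrative
[ -n "$irq" ] && echo 1 > "/proc/irq/$irq/smp_affinity"
```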
one thing i've noted recently that should possibly, in an ideal world, be addressed ( via some sysctls ) is that dd'ing 1G of zeros hangs luci... but fixing this is mostly only beneficial for people who use a lot of samba or something that heavily writes to disk and does not have a huge impact on routing as a whole...
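( if anyone wants to experiment... the likely knobs are the vm dirty-page thresholds... values below are illustrative guesses, not tested recommendations... )

```
# flush big sequential writes earlier so they stall the system less
sysctl -w vm.dirty_background_ratio=5   # start background writeback sooner
sysctl -w vm.dirty_ratio=10             # block heavy writers earlier
```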
speaking of this... some 'perftweaks' are also derived from '/etc/custom/firstboot/01-sysctl'...
Thanks. Alright, it's good to know that I already had a grasp of things. Just to clarify, the /boot/config.txt addition for DWC is for CM4 only? Are there any dtb additions suggested for the classic RPi4 config?
Regarding my previous anecdote, I'll add a bit more clarity as I realise I worded it somewhat poorly. As an aside, though, I'll note that my experiences are from images built by myself, fairly light on packages, as I prefer things to be as lean as possible and to go as long as possible without opening the terminal or LuCI.

As mentioned, with SQM and VPN in tandem I was seeing bufferbloat spikes and bizarre throughput drops. This held true across a number of VPN servers, and even when dropping SQM limits far below actual line speed, despite the CPU having a good deal of headroom. The main factors I found to resolve this were leaving all interrupts at their defaults on CPU0 and disabling offloading everywhere. There were a few other minor tweaks, but I don't recall from my testing that they provided much benefit for this particular problem.

The only other benefit of note that comes to mind is banIP; I didn't realise that port scanning and random pings were so prevalent, and can only presume that the little bit of time spent processing these, instead of immediately dropping them, caused the issues I saw. My main 'real-world' performance metric was to gauge consistency in qBittorrent seeding (upload) performance, which seems heavily dependent on latency and whatever else due to it being wholly TCP. The before and after of these changes is remarkable.
I had a failed attempt at adding it for the rpi4... and for the CM4... right now, based on CM4 users' input... i'm merely suggesting it...
however... for the last three builds i've injected pci-bus perf onto the command line... ( there is a similar config.txt parameter but i'm yet to determine if it's required or whether one overrides the other etc. )
dtoverlay=disable-wifi
no real performance-related dtbo mods are applied... off the top of my head... although most users would benefit from dtoverlay=disable-wifi as a means of lowering temps and interrupts...
yes... i also agree / have witnessed poorer latency when shifting interrupts away from cpu0
for banip ( or any other application-related perf hits ) i'll have a closer look in my travels... it's possible my environment doesn't trigger the right levels to adequately reveal a lot of things...
[quote="Mint, post:1110, topic:69998"]
prefer things to be as lean as possible
[/quote]
for this philosophy, although the benefits may be negligible... I provide;
RMMOD=""
in /root/wrt.ini to trash several unused loaded kmods during runtime... or /etc/packagesremove.txt could be used to the same effect...
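( presumably usage is a space-separated list of modules... the names below are purely illustrative... check which kmods are actually loaded-but-unused on your own device with lsmod... )

```
# in /root/wrt.ini ... module names here are examples only
RMMOD="ip6_udp_tunnel udp_tunnel"
```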
Nice. I'm unfamiliar with the concept of shifting pci-bus tuning to the command line; could you please expand upon that? Yeah, WiFi seems a real problem. I use it only as a minor novelty due to the RPi's positioning, but performance under load is atrocious. It might be a product of RC3 support (or lack thereof), but taxing throughput via WiFi on a MacBook caused all kinds of weird errors and CPU load to skyrocket to a degree that seems outside of norms.
i'd elaborate if I really knew what I was talking about...
but as I don't... these are the options to look up
for cmdline.txt
pci=pcie_bus_perf
for config.txt
dtparam=axiperf=on
only stumbled upon these after seeing that the CM4 dtb is slightly different in that it beefs up pci-e bandwidth or something like that...
yeah... i've also seen OS-level performance GAINS and DROPS based on what's going on with wifi on this board... ( i.e. actually enabling it, even if not using it, seemed to improve overall OS function rather than leaving it in a disabled (uci) state )
Interesting, I'll certainly see if integrating one or both of those pci configs provides noticeable benefit when I've some spare time. That's damned peculiar with the WiFi, but such is the way of much of this as I've found through trial and error. How does it fare with respect to br-lan vs. not (eth0 instead)? Perhaps the software bridge provides some measure of benefit? Just spitballing as I've never disabled wlan, hence I've always had the br-lan interface.
funny you mention that... until yesterday I had no idea this could lead to possible perf gains...
but added it to my 'to-do' list (move wan to br-wan) after someone on another thread (board) mentioned this exact phenomenon... ( in their case... the software bridges performed better, i believe )
Interesting. I've never thought about the prospect of adding WAN to a bridged interface; I'd definitely be keen to see your results, as I've no idea how to properly go about doing so myself as yet. Also, on the note of banIP, I've integrated the following default lists: firehol1, firehol2, firehol3, iblockads, iblockspy, threat, whitelist, yoyo, for both the WAN and WireGuard client interfaces. At two weeks' uptime I'd say I'm at at least 100,000 hits according to the IPSet report, just for a frame of reference.
my environment/load is probably not enough to show it... but if there is any perf hit from banip...
it's typically a result of the report feature ( which I think you already mentioned... I need time to re-read and let that sink in )
so definitely test with that off for a while... i'm not sure what/if any sysctls might assist if this is the case... maybe file:max_nr (probably not) or min_rsv(mem) aka settings for process memory allocation and pruning (otoh) might do something...?
generally tho... i'm running similar lists but less overall load on the router and I don't see noticeable issues ( but I haven't really looked properly )...
(udp/conntrack sysctls may also be of gain for torrent style tests and uses)
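( something along these lines... the values are illustrative guesses only, not tuned recommendations... )

```
# conntrack/udp sysctls that may help torrent-style loads
sysctl -w net.netfilter.nf_conntrack_max=65536
sysctl -w net.netfilter.nf_conntrack_udp_timeout=30
sysctl -w net.netfilter.nf_conntrack_udp_timeout_stream=120
```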
there is some gremlin that's been bugging me though...
in that periodically... web pages may fail to load / router function hangs for a few seconds... it's infrequent but something i'm tracking ( could be related to me giving masq more memory or cache-ttl )
i have not really noticed this recently (last 5 or so builds) but after a reboot or new flash... responsiveness seems to magically speed up... so over time... something hits performance... the likely candidate, as above, is the masq-cache settings or, independent of masq... the ipset stack itself...
i'm not exactly 'taking-it-easy' with ipsets on this thing
Ah, sorry, I may have conveyed the wrong notion with how I worded it once again. I meant, instead, that banIP afforded a benefit rather than a detriment. I can only guess at the why and how with my current knowledge base, but my leading presumption is that the explicit firewall rules (pertaining to their source) that result in an immediate drop, rather than allowing further probes, are a net benefit. In my own use-case testing, the inclusion of banIP seems to mitigate the bizarre bufferbloat/throughput-drop issue when applied in conjunction with the other changes I've mentioned.
And on your DNSMasq issue, I've never encountered such problems. However, I instead use AdGuard Home as my default DNS, binding it to port 53 with DNSMasq on port 1053. The LuCI integration provides an interesting addition: https://github.com/kongfl888/luci-app-adguardhome/releases . It's also not immediately obvious, and it doesn't seem to jump out on the GitHub page, but it's definitely recommended to add something like "[/lan/arpa/]127.0.0.1:1053" to your upstream DNS, where inside the brackets are the domains/TLDs to which the upstream server is applied, and of course the IP:port is what your DNSMasq responds to. The general behaviour to be expected is that AGH does this automatically for PTR and local domain requests, but I found that it doesn't always follow, so an explicit rule helps in this way.
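For illustration, this is roughly how it looks in AdGuardHome.yaml; the first upstream is a placeholder, and the bracketed domain list is just what I use, so adapt both to your setup:

```
# AdGuardHome.yaml (excerpt); the first entry is a placeholder upstream
upstream_dns:
  - https://dns.example.com/dns-query   # your normal upstream here
  - '[/lan/arpa/]127.0.0.1:1053'        # local + reverse lookups -> DNSMasq
```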
Honestly, I was quite surprised that banIP yielded a noticeable benefit, as I wouldn't have expected it to impact performance in such a way. I can only guess that stacking so many things together on OpenWrt makes the whole configuration a touch more sensitive when it comes to the minutiae of latency, especially with the mess that TCP can be. Independently these things operate sufficiently, but their amalgamation has troubled me for some time. I'd wager that there are further improvements to be made on the router side, but quite frankly the testing is so damned taxing in effort and time.
I'll create a specified test image for a spare SD card with the inclusion of those packages, as I'm a bit of a nut in terms of package inclusion: it's either included in the install or not at all.
banip rules = keeps the 'net-interrupt-handler/timer' 'awake', sort of thing... ( or it's more of a pi-foundation 'we are still tweaking our stuff' thing... which is highly likely... but all the new stuff goes into 5.10 )
i suspect the onboard regulator has some flaws/limitations... and it's plausible that the foundation's hard power-management defaults aren't just to save the environment... but to limit hangs and fallout in this regard...
( the perftweaks up/down thresholds were pretty important to stop the thing 'throttle'-stepping so much, that's for sure, but I can't say if/how much that would affect this. when I was looking into those I added the following to my cmdline.txt but have no idea if they do anything )
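( for context... the up/down thresh knobs are presumably the ondemand cpufreq governor ones... a rough illustration below, with guessed values... note this is not the cmdline.txt addition mentioned above, which i'd have to dig out... )

```
# presumably the ondemand governor thresholds (values illustrative only)
echo 30 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 50 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
```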
for this level of debugging... if you download this or the one next to it... open it in a browser... and zoom in near the network side of things... you can clearly see how much time certain function calls take within the stack... and doing the same with something on or off would likely yield something tangible...
although they are pretty low-level to interpret or act on if you don't have C/low-level skills (me) lol...