no problem at all... thank you for reporting back and the detailed report... it's very useful to me to get detailed bug/problem reports and I learn a lot from them...
in this case you've stumbled/wrestled with two of the 'build specific' 'must-haves'... they can be a bit of a pain at first, especially for advanced users...
fwiw for rc.local type stuff... you can... do something like...;
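( a rough sketch of the idea only... the directory and script name below are hypothetical, so check the build docs for the real hook location... )

```
# hypothetical per-device startup hook; path/name are assumptions
mkdir -p /etc/custom/rc.local.d
cat << "EOF" > /etc/custom/rc.local.d/10-mydevice.sh
#!/bin/sh
# rc.local-style commands go here, e.g. a harmless example:
logger "per-device tweaks applied"
EOF
chmod +x /etc/custom/rc.local.d/10-mydevice.sh
```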
this will give you your own per-device script that gets preserved across upgrades...
the next time you upgrade... your docker packages and kmods should re-install automatically for you... if they do not... feel free to post here and we can take a look...
In terms of efficiency, I've found that if you're not pushing the bounds of your line speed, it's more beneficial to leave everything pinned to the first CPU. My own speed is 80/25 Mbit/s, so I've also disabled all offloading for all devices/interfaces with an iface hotplug script. I was having a lot of problems with strange fluctuations where ping would skyrocket and throughput would drop dramatically, relatively frequently; some odd consequence of having SQM and a VPN running in tandem, as with only one or the other it wouldn't crop up.
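For reference, a sketch of the kind of hotplug script I mean; the filename and the exact set of offloads are illustrative, so adjust per interface:

```
#!/bin/sh
# /etc/hotplug.d/iface/99-offload (filename is arbitrary)
[ "$ACTION" = "ifup" ] || exit 0
[ -n "$DEVICE" ] || exit 0
# disable common hardware offloads that can interact badly with SQM
ethtool -K "$DEVICE" gro off gso off tso off rx off tx off 2>/dev/null
```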
Also, as an aside wulfy, which files would I plumb from the install image to get a handle on your custom general tweaks (e.g. processes pinned to CPUs, etc.)?
the current stripped-down tweaks are in the same spot... although as you mention they are stripped down to the bare minimum...
/bin/rpi-perftweaks.sh
it used to print out everything it does properly... but when I had to hack it back to something simple that got lost... I haven't had time to restore it yet and as it's a 'DUMMY' version I haven't been bothered...
one eth interrupt was moved for the last two builds due to @sammutd88's suggestion but it's been moved back for the next as he did not provide proof of benefit from this change...
the only change since the real scripts were pulled is to look up interrupt numbers dynamically...
as CM4 / enabling dwc2 etc. etc. alters numbering...
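( roughly this sort of lookup... the awk match and the cpu mask below are illustrative, not the exact code from the build... )

```
# find the eth0 IRQ at runtime instead of hardcoding the number
irq=$(awk -F: '/eth0/ {gsub(/ /,"",$1); print $1; exit}' /proc/interrupts)
# pin it to cpu0 (mask "1"); mask value is illustrative
[ -n "$irq" ] && echo 1 > "/proc/irq/$irq/smp_affinity"
```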
one thing i've noted recently that should possibly, in an ideal world, be addressed ( via some sysctls ) is that dd'ing 1G of zeros hangs luci... but fixing this is mostly only beneficial for people who use a lot of samba or something that heavily writes to disk and does not have a huge impact on routing as a whole...
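( if anyone wants to experiment... the likely knobs are the vm dirty-page thresholds... values below are illustrative guesses, not tested recommendations... )

```
# flush big sequential writes earlier so they stall the system less
sysctl -w vm.dirty_background_ratio=5   # start background writeback sooner
sysctl -w vm.dirty_ratio=10             # block heavy writers earlier
```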
speaking of this... some 'perftweaks' are also derived from '/etc/custom/firstboot/01-sysctl'...
Thanks. Alright, it's good to know that I already had a grasp of things. Just to clarify, the /boot/config.txt addition for DWC is for CM4 only? Are there any dtb additions suggested for the classic RPi4 config?
Regarding my previous anecdote, I'll add a bit more clarity as I realise I worded it somewhat poorly. As an aside, though, I'll note that my experiences are from images built by myself, fairly light on packages, as I prefer things to be as lean as possible and to go as long as possible without opening the terminal or LuCI.

As mentioned, with SQM and VPN in tandem I was seeing bufferbloat spikes and bizarre throughput drops. This held true across a number of VPN servers, and even when dropping SQM limits far below actual line speed, despite the CPU having a good deal of headroom. The main factors I found to resolve this were leaving all interrupts at their defaults on CPU0 and disabling offloading everywhere. There were a few other minor tweaks, but I don't recall from my testing that they provided much benefit for this particular problem.

The only other benefit of note that comes to mind is banIP; I didn't realise that port scanning and random pings were so prevalent, and can only presume that the little bit of time spent processing these, instead of immediately dropping them, caused the issues I saw. My main 'real-world' performance metric was to gauge consistency in qBittorrent seeding (upload) performance, which seems heavily dependent on latency and whatever else due to it being wholly TCP. The before and after of these changes is remarkable.
I had a failed attempt at adding it for the rpi4... and for the CM4... right now, based on CM4 users' input... i'm merely suggesting it...
however... for the last three builds i've injected pci-bus perf onto the command line... ( there is a similar config.txt parameter but i'm yet to determine if it's required or whether one overrides the other etc. )
dtoverlay=disable-wifi
no real performance-related dtbo mods are applied... off the top of my head... although most users would benefit from dtoverlay=disable-wifi as a means of lowering temps and interrupts...
yes... i also agree / have witnessed poorer latency when shifting interrupts away from cpu0
for banip ( or any other application-related perf hits ) i'll have a closer look in my travels... it's possible my environment doesn't trigger the right levels to adequately reveal a lot of things...
[quote="Mint, post:1110, topic:69998"]
prefer things to be as lean as possible
[/quote]
for this philosophy, although the benefits may be negligible... I provide;
RMMOD=""
in /root/wrt.ini to trash several unused loaded kmods during runtime... or /etc/packagesremove.txt could be used to the same effect...
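( presumably usage is a space-separated list of modules... the names below are purely illustrative... check which kmods are actually loaded-but-unused on your own device with lsmod... )

```
# in /root/wrt.ini ... module names here are examples only
RMMOD="ip6_udp_tunnel udp_tunnel"
```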
Nice. I'm unfamiliar with the concept of shifting pci-bus tuning to the command line; could you please expand upon that? Yeah, WiFi seems a real problem. I use it only as a minor novelty due to the RPi's positioning, but performance under load is atrocious. It might be a product of RC3 support (or lack thereof), but taxing throughput via WiFi on a MacBook caused all kinds of weird errors and CPU load to skyrocket to a degree that seems outside of norms.
i'd elaborate if I really knew what I was talking about...
but as I don't... these are the options to look up
for cmdline.txt
pci=pcie_bus_perf
for config.txt
dtparam=axiperf=on
only stumbled upon these after seeing that the CM4 dtb is slightly different in that it beefs up pci-e bandwidth or something like that...
yeah... i've also seen OS-level performance GAINS and DROPS based on what's going on with wifi on this board... ( i.e. actually enabling it, even if not using it, seemed to improve overall OS function rather than leaving it in a disabled (uci) state )
Interesting, I'll certainly see if integrating one or both of those pci configs provides noticeable benefit when I've some spare time. That's damned peculiar with the WiFi, but such is the way of much of this as I've found through trial and error. How does it fare with respect to br-lan vs. not (eth0 instead)? Perhaps the software bridge provides some measure of benefit? Just spitballing as I've never disabled wlan, hence I've always had the br-lan interface.
funny you mention that... until yesterday I had no idea this could lead to possible perf gains...
but added it to my 'to-do' list (move wan to br-wan) after someone on another thread (board) mentioned this exact phenomenon... ( in their case... the software bridges performed better, i believe )
Interesting. I've never thought about the prospect of adding WAN to a bridged interface; I'd definitely be keen to see your results, as I've no idea how to properly go about doing so myself as yet. Also, on the note of banIP, I've integrated the following default lists: firehol1, firehol2, firehol3, iblockads, iblockspy, threat, whitelist, yoyo, for both the WAN and WireGuard client interfaces. At two weeks' uptime I'd say I'm at at least 100,000 hits according to the IPSet report, just for a frame of reference.
my environment/load is probably not enough to show it... but if there is any perf hit from banip...
it's typically a result of the report feature ( which I think you already mentioned... I need time to re-read and let that sink in )
so definitely test with that off for a while... i'm not sure what/if any sysctls might assist if this is the case... maybe file:max_nr (probably not) or min_rsv(mem) aka settings for process memory allocation and pruning (otoh) might do something...?
generally tho... i'm running similar lists but less overall load on the router and I don't see noticeable issues ( but I haven't really looked properly )...
(udp/conntrack sysctls may also be of gain for torrent style tests and uses)
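( something along these lines... the values are illustrative guesses only, not tuned recommendations... )

```
# conntrack/udp sysctls that may help torrent-style loads
sysctl -w net.netfilter.nf_conntrack_max=65536
sysctl -w net.netfilter.nf_conntrack_udp_timeout=30
sysctl -w net.netfilter.nf_conntrack_udp_timeout_stream=120
```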
there is some gremlin that's been bugging me though...
in that periodically... web pages may fail to load / router function hangs for a few seconds... it's infrequent but something i'm tracking ( could be related to me giving masq more memory or cache-ttl )
i have not really noticed this recently (last 5 or so builds) but after a reboot or new flash... responsiveness seems to magically speed up... so over time... something hits performance... the likely candidate, as above, is the masq-cache settings or, independent of masq... the ipset stack itself...
i'm not exactly 'taking-it-easy' with ipsets on this thing
Ah, sorry, I may have conveyed the wrong notion with how I worded it once again. I meant, instead, that banIP afforded a benefit rather than a detriment. I can only guess at the why and how with my current knowledge base, but my leading presumption is that the explicit firewall rules (pertaining to their source) that result in an immediate drop, rather than allowing further probes, are a net benefit. In my own use-case testing, the inclusion of banIP seems to mitigate the bizarre bufferbloat/throughput-drop issue when applied in conjunction with the other changes I've mentioned.
And on your DNSMasq issue, I've never encountered such problems. However, I instead use AdGuard Home as my default DNS, binding it to port 53 with DNSMasq on port 1053. The LuCI integration provides an interesting addition: https://github.com/kongfl888/luci-app-adguardhome/releases . It's also not immediately obvious, and it doesn't seem to jump out on the GitHub page, but it's definitely recommended to add something like "[/lan/arpa/]127.0.0.1:1053" to your upstream DNS, where inside the brackets are the domains/TLDs to which the upstream server is applied, and of course the IP:port is what your DNSMasq responds to. The general behaviour to be expected is that AGH does this automatically for PTR and local domain requests, but I found that it doesn't always follow, so an explicit rule helps in this way.
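For illustration, this is roughly how it looks in AdGuardHome.yaml; the first upstream is a placeholder, and the bracketed domain list is just what I use, so adapt both to your setup:

```
# AdGuardHome.yaml (excerpt); the first entry is a placeholder upstream
upstream_dns:
  - https://dns.example.com/dns-query   # your normal upstream here
  - '[/lan/arpa/]127.0.0.1:1053'        # local + reverse lookups -> DNSMasq
```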
Honestly, I was quite surprised that banIP yielded a noticeable benefit, as I wouldn't have expected it to impact performance in such a way. I can only guess that stacking so many things together on OpenWrt makes the whole configuration a touch more sensitive when it comes to the minutiae of latency, especially with the mess that TCP can be. Independently these things operate sufficiently, but their amalgamation has troubled me for some time. I'd wager that there are further improvements to be made on the router side, but quite frankly the testing is so damned taxing in effort and time.
I'll create a specified test image for a spare SD card with the inclusion of those packages, as I'm a bit of a nut in terms of package inclusion: it's either included in the install or not at all.
banip rules = keeps the 'net-interrupt-handler/timer' 'awake', sort of thing... ( or it's more of a pi-foundation 'we are still tweaking our stuff' thing... which is highly likely... but all the new stuff goes into 5.10 )
i suspect the onboard regulator has some flaws/limitations... and it's plausible that the foundation's hard power-management defaults aren't just to save the environment... but to limit hangs and fallout in this regard...
( the perftweaks up/down thresholds were pretty important to stop the thing 'throttle'-stepping so much, that's for sure, but I can't say if/how much that would affect this. when I was looking into those I added the following to my cmdline.txt but have no idea if they do anything )
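( for context... the up/down thresh knobs are presumably the ondemand cpufreq governor ones... a rough illustration below, with guessed values... note this is not the cmdline.txt addition mentioned above, which i'd have to dig out... )

```
# presumably the ondemand governor thresholds (values illustrative only)
echo 30 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 50 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
```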
for this level of debugging... if you download this or the one next to it... open it in a browser... and zoom in near the network side of things... you can clearly see how much time certain function calls take within the stack... and doing the same with something on or off would likely yield something tangible...
although they are pretty low-level to interpret or act on if you don't have C/low-level skills (me) lol...