At this point I'm not seeing much bufferbloat with the downstream left unshaped, so I might just leave cake to handle upstream and let downstream be what it is. We don't saturate the downlink that often anyway.
A kernel performance bug (PR 11676) specific to the ipq806x targets was fixed at the end of September 2023. If you are not using the latest stable (23.05) version or a current snapshot, give that a try?
Also try installing irqbalance and enabling it (in the file /etc/config/irqbalance) and/or packet steering (Network>Interface>Global options tab in LuCI). irqbalance can be hit or miss on a dual core CPU, but if you are using your R7800 for both gateway and WiFi AP duty these options can help free up more CPU for SQM.
And, as you say, there are ACwifidude's NSS custom builds if nothing else works. These are fairly mature and even include 23.05 stable builds compatible with stable OpenWrt packages. In hindsight, I probably would have saved myself a lot of time and hassle, or at least put it off by several years, if I had gone down that rabbit hole instead of upgrading to WiFi6 devices.
In the low $100-$150 range, the EdgeRouter 4 is an attractive form factor combining a router, switch, internal power supply and even a POE port - if that form factor is a priority and future proofing is not, why not?
An ER4 wouldn't be my first choice though. The 1GHz MIPS64 in the EdgeRouter 4 may struggle with SQM much above 300 Mbps, and even more so if CAKE (instead of fq_codel/simple) or VPN support is in the future. If I was getting new hardware, I would want something faster and ARMv8 based. I would guess future ARM support is likely to be better than MIPS64 too.
I agree a NanoPi R4S is a good option. Any all-in-one with WiFi (e.g., your R7800) you hang off that as an AP can provide plenty of switch ports. For an all-in-one doing WiFi and gateway duty, I'd look at the few quad core Filogic options.
If you only get 300 Mbps on your down link without SQM, the answer to M10's question is "No, I have not set up SQM properly."
You need to carve off some of your maximum throughput and reserve it as headroom for SQM to work. TANSTAAFL and all that.
The most throughput you should expect is around 88% to 92% of your maximum without SQM, so around 270 Mbps tops. Since you are setting your ingress to 300 Mbps, if you don't get a repeatable, stable ~333 Mbps throughput without SQM, setting ingress to 300 Mbps is just too high. It takes some experimentation to find the sweet spot: start low, increase the ingress speed until latency starts to increase, then back it down again to the last setting before latency increased.
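The "start low and step up" procedure can be sketched roughly like this. It is only an illustration: `loaded_latency_ms` is a hypothetical callback standing in for whatever you use to measure latency under load at a given shaper rate (it is not an SQM or OpenWrt API), and the 10 ms threshold is an arbitrary assumption, not a recommendation from this thread.

```python
from typing import Callable, Iterable, Optional


def find_ingress_rate(
    candidates_mbps: Iterable[float],
    loaded_latency_ms: Callable[[float], float],  # hypothetical: latency under load at this rate
    idle_latency_ms: float,
    max_increase_ms: float = 10.0,  # assumed threshold for "latency starts to increase"
) -> Optional[float]:
    """Return the highest candidate rate tried before latency under load rose."""
    best = None
    # Start low and work upwards, as described above.
    for rate in sorted(candidates_mbps):
        increase = loaded_latency_ms(rate) - idle_latency_ms
        if increase > max_increase_ms:
            # Latency started to climb: stop and keep the previous rate.
            break
        best = rate
    return best
```

In practice each "measurement" is a manual speed test with a ping running alongside, so this is more a description of the loop you walk through by hand than something to run on the router.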
Keep in mind that SQM shaper rates are gross rates, while speed test results are net goodput, so for a normal DOCSIS link with IPv4 (and TCP) the net throughput expected from any shaper rate is
100 * ((1500-20-20)/(1500+18)) = 96.2 %
so roughly 4% derated already...
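The same arithmetic as a small sketch, in case you want to plug in other numbers. The parameter names are my own; the defaults are the values from the formula above (1500 byte MTU, 20 byte IPv4 header, 20 byte TCP header, 18 bytes of per-packet overhead). The IPv6 example is an added assumption based on its 40 byte header.

```python
def goodput_fraction(mtu=1500, ip_header=20, tcp_header=20, link_overhead=18):
    """Fraction of the gross shaper rate that shows up as TCP payload."""
    payload = mtu - ip_header - tcp_header  # bytes of user data per packet
    on_wire = mtu + link_overhead           # bytes the shaper accounts for per packet
    return payload / on_wire


# IPv4 + TCP, as in the formula above
print(f"{100 * goodput_fraction():.1f} %")              # 96.2 %
# Assumption: IPv6 with its 40 byte header instead of IPv4's 20
print(f"{100 * goodput_fraction(ip_header=40):.1f} %")  # 94.9 %
```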
+1; if you do that you can often take gross shaper rate = speedtest net throughput * 0.95 as a starting point. However, if you want to avoid iterating over different settings, take a 0.9 factor and set the ingress keyword if you use cake (it makes cake shape to the rate of packets arriving at the shaper rather than the rate leaving it, which tends to adjust automatically to more aggressive flows).
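As a quick sketch of that rule of thumb (the function and parameter names are just for illustration, the two factors are the ones mentioned above):

```python
def starting_shaper_rate(speedtest_mbps, will_iterate=True):
    """Gross shaper rate to start from, given a measured net speed test result.

    0.95 is a starting point if you plan to tune further;
    0.90 is the more conservative "set and forget" factor
    (combined with cake's ingress keyword on the download side).
    """
    factor = 0.95 if will_iterate else 0.90
    return speedtest_mbps * factor


# Example: a stable 300 Mbps speed test result without SQM
print(round(starting_shaper_rate(300)))                      # 285
print(round(starting_shaper_rate(300, will_iterate=False)))  # 270
```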
AFAICT the r7600 just can't do cake at 300 Mbps, at least on the main branch. It looks like I could get fq_codel to perform better using custom builds, but I'd rather stay on mainline. Looks like a NanoPi R4S is in my future.