CAKE is known to be a bit CPU heavy, especially on a fast line like yours. Try fq_codel instead and see whether the performance drop is as severe. It might also be good to compare the 1-minute load average before and after the speed test.
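For the load-average comparison, something along these lines should work from an SSH session on the router. It is only a sketch: load1 is a made-up helper name, and it assumes the usual Linux /proc/loadavg layout, where the first field is the 1-minute average.

```shell
# load1 is a hypothetical helper: print the 1-minute load average,
# i.e. the first space-separated field of /proc/loadavg.
# The optional argument overrides the file to read (useful for dry runs).
load1() {
    cut -d ' ' -f 1 "${1:-/proc/loadavg}"
}

echo "1m load: $(load1)"
```

Run it once before starting the speed test and again right after it finishes. The 1-minute figure is a damped average, so a short test will only partly show up in it.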
I see you're running SQM on your IPv6 WAN connection. You might also need to set up a separate SQM instance for your IPv4 WAN connection, if applicable.
Since IPv6 is designed not to require NAT, I'm not sure whether the ingress/egress options actually do anything, unless you're doing something like NPTv6. I'd wait to hear from someone with more SQM experience on that.
Can you also show the settings on the last tab (Link Layer Adaptation)? To get the most out of SQM, you'll need to set a proper overhead value.
I would expect the APU to do better than that. It should be good for around a gigabit, I would think (a J1900 can shape a gigabit and it's a CPU of a similar era).
Log into the router and run top -d 1, then run your speed test and watch the CPU idle figure. How low does it go?
Remind me, does the device have 2 cores? If so, that would suggest one core is potentially saturated. You can monitor individual cores with a suitable full version of top to find out, though I'm not sure one is available on OpenWrt.
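If no per-core top is at hand, the same numbers can be pulled from /proc/stat directly. A rough sketch, assuming the standard Linux field order (cpuN user nice system idle iowait irq softirq steal ...). The helper names cpu_snapshot and per_core_busy and the /tmp file names are made up for this example.

```shell
# Hypothetical helpers, assuming the standard Linux /proc/stat layout:
#   cpuN user nice system idle iowait irq softirq steal ...

# Save the per-core lines (cpu0, cpu1, ...) of /proc/stat.
cpu_snapshot() {
    grep '^cpu[0-9]' "${1:-/proc/stat}"
}

# Compare two saved snapshots and print the busy percentage per core.
# "Idle" here is idle + iowait; "total" is the sum of user..steal.
per_core_busy() {
    awk '
        {
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            idle = $5 + $6
        }
        NR == FNR { t0[$1] = total; i0[$1] = idle; next }
        ($1 in t0) {
            dt = total - t0[$1]
            di = idle - i0[$1]
            if (dt > 0) printf "%s %.0f%%\n", $1, 100 * (dt - di) / dt
        }
    ' "$1" "$2"
}

# Take two snapshots one second apart and compare them.
cpu_snapshot > /tmp/stat.a
sleep 1
cpu_snapshot > /tmp/stat.b
per_core_busy /tmp/stat.a /tmp/stat.b
```

Run the last four lines while the speed test is in progress. One core sitting near 100% while the other is mostly idle would fit the single-core saturation theory.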
Yeah, it looks like it's saturating the 2nd core during the download, at least briefly at different points in the test. I'd suggest setting your download bandwidth to 300 Mbps and running another test. You can leave your upload alone.
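If you'd rather edit the config file than use the web interface, the change would look roughly like this in /etc/config/sqm. This is a sketch that assumes a single queue section, with rates written in kbit/s as sqm-scripts expects, so 300 Mbps becomes 300000.

```
config queue
	# ... existing options (interface, upload, qdisc, ...) stay unchanged ...
	option download '300000'
```

Restart SQM afterwards with /etc/init.d/sqm restart so the new rate takes effect.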
If you feel that this is not an acceptable level of performance for you, I can heartily recommend switching to an RPi4: RPi4 routing performance numbers
Mmmh, maybe this is an additional case where packet steering does not work as it should?
You could temporarily disable this by deleting /etc/hotplug.d/net/20-smp-packet-steering and rebooting the router (there should be a copy left in /rom/etc/hotplug.d/net/20-smp-packet-steering if you find you need it). I have a hunch that the Red Hat recommended settings for packet steering really do not work well on low-core-count non-x86 routers, especially with traffic shaping adding a considerable load.
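A slightly more cautious way to do the deletion, as a sketch: the helper below (disable_packet_steering is a made-up name) refuses to remove the script unless the /rom copy is actually there. The optional root-prefix argument is not something OpenWrt needs. It is only there so the function can be tried against a scratch directory first.

```shell
# Hypothetical helper: remove the packet-steering hotplug script, but only
# after checking that the read-only copy under /rom exists for restoring.
# The optional argument is a root prefix, intended for dry runs only.
disable_packet_steering() {
    root="${1:-}"
    script="etc/hotplug.d/net/20-smp-packet-steering"
    if [ ! -f "$root/rom/$script" ]; then
        echo "no backup found under $root/rom, leaving things alone" >&2
        return 1
    fi
    rm -f "$root/$script"
    echo "removed $root/$script; reboot to apply"
    echo "restore with: cp $root/rom/$script $root/$script"
}
```

On the router, call disable_packet_steering with no argument and then reboot. The cp line it prints is the way back if things get worse.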
irqbalance might work better if packet steering is not enabled (I'm not sure whether it was in your test).