Heh. Just having baseline performance figures for this hardware on this thread is good, so no need to fork it. You've now shown that this device could, if it was working right, drive the wifi to saturation well past 40Mbit, indeed well past 200Mbit, and it isn't, due to some other problem we have not found yet. On another bug thread here people are reporting problems with "ax" mode; try ac?
On the ethernet front...
My guess is that the APU2 has 4 hardware queues and you don't have irqbalance installed. (A "tc -s qdisc show dev the_lan_network_device" would show mq + 4 instances of fq_codel.) In this test two flows landed in one hardware queue, another ended up well mapped to the right cpu, the other less so. "tc -s qdisc show" on your fedora box will probably also show mq + fq_codel.
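A rough sketch of how one might check the queue layout and interrupt spread described above. The helper names and the eth1 device name are placeholders of mine, and whether the driver's interrupt lines in /proc/interrupts carry the interface name, and whether pidof is present on a given image, are assumptions.

```shell
#!/bin/sh
# Hypothetical helpers; eth1 in the usage line is a placeholder for the LAN device.

# Count qdiscs of one kind in `tc qdisc show` output read from stdin.
count_qdisc() {
    grep -c "^qdisc $1 " || true
}

# Print the qdisc tree, the fq_codel instance count, and the IRQ spread.
inspect_dev() {
    dev="$1"
    tc -s qdisc show dev "$dev"
    echo "fq_codel instances: $(tc qdisc show dev "$dev" | count_qdisc fq_codel)"
    # One line per interrupt; the per-CPU columns show where the work lands.
    # Assumes the driver names its interrupts after the interface.
    grep "$dev" /proc/interrupts
    if pidof irqbalance >/dev/null 2>&1; then
        echo "irqbalance: running"
    else
        echo "irqbalance: not running"
    fi
}

# Usage: inspect_dev eth1
```

If all the interrupt counts pile up in one CPU column, that would be consistent with the uneven per-flow results.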
A test of the lan ethernet device with just fq_codel on it (tc qdisc replace dev the_lan_device root fq_codel) will probably show the downloads achieving parity with each other, but not a full gbit.
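For reference, a minimal sketch of that single-queue test. The function names and the DRY_RUN switch are my own additions so the commands can be previewed before running them as root; eth1 is a placeholder device name.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it (needs root).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Replace whatever is at the root (mq + children) with a single fq_codel.
single_fq_codel() {
    run tc qdisc replace dev "$1" root fq_codel
}

# Deleting the root qdisc lets the kernel reinstall its default setup.
restore_default() {
    run tc qdisc del dev "$1" root
}

# Usage: single_fq_codel eth1; rerun the test; restore_default eth1
```

With a single root qdisc all flows share one queue structure, which is why parity between downloads is the expected outcome here.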
The icmp induced latency looks about right (it's usually ~500us), but the induced udp latency of > 5ms is surprisingly high. I'd suspect TSO/GRO. Trying cake on the lan interface (without mq but with the split-gso option) would probably cut that (due to cutting BQL size). There are numerous other subsystems in play too, like TSQ.
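A sketch of how the TSO/GRO suspicion could be tested. cake's keyword for splitting GSO super-packets is split-gso; the function names, the DRY_RUN switch, and eth1 are placeholders of mine, and turning the offloads off entirely is an extra A/B step I am suggesting, not something required by the cake test.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it (needs root).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Show the offload settings that mention segmentation or receive offload.
show_offloads() {
    ethtool -k "$1" | grep -E 'segmentation-offload|receive-offload'
}

# Print the current BQL limit (bytes) of every tx queue.
show_bql() {
    for q in /sys/class/net/"$1"/queues/tx-*; do
        echo "${q##*/}: $(cat "$q/byte_queue_limits/limit")"
    done
}

# Single cake at the root (no mq), splitting GSO super-packets.
single_cake() {
    run tc qdisc replace dev "$1" root cake split-gso
}

# Optional A/B test: switch the suspected offloads off entirely.
offloads_off() {
    run ethtool -K "$1" tso off gso off gro off
}

# Usage: show_offloads eth1; show_bql eth1; single_cake eth1; show_bql eth1
```

Comparing show_bql output under load before and after the switch to cake should indicate whether the BQL limit actually shrinks.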
Trying 4 instances of cake with split-gso on mq would also be interesting.
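One way that per-queue setup might be written, assuming the device really exposes 4 tx queues. The mq handle 1:, the function names, the DRY_RUN switch, and eth1 are illustrative choices of mine.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it (needs root).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Install mq at the root, then one cake instance per hardware queue.
# $1 = device, $2 = number of tx queues.
mq_cake() {
    dev="$1"
    n="$2"
    run tc qdisc replace dev "$dev" root handle 1: mq
    i=1
    while [ "$i" -le "$n" ]; do
        # mq class minor numbers are hexadecimal: 1:1, 1:2, ... 1:a, ...
        run tc qdisc replace dev "$dev" parent "1:$(printf '%x' "$i")" cake split-gso
        i=$((i + 1))
    done
}

# Usage: mq_cake eth1 4
```

The queue count can be read from the number of tx-* entries under /sys/class/net/eth1/queues/ rather than guessed.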
The world keeps bulking things up on us. So much of Linux's development is done on really high end boxes, and some recent modifications, like running more of the stack on rx, looked good on machines with large caches but, I suspect, hurt everything else.
What does openwrt use for RT_PREEMPT and clock ticks these days?