I had the same issue and noticed that ksoftirqd/0 was pegged at 100% CPU most of the time while I ran speed tests.
If you haven't done so already, try installing irqbalance and running it with the --oneshot flag. You can also switch the CPUs to the performance scaling governor and tweak some of the other scaling behaviors to see if that improves things. The stock Orbi firmware sets these values in its qca-edma and powerctl init scripts.
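In case it helps, here's a rough sketch of those two steps. It assumes irqbalance is already installed and that your kernel exposes the usual Linux cpufreq files under /sys/devices/system/cpu; I haven't checked that this matches what the Orbi init scripts do. The set_governor helper and its optional second argument are names I made up for illustration.

```sh
#!/bin/sh
# Sketch only: set_governor is a hypothetical helper, not part of any firmware.
# Usage: set_governor <governor> [sysfs-root]   (sysfs-root defaults to /sys)
set_governor() {
    gov="$1"
    root="${2:-/sys}"
    for f in "$root"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs without a writable cpufreq entry (also covers the case
        # where the glob matched nothing and stayed literal).
        [ -w "$f" ] || continue
        echo "$gov" > "$f"
    done
}

# Typical use on the router, run as root:
#   irqbalance --oneshot        # spread the IRQs once, then exit
#   set_governor performance    # pin every CPU to the performance governor
#   cat /proc/interrupts        # check that counts now grow on more than one CPU
```

None of this survives a reboot, so if it helps you would still need to put it somewhere that runs at startup.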
Here's one of my better runs using iperf3 with 4 parallel streams:
[ ID]  Interval        Transfer     Bitrate
[  5]  0.00-10.00 sec   103 MBytes  86.8 Mbits/sec  sender
[  5]  0.00-10.17 sec   102 MBytes  84.5 Mbits/sec  receiver
[  7]  0.00-10.00 sec   137 MBytes   115 Mbits/sec  sender
[  7]  0.00-10.17 sec   135 MBytes   112 Mbits/sec  receiver
[  9]  0.00-10.00 sec   135 MBytes   113 Mbits/sec  sender
[  9]  0.00-10.17 sec   134 MBytes   110 Mbits/sec  receiver
[ 11]  0.00-10.00 sec  46.9 MBytes  39.3 Mbits/sec  sender
[ 11]  0.00-10.17 sec  46.4 MBytes  38.2 Mbits/sec  receiver
[SUM]  0.00-10.00 sec   422 MBytes   354 Mbits/sec  sender
[SUM]  0.00-10.17 sec   418 MBytes   344 Mbits/sec  receiver
Reverse mode:
[ ID]  Interval        Transfer     Bitrate         Retr
[  5]  0.00-10.02 sec   102 MBytes  85.1 Mbits/sec  183   sender
[  5]  0.00-10.00 sec   100 MBytes  83.9 Mbits/sec        receiver
[  7]  0.00-10.02 sec  70.5 MBytes  59.1 Mbits/sec  100   sender
[  7]  0.00-10.00 sec  68.7 MBytes  57.6 Mbits/sec        receiver
[  9]  0.00-10.02 sec  69.9 MBytes  58.5 Mbits/sec  156   sender
[  9]  0.00-10.00 sec  69.0 MBytes  57.9 Mbits/sec        receiver
[ 11]  0.00-10.02 sec  86.7 MBytes  72.6 Mbits/sec  101   sender
[ 11]  0.00-10.00 sec  85.2 MBytes  71.5 Mbits/sec        receiver
[SUM]  0.00-10.02 sec   329 MBytes   275 Mbits/sec  540   sender
[SUM]  0.00-10.00 sec   323 MBytes   271 Mbits/sec        receiver
There's a good thread on improving performance on the R7800 that covers ground similar to some of the Orbi exploration I did: