I did some testing while trying to get the best SQM performance; you can check out the thread here: Need help load balancing SQM on NanoPi R5c
It's not exactly relevant to what you're trying to achieve, but it gives you some idea of how to manually balance load across different cores.
A lot of the info I found came from others trying to get better performance out of the R4S, which has a heterogeneous core layout, so they were trying to pin the SQM queues to the faster cores. This wiki has some good information on load balancing in general: https://openwrt.org/docs/guide-user/advanced/load_balancing_-_tuning_smp_irq
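If you want to try the manual route, most of it comes down to writing hex CPU bitmasks, where bit i means CPU i, into `/proc/irq/<N>/smp_affinity` (which core services the interrupt) and `/sys/class/net/<iface>/queues/rx-<n>/rps_cpus` (which cores do receive packet steering). Here's a rough Python sketch that only builds the masks and prints the commands instead of writing anything. The IRQ number (35), the interface name (eth0) and the assumption that the R4S's fast A72 cores are CPUs 4 and 5 are all placeholders, so check /proc/interrupts and your board's core layout first.

```python
"""Sketch: build the hex CPU masks used for manual IRQ / RPS pinning.

Assumptions (hypothetical, adjust for your board):
  - IRQ 35 and interface "eth0" are placeholders, not real values.
  - CPUs 4-5 are assumed to be the faster cores (as on an RK3399-based R4S);
    verify this on your device before relying on it.
"""


def cpu_mask(cpus):
    """Return the hex bitmask string for a set of CPU indices (bit i = CPU i).

    A plain hex string like this is fine for boards with at most 32 CPUs;
    larger systems use comma-separated 32-bit groups instead.
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def pinning_commands(irq, iface, irq_cpus, rps_cpus, rx_queue=0):
    """Return shell commands (as strings) that would pin one IRQ and set RPS for one RX queue."""
    return [
        f"echo {cpu_mask(irq_cpus)} > /proc/irq/{irq}/smp_affinity",
        f"echo {cpu_mask(rps_cpus)} > /sys/class/net/{iface}/queues/rx-{rx_queue}/rps_cpus",
    ]


if __name__ == "__main__":
    # Hypothetical example: service IRQ 35 on CPU 4 and spread
    # receive packet steering for eth0's first RX queue over CPUs 4 and 5.
    for cmd in pinning_commands(35, "eth0", irq_cpus=[4], rps_cpus=[4, 5]):
        print(cmd)
```

For that example it prints `echo 10 > /proc/irq/35/smp_affinity` and `echo 30 > /sys/class/net/eth0/queues/rx-0/rps_cpus`, since CPU 4 alone is 0x10 and CPUs 4 and 5 together are 0x30.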
I just did some more testing, and I think the interrupts for each network interface are single-threaded, so unfortunately I doubt you'll be able to get more performance than what you're seeing. With SQM it was at least possible to keep the network queues on the underutilized cores.
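One way to check whether an interface's interrupts really end up on a single core is to look at the per-CPU columns in /proc/interrupts while you're pushing traffic. Here's a rough Python sketch that parses that file and reports what share of each matching IRQ's count sits on the busiest CPU. The interface name eth0 is a placeholder, and it assumes the driver puts the interface name in the description column, which not every driver does.

```python
"""Sketch: show how an interface's IRQ counts are spread across CPUs.

Assumptions: "eth0" is a placeholder interface name, and the driver is
assumed to include that name in the /proc/interrupts description column.
"""
import os


def irq_counts_by_cpu(text, name):
    """Return {irq: {cpu_label: count}} for /proc/interrupts lines mentioning `name`."""
    lines = text.splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        # Simple substring match, so "eth1" would also match "eth10".
        if not sep or name not in rest:
            continue
        counts = rest.split()[: len(cpus)]
        if len(counts) != len(cpus) or not all(c.isdigit() for c in counts):
            continue  # not a per-CPU counter row
        result[irq.strip()] = dict(zip(cpus, map(int, counts)))
    return result


def share_on_busiest_cpu(counts):
    """Fraction of the total interrupt count handled by the busiest CPU."""
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


if __name__ == "__main__":
    path = "/proc/interrupts"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
        for irq, counts in irq_counts_by_cpu(text, "eth0").items():
            print(irq, counts, f"busiest CPU share: {share_on_busiest_cpu(counts):.0%}")
    else:
        print("no /proc/interrupts here (not Linux?)")
```

If the busiest-CPU share stays close to 100% under load, that IRQ is being serviced by one core, which would match what I'm seeing.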