I seem to recall that @gwlim was a proponent of overclocking router CPUs (in a sane fashion, including stability tests); maybe the BB build you used was running the CPU at > 560 MHz?
So, I was initially distracted by the fact that idle goes down while the others go up under load, so my casual observations did not single out idle as special until I started paying attention to the details and realized that 0% idle really does mean that the router has no CPU cycles to spare. That in itself is not so bad, but for the fact that "the router has actually far fewer CPU cycles available than it desires" has the exact same 0% idle "phenotype".
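For what it is worth, here is a minimal sketch of how an idle percentage can be derived from two snapshots of `/proc/stat`, assuming a Linux kernel and a Python interpreter on the device (which a small router may well not have; the arithmetic is the point). The function names are my own and purely illustrative. It also shows why the number saturates: once the idle counter stops advancing the result is 0%, no matter how much more CPU the router would have needed.

```python
import time

def parse_cpu_line(stat_text):
    """Return the counters of the aggregate 'cpu' line as a list of ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")

def idle_percent(before, after):
    """Percentage of CPU time spent idle between two /proc/stat snapshots."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(b, a)]
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted in user/nice, so sum only the first eight.
    total = sum(deltas[:8])
    if total <= 0:
        raise ValueError("snapshots are identical or out of order")
    return 100.0 * deltas[3] / total

def sample_idle_percent(interval=1.0, path="/proc/stat"):
    """Take two snapshots 'interval' seconds apart and return the idle share."""
    with open(path) as f:
        first = f.read()
    time.sleep(interval)
    with open(path) as f:
        second = f.read()
    return idle_percent(first, second)
```

Calling `sample_idle_percent()` while a speed test is running should give roughly the same idle figure that `top` reports. A result of 0.0 only tells you that the CPU was fully busy during the interval, not by how much demand exceeded supply.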
Ah, that is valuable information. In my limited tests I think I saw some effect of running netperf on the router itself, but I did not research that any deeper after realizing that this was not testing what I intended to test, so thanks for the additional data point here.
I am sorry that I will not be able to really help. That said, maybe https://forum.openwrt.org/t/overclocking-router-devices/1298 has some pointers on how to overclock your WDR3600, as that might give you just enough additional CPU cycles to make SQM work at your bandwidth.
Best Regards