I got a nice suggestion from tohojo in the SQM scripts development repo, where I had opened an issue about this. The problem may actually be down to how the HTB (hierarchical token bucket) qdisc performs on this CPU:
Well in that case it sounds like it has something to do with the way HTB runs on that CPU. You could try if TBF has the same behaviour; enable sqm, then issue the following commands to replace the configured qdiscs with a TBF-based one (provided TBF is in LEDE; not sure if it is):
tc qdisc del dev eth0 root
tc qdisc del dev ifb4eth0 root
tc qdisc add dev eth0 handle 1: root tbf rate 8Mbit burst 15140 latency 100ms
tc qdisc add dev ifb4eth0 handle 1: root tbf rate 90Mbit burst 15140 latency 100ms
tc qdisc add dev eth0 handle 2: parent 1:1 fq_codel
tc qdisc add dev ifb4eth0 handle 2: parent 1:1 fq_codel
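For anyone repeating this test with other interfaces or rates, the commands above can be wrapped in a small helper. This is only a sketch: the function name `tbf_cmds` is my own invention and not part of sqm-scripts, and it only prints the commands (a dry run) rather than executing them. The burst and latency values are copied unchanged from the suggestion above.

```shell
#!/bin/sh
# Hypothetical helper (not part of sqm-scripts): print the tc commands that
# replace the root qdisc on an interface with TBF plus an fq_codel child.
# Usage: tbf_cmds <interface> <rate>
tbf_cmds() {
    dev="$1"
    rate="$2"
    echo "tc qdisc del dev $dev root"
    echo "tc qdisc add dev $dev handle 1: root tbf rate $rate burst 15140 latency 100ms"
    echo "tc qdisc add dev $dev handle 2: parent 1:1 fq_codel"
}

# Dry run with the values used in this thread
# (upload shaped on eth0, download shaped on ifb4eth0).
tbf_cmds eth0 8Mbit
tbf_cmds ifb4eth0 90Mbit
```

To actually apply the commands, the output can be piped to a shell as root (e.g. `tbf_cmds eth0 8Mbit | sh`). Afterwards `tc -s qdisc show dev eth0` should list the tbf and fq_codel qdiscs instead of htb.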
I tested that suggestion with a quick-and-dirty Ookla speedtest, and the resulting change is impressive:
simple with original HTB: 77 / 7 Mbit/s
simple with TBF (above): 85 / 8 Mbit/s
So, an immediate jump of 8 Mbit/s in download speed, and the throughput is now much closer to the configured limit of 90 / 8 Mbit/s. HTB might be the culprit, but I have no idea yet what the ultimate reason for its bad performance is.
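To put those numbers in proportion, here is a rough calculation of how much of the configured limit each setup reached. It is just a sketch: the helper name `pct` is made up for illustration, and since the speedtest figures are rounded to whole Mbit/s, the percentages are approximate.

```shell
#!/bin/sh
# Percentage of the configured shaper limit that was actually reached.
# Usage: pct <measured_mbit> <limit_mbit>
pct() {
    awk -v m="$1" -v l="$2" 'BEGIN { printf "%.1f\n", 100 * m / l }'
}

# Measured values from the speedtests above, limits of 90 / 8 Mbit/s.
echo "HTB download: $(pct 77 90)%"   # 85.6%
echo "TBF download: $(pct 85 90)%"   # 94.4%
echo "HTB upload:   $(pct 7 8)%"     # 87.5%
echo "TBF upload:   $(pct 8 8)%"     # 100.0%
```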