Wish this would see more development ...
I found a couple of interesting threads:
I have to admit that I have no reliable information to share. However, as far as I understand, software offloading should work with SQM, because it only bypasses parts of the network stack that sit above the qdisc layer SQM relies on. Hardware flow offloading, however, hides all packets from SQM, so the hardware offloading engine needs to offer its own qdiscs, like the NSS stuff on e.g. the R7800, as far as I understand.
It works if the SQM interface is set to wan (device), but software flow offloading does not help with SQM if the interface is set to pppoe-wan (tunnel). My test on my friend's FTTH line with a 200/20 plan showed that the Xiaomi Mi Router 4A Gigabit Edition can shape up to 150 Mbps using CAKE without software flow offloading, and it handles 200 Mbps fine with software flow offloading turned on. But I'm afraid Diffserv does not work when shaping is done on wan instead of pppoe-wan, every packet goes …
Since kernel 5.12, the HTB qdisc has an API for hardware offload. Mellanox is using this feature.
Would it make sense to try to implement this for some of our targets? Could using hardware HTB in combination with software fq_codel allow us to do SQM at higher rates?
I am not a developer, but if any developers are reading this, what do you think?
We are approaching territory where SQM becomes impossible on all but the highest-end hardware. You need a beast of a router to work with gigabit Ethernet (let's not even talk about higher speeds: 5 or even 10 gigabit is straight-up impossible).
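To put rough numbers on that, here is a back-of-the-envelope sketch of how little time a software shaper has per packet at these line rates. It assumes full-size frames (1500-byte MTU plus 38 bytes of Ethernet framing, so 1538 bytes on the wire) and a single direction. The function names are made up for illustration, and real traffic with many small packets would look worse than this.

```python
# Rough per-packet time budget for a software shaper at a given line rate.
# Assumption: every frame is full size, 1500-byte MTU + 38 bytes of Ethernet
# framing (14 header + 4 FCS + 8 preamble + 12 inter-frame gap) = 1538 bytes.
WIRE_BYTES_PER_FRAME = 1538


def packets_per_second(rate_bps, wire_bytes=WIRE_BYTES_PER_FRAME):
    """Packets per second needed to fill the link in one direction."""
    return rate_bps / (wire_bytes * 8)


def per_packet_budget_us(rate_bps, wire_bytes=WIRE_BYTES_PER_FRAME):
    """Microseconds available per packet if a single CPU core does all the work."""
    return 1e6 / packets_per_second(rate_bps, wire_bytes)


for label, rate_bps in [("1 Gbit/s", 1e9), ("5 Gbit/s", 5e9), ("10 Gbit/s", 10e9)]:
    pps = packets_per_second(rate_bps)
    budget = per_packet_budget_us(rate_bps)
    print(f"{label}: {pps:,.0f} packets/s, {budget:.2f} us per packet")
```

At 1 Gbit/s that works out to roughly 81,000 full-size packets per second, or about 12 microseconds per packet for everything the router does (routing, NAT, shaping, AQM). At 10 Gbit/s the budget shrinks to a little over 1 microsecond, which is why offloading part of the work to hardware looks attractive.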