SQM/QoS can saturate the CPU: is this expected, or can the code be improved?

No idea... As an update, I am finding that the NSS code that @quarky, @Ansuel (and others?) made available to the community performs as well as SQM in terms of bufferbloat numbers, yet allows faster download speeds on WiFi devices.

See here for the key steps to build the NSS code (assuming you have a compatible device). See here for how to set up fq_codel on it. Note that you do not use the LuCI package for this. Be warned: that thread is quite long.