Sorry it took me a while to figure out how to set up SQM.
TL;DR:
At 16 threads, SQM seems to max out around 200Mbps with minimal bufferbloat.
At 16 threads, the stock configuration seems to max out around 550Mbps with minimal bufferbloat.
At 16 threads, hardware acceleration seems to max out around 650Mbps with bad bufferbloat (DSL Reports).
Speedtest.net puts hardware acceleration at 900Mbps+.
Here is the info and tests you were looking for:
16 threads http://www.dslreports.com/speedtest/44441321
24 threads http://www.dslreports.com/speedtest/44441245
root@OpenWrt:/etc# /etc/init.d/sqm stop
root@OpenWrt:/etc# /etc/init.d/sqm start
SQM: Starting SQM script: simple.qos on eth0.201, in: 950000 Kbps, out: 950000 Kbps
SQM: simple.qos was started on eth0.201 successfully
root@OpenWrt:/etc# cat /etc/config/sqm
config queue 'eth0'
	option enabled '1'
	option qdisc 'fq_codel'
	option script 'simple.qos'
	option qdisc_advanced '0'
	option linklayer 'none'
	option interface 'eth0.201'
	option download '950000'
	option upload '950000'
	option debug_logging '1'
	option verbosity '5'
16 threads http://www.dslreports.com/speedtest/44441443
root@OpenWrt:/etc# cat /etc/config/sqm
config queue 'eth0'
	option enabled '1'
	option script 'simple.qos'
	option qdisc_advanced '0'
	option linklayer 'none'
	option interface 'eth0.201'
	option download '950000'
	option upload '950000'
	option verbosity '5'
	option debug_logging '0'
	option qdisc 'cake'
16 threads http://www.dslreports.com/speedtest/44441521
root@OpenWrt:/etc# cat /etc/config/sqm
config queue 'eth0'
	option enabled '1'
	option qdisc_advanced '0'
	option linklayer 'none'
	option interface 'eth0.201'
	option download '950000'
	option upload '950000'
	option verbosity '5'
	option debug_logging '0'
	option qdisc 'cake'
	option script 'layer_cake.qos'
16 threads http://www.dslreports.com/speedtest/44441560
root@OpenWrt:/etc# cat /etc/config/sqm
config queue 'eth0'
	option enabled '1'
	option qdisc_advanced '0'
	option linklayer 'none'
	option interface 'eth0.201'
	option download '950000'
	option upload '950000'
	option verbosity '5'
	option debug_logging '0'
	option qdisc 'cake'
	option script 'simplest.qos'
OMG BAD!!!!
16 threads http://www.dslreports.com/speedtest/44441599
root@OpenWrt:/etc# cat /etc/config/sqm
config queue 'eth0'
	option enabled '1'
	option qdisc_advanced '0'
	option linklayer 'none'
	option interface 'eth0.201'
	option download '950000'
	option upload '950000'
	option verbosity '5'
	option debug_logging '0'
	option qdisc 'cake'
	option script 'simplest_tbf.qos'
16 threads http://www.dslreports.com/speedtest/44441633
root@OpenWrt:/etc# cat /etc/config/sqm
config queue 'eth0'
	option enabled '1'
	option qdisc_advanced '0'
	option linklayer 'none'
	option interface 'eth0.201'
	option download '950000'
	option upload '950000'
	option verbosity '5'
	option debug_logging '0'
	option qdisc 'cake'
	option script 'piece_of_cake.qos'
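Each of the SQM runs above changed only the qdisc and script options. If you would rather switch them from the shell than edit /etc/config/sqm by hand, here is a minimal sketch using uci. It assumes the section is named 'eth0' as in the listings above. The set_sqm function and the DRY_RUN variable are hypothetical helpers I made up for illustration, not part of sqm-scripts:

```shell
#!/bin/sh
# Hypothetical helper: switch the SQM section 'eth0' (the section name used in
# /etc/config/sqm above) to a new qdisc/script pair, then restart SQM the same
# way as in the transcript above (stop, then start).
# Set DRY_RUN to a non-empty value to print the commands instead of running them.
set_sqm() {
    if [ "$#" -ne 2 ]; then
        echo "usage: set_sqm <qdisc> <script>" >&2
        return 1
    fi
    # Empty when DRY_RUN is unset or empty, so the commands run for real.
    run=${DRY_RUN:+echo}
    $run uci set sqm.eth0.qdisc="$1" || return 1
    $run uci set sqm.eth0.script="$2" || return 1
    $run uci commit sqm || return 1
    $run /etc/init.d/sqm stop
    $run /etc/init.d/sqm start
}
```

For example, "set_sqm cake piece_of_cake.qos" should reproduce the last configuration above, and then you re-run the speed test.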
Now running "stock" OpenWrt without any SQM:
16 threads, NO hardware acceleration, NO SQM
http://www.dslreports.com/speedtest/44441698
16 threads, WITH hardware acceleration, NO SQM
http://www.dslreports.com/speedtest/44441763
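I didn't spell out how hardware acceleration was turned on. On OpenWrt it is usually the flow offloading options in the defaults section of /etc/config/firewall, with hardware offloading layered on top of software offloading. Here is a sketch assuming that mechanism. The set_offload function and DRY_RUN variable are hypothetical names for illustration:

```shell
#!/bin/sh
# Hypothetical helper: turn flow offloading on (1) or off (0).
# Assumes "hardware acceleration" means OpenWrt's flow offloading options in
# the first 'defaults' section of /etc/config/firewall; software and hardware
# offloading are both set to the same value.
# Set DRY_RUN to a non-empty value to print the commands instead of running them.
set_offload() {
    case "$1" in
        0|1) ;;
        *) echo "usage: set_offload 0|1" >&2; return 1 ;;
    esac
    run=${DRY_RUN:+echo}
    # Quoted so the shell never treats [0] as a glob pattern.
    $run uci set "firewall.@defaults[0].flow_offloading=$1" || return 1
    $run uci set "firewall.@defaults[0].flow_offloading_hw=$1" || return 1
    $run uci commit firewall || return 1
    $run /etc/init.d/firewall restart
}
```

Under that assumption, "set_offload 0" would match the NO hardware acceleration run and "set_offload 1" the WITH run.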
My takeaway? If your internet connection is 600Mbps or less, just run the stock configuration and don't bother with hardware acceleration.
I would love to see someone with more skill than me start a performance-tuning thread for the ERX.