SQM on Netgear R7800: choppy audio/video (Teksavvy)

hnyman · April 27, 2017, 7:45pm

As I commented on your question in my own build thread, HTB, which simple and simplest use, seems to perform poorly on the dual-core R7800, especially with kernel 4.4, which is used for ipq806x in LEDE 17.01.

The main reason for the weak "simple" performance with HTB+fq_codel actually seems to be HTB, not fq_codel itself. It is also possible to use "simplest_tbf", which avoids HTB by using TBF but still normally uses fq_codel. simplest_tbf performed much better than simple (at least with kernel 4.4).

An extensive performance comparison of the codel and cake qdiscs on the R7800 can be found at, for example, https://github.com/tohojo/sqm-scripts/issues/48
A good summary may be in https://github.com/tohojo/sqm-scripts/issues/48#issuecomment-270168000

The adoption of HTB burst in SQM and the move to kernel 4.9 have since helped HTB/fq_codel performance somewhat (in LEDE master).

fantom-x · April 28, 2017, 5:05am

So, irqbalance has made a huge difference: now with heavy torrenting, the ping latency is ~20ms (vs 11ms ideal) at 45/9 (the link is 50/10). Setting SQM speed to 47.5/9.5 makes the latency go to 50..100ms and higher.

@moeller0, now that I am no longer CPU bound, how can I further improve SQM on my router? Do I still need to disable gro/lro/gso/tso? My current SQM settings are below.

config queue 'wan'
	option interface 'pppoe-wan'
	option debug_logging '0'
	option verbosity '5'
	option linklayer 'ethernet'
	option qdisc_advanced '1'
	option qdisc_really_really_advanced '1'
	option iqdisc_opts 'nat dual-dsthost'
	option eqdisc_opts 'nat dual-srchost'
	option squash_dscp '1'
	option squash_ingress '1'
	option ingress_ecn 'ECN'
	option egress_ecn 'NOECN'
	option overhead '34'
	option enabled '1'
	option qdisc 'cake'
	option script 'piece_of_cake.qos'
	option download '45000'
	option upload '9000'

hnyman · April 28, 2017, 6:15am

Great that irqbalance helped you. But be cautious with it. Dissent1 noticed some problems when it was active very early in the boot. I did not achieve any magic improvement with it myself, so I am normally not running it.

You should read the whole irqbalance discussion ending at Netgear R7800 exploration (IPQ8065, QCA9984) - #72 by hnyman

moeller0 · April 28, 2017, 12:15pm

Could you probe the thresholds for both directions independently? Start with 45/9 and increase the egress rate step-wise until you figure out between which two values the latency under load starts to rise steeply, then repeat the same for ingress. With a bit of luck you will end up with a better feel for the trade-off you are making between sacrificed bandwidth and increased latency under load. Please note that for ingress shaping it might be worthwhile to also test with multiple ingress streams, as the ingress shaper is more approximate and will show more bufferbloat with higher numbers of data flows. There is a development in cake that might make it less dependent on the number of concurrent flows (at a small cost in total throughput); watch for the "ingress" keyword to appear...
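The probing procedure described above can be sketched as a small helper that scans the measurements for the first step where latency jumps. This is only a rough sketch: the function name, the threshold and the intermediate sample values are invented for illustration (only the 9000 and 9500 end points loosely follow the figures reported earlier in the thread), and what counts as "too steep" is left to the user.

```python
def first_steep_step(samples, max_rise_ms):
    """Return the first (lower_rate, upper_rate) pair between which latency
    under load rises by more than max_rise_ms, or None if no step does.

    samples: list of (shaper_rate_kbit, latency_under_load_ms) tuples.
    """
    ordered = sorted(samples)  # sort by shaper rate, lowest first
    for (rate_lo, lat_lo), (rate_hi, lat_hi) in zip(ordered, ordered[1:]):
        if lat_hi - lat_lo > max_rise_ms:
            return (rate_lo, rate_hi)
    return None


# Hypothetical egress measurements (kbit/s, ms); not real data.
egress = [(9000, 20), (9100, 21), (9200, 22), (9300, 24), (9400, 45), (9500, 75)]
print(first_steep_step(egress, max_rise_ms=10))  # (9300, 9400)
```

The same function can be reused for the ingress direction, ideally with measurements taken under several concurrent download streams.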

If bufferbloat is under control I would recommend leaving the offloads alone: a) cake AFAIK will segment giant packets to avoid too much lumpiness in dequeueing, and b) techniques like GRO and GSO help your router deal better with high-traffic situations.

Regarding your config, I would probably add "mpu 64" to both eqdisc_opts and iqdisc_opts. I would also add option linklayer_adaptation_mechanism 'default' and, if that does not work, option linklayer_adaptation_mechanism 'cake', as otherwise mpu 64 will not work at all.

But first check whether mpu is listed in the output of:
"tc qdisc add root cake help"
If it does not give usage information for mpu, refrain from adding it to the qdisc_opts...

I hope that helps

Best Regards

fantom-x · April 28, 2017, 2:52pm

Looks like I just need to copy your configuration from an earlier post. I will try that. Thx for the advice.

moeller0 · April 28, 2017, 3:40pm

Well, not really. You should still test which bandwidth settings you are comfortable with. I believe the trade-off between bufferbloat and sacrificed bandwidth is a policy decision where every user will have a (slightly) different preference. So just play around until you are happy. I just want to help you get there...

fantom-x · April 28, 2017, 4:04pm

Appreciate your help. Can you help with an example on how to do the above? I do not believe you have that in your configuration above.

moeller0 · April 28, 2017, 8:48pm

True, try:
option iqdisc_opts 'nat dual-dsthost mpu 64'
option eqdisc_opts 'nat dual-srchost mpu 64'

The rationale is that VDSL2 typically uses full Ethernet frames including the FCS, and hence inherits Ethernet's minimum packet size of 64 bytes.
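To illustrate what mpu 64 changes in the accounting, here is a minimal sketch. It assumes that the shaper first adds the configured overhead and then rounds the result up to the mpu value, which is how the tc-cake man page describes the option; the function name is made up, and the defaults are simply the overhead 34 and mpu 64 from this thread.

```python
def accounted_size(packet_bytes, overhead=34, mpu=64):
    """Bytes charged for one packet: add per-packet overhead,
    then round up to the minimum packet unit (assumed order)."""
    return max(packet_bytes + overhead, mpu)


print(accounted_size(28))    # 28 + 34 = 62, below the minimum, so charged as 64
print(accounted_size(1492))  # 1492 + 34 = 1526, the minimum does not matter
```

Under this assumption only very small packets are affected, so the setting mostly matters for traffic dominated by tiny packets.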

Best Regards

fantom-x · April 28, 2017, 11:41pm

Looks like I found the limit of my CPU: I cannot get more than 42M down no matter what download speed I configure above that value. With heavy torrents running, pings are 12..15..20 ms (up from 11 ms) and CPU utilization is at 60..75% across two cores.

Also pings remain at 11 ms while dslreport speed test is running with 32 concurrent downloads.

A steep price to pay (16% bandwidth) for improved latency, but I am hoping that newer versions will address this.
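For reference, the 16% figure follows from the 50M link and the 42M ceiling mentioned above. A tiny sketch of the arithmetic (the function name is made up):

```python
def bandwidth_sacrifice_pct(link_mbit, achieved_mbit):
    """Share of the link rate given up, in percent."""
    return 100.0 * (link_mbit - achieved_mbit) / link_mbit


print(bandwidth_sacrifice_pct(50, 42))  # 16.0
print(bandwidth_sacrifice_pct(50, 45))  # 10.0, the configured 45000 kbit/s shaper rate
```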

@moeller0, thanks for your help.

These are my current settings if someone else is interested:

config queue 'wan'
	option debug_logging '0'
	option verbosity '5'
	option enabled '1'
	option interface 'pppoe-wan'
	option download '45000'
	option upload '9000'
	option linklayer 'ethernet'
	option overhead '34'
	option linklayer_advanced '1'
	option tcMTU '2047'
	option tcTSIZE '128'
	option tcMPU '64'
	option linklayer_adaptation_mechanism 'default'
	option qdisc 'cake'
	option script 'layer_cake.qos'
	option qdisc_advanced '1'
	option ingress_ecn 'ECN'
	option egress_ecn 'NOECN'
	option qdisc_really_really_advanced '1'
	option iqdisc_opts 'nat dual-dsthost mpu 64'
	option eqdisc_opts 'nat dual-srchost mpu 64'
	option squash_dscp '0'
	option squash_ingress '0'

fantom-x · April 29, 2017, 2:30am

Well, a correction is in order: the high CPU usage was caused by the torrent, not SQM. With a regular 32-stream dslreport test I can get very close to 50M/10M while maintaining awesome ping latency and still having >80% idle CPU. I am happy with these results. Thx again to everyone who helped along the way.

moeller0 · April 29, 2017, 7:27pm

Erm, you are running the torrent application on the router?

fantom-x · April 29, 2017, 7:56pm

No, not on the router. I have a Linux PC with two (1G) NICs connected to separate (1G) ports on the router. Torrents were running over one interface and pings were running over the other (two LXC containers). I had over 100 torrents of different ubuntu flavours downloading at the same time. I guess transmission opened so many sockets/streams that it put a huge strain on the router CPU.

fantom-x · April 29, 2017, 10:02pm

Oh, and the torrent traffic caused ~1,000 context switches per second on the router, while normally the value is around 300 per second.

EDIT: Actually, I just noticed a spike to 6K context switches per sec on the chart...
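For anyone who wants to check context-switch rates without a chart, a rough sketch follows. It relies on the cumulative ctxt counter that Linux exposes in /proc/stat and assumes Python is available on the box, which is usually not the case on a stock router image; the function names are made up, and this is not how the chart above was produced.

```python
import time


def parse_ctxt(stat_text):
    """Return the cumulative context-switch count from /proc/stat contents."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no ctxt line found")


def switches_per_second(before, after, interval_s):
    """Average context switches per second between two counter readings."""
    return (after - before) / interval_s


def sample_rate(interval_s=1.0, path="/proc/stat"):
    """Read the counter twice, interval_s apart, and return the rate."""
    with open(path) as f:
        before = parse_ctxt(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_ctxt(f.read())
    return switches_per_second(before, after, interval_s)
```

Calling sample_rate() once while idle and once under load gives two numbers that can be compared directly.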