I have set up SQM, and dslreports' speed test consistently gives me an A+ for bufferbloat and everything else. Today I accidentally ended up on a voice/video call while a single download was running on another computer, which pretty much saturated my download link.
That resulted in very choppy audio and video during the entire call, which suggests I may have made a configuration mistake. I would appreciate some input on my settings below (my link is VDSL2 50M/10M).
Interface name: pppoe-wan
Download speed: 47500
Upload speed: 9500
cake/piece_of_cake.qos
Ethernet with overhead
Per Packet Overhead: 34
@moeller0, here is the data you requested. Since I made that post I have found out that I am missing 'nat dual-dsthost' and 'nat dual-srchost'. I have also run "ATM_overhead_detector" twice, and it came back with 36 and 47 bytes of overhead. I will run it a few more times to see if there is a pattern, but I guess my overhead is actually 36+8=44 and not 34.
qdisc noqueue 0: dev lo root refcnt 2
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc noqueue 0: dev br-lan root refcnt 2
qdisc noqueue 0: dev eth1.3 root refcnt 2
qdisc noqueue 0: dev eth1.4 root refcnt 2
qdisc cake 8015: dev pppoe-wan root refcnt 2 bandwidth 9500Kbit besteffort triple-isolate rtt 100.0ms raw
linklayer ethernet overhead 34
qdisc ingress ffff: dev pppoe-wan parent ffff:fff1 ----------------
qdisc noqueue 0: dev wifi5 root refcnt 2
qdisc noqueue 0: dev wifi2.1 root refcnt 2
qdisc noqueue 0: dev wifi2.2 root refcnt 2
qdisc cake 8016: dev ifb4pppoe-wan root refcnt 2 bandwidth 47500Kbit besteffort triple-isolate wash rtt 100.0ms raw
linklayer ethernet overhead 34
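A minimal sketch of how I would add the two missing keywords from the shell, assuming the sqm-scripts UCI config and that my section is the first queue (@queue[0]):

# enable the advanced option fields and set per-internal-host fairness
uci set sqm.@queue[0].qdisc_advanced='1'
uci set sqm.@queue[0].qdisc_really_really_advanced='1'
uci set sqm.@queue[0].iqdisc_opts='nat dual-dsthost'   # ingress: fair share per internal destination host
uci set sqm.@queue[0].eqdisc_opts='nat dual-srchost'   # egress: fair share per internal source host
uci commit sqm
/etc/init.d/sqm restart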
Two more runs of ATM_overhead_detector gave me 33 and 48 bytes of overhead (in addition to the 36 and 47 from the first two runs). Now I am not sure which number to use here.
My modem (in bridge mode) shows "Mode: VDSL2" and the following for WAN:
Not really missing; it would just be a bit more predictable than the default triple-isolate (which works quite well in normal situations, but under speedtest conditions its behavior is not easily explained or predicted).
this together with:
makes sense, in that your link in all likelihood is not using ATM but rather PTM, as is customary for VDSL2 ("likelihood" because it is still permissible to use ATM on VDSL2 links; I just hope no ISP will ever do this, and I rather hope ISPs will start to do the also-permitted opposite of using PTM on ADSL links, but I digress). The detector will only ever report trustworthy numbers for real ATM links (actually not ATM per se, but ATM/AAL5). But now you have piqued my curiosity: could you maybe post the result pictures somewhere?
But if your carrier happens to be DTAG (Deutsche Telekom), I can guarantee 1526-byte frames over PTM, so on the pppoe-wan interface you will need to specify 34 bytes of overhead: 8 bytes for the PPP/PPPoE overhead and 26 bytes for the ethernet (plus VLAN) and PTM headers.
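For illustration, the egress cake instance that such a setting should produce would look roughly like this (a sketch only; sqm-scripts generates the real commands, and the bandwidth value here is just an example):

# sketch: cake on the PPPoE interface with explicit DTAG-style PTM overhead
tc qdisc replace dev pppoe-wan root cake bandwidth 9500kbit besteffort overhead 34
# 34 = 8 bytes PPP/PPPoE + 26 bytes ethernet(+VLAN)/PTM framing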
Also, DTAG actually limits the bandwidth at the BRAS/BNG level, so you need to set the shaper bandwidth relative to that unknown limit and not relative to the pure sync rate reported by the VDSL2 modem. The one set of numbers I got from an online discussion at https://www.onlinekosten.de/forum/showthread.php?p=2380059#post2380059 is:
Download: 48,400 kbit/s (contractual: 51,300 kbit/s)
Upload: 9,470 kbit/s (contractual: 10,000 kbit/s)
I would not put my hand into the fire for these numbers, but they pretty much agree with my internal tests. Here is my current config (for a DTAG 50/10 link, limited at the BRAS):
config queue
option debug_logging '0'
option verbosity '5'
option enabled '1'
option interface 'pppoe-wan'
option download '46246'
option upload '9545'
option linklayer 'ethernet'
option overhead '34'
option linklayer_advanced '1'
option tcMTU '2047'
option tcTSIZE '128'
option tcMPU '64'
option linklayer_adaptation_mechanism 'default'
option qdisc 'cake'
option script 'layer_cake.qos'
option qdisc_advanced '1'
option ingress_ecn 'ECN'
option egress_ecn 'NOECN'
option qdisc_really_really_advanced '1'
option iqdisc_opts 'nat dual-dsthost'
option eqdisc_opts 'nat dual-srchost'
option squash_dscp '0'
option squash_ingress '0'
I probably should add "mpu 64" to both i- and eqdisc_opts...
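That would then look something like this (keeping the existing keywords):

option iqdisc_opts 'nat dual-dsthost mpu 64'
option eqdisc_opts 'nat dual-srchost mpu 64'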
Regarding your initial problem: could you try setting SQM to, say, 25000/5000 and retest whether the choppiness is still there? If yes, this would indicate too-small safety margins in your shaper settings; if not, it would indicate additional issues. BTW, for SQM testing it would be best if all test computers (the downloader and the VoIP device) are connected via wired LAN ports; choppiness can easily be created by wifi links, so it would be great if you could rule this out (I am not saying that wifi is the culprit in any way, only that wifi links add loads of variability that makes finding the root cause no easier)...
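Switching to those test rates could be done from the shell roughly like this (a sketch; I am assuming your sqm section is the first queue, @queue[0]):

uci set sqm.@queue[0].download='25000'   # ingress shaper rate in kbit/s
uci set sqm.@queue[0].upload='5000'      # egress shaper rate in kbit/s
uci commit sqm
/etc/init.d/sqm restart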
That means that somehow your system assembles giant packets (basically meta-packets consisting of a bunch of normal packets belonging to the same flow, which the kernel treats as one big packet to ameliorate the cost of routing lookups). This generally is great, as it allows the kernel to deal with higher speeds, but in your case it seems less than ideal, as it will cause roughly (4423 * 8) / (9500 * 1000) ≈ 0.0037 seconds, or about 3.7 milliseconds, of serialization delay. If you ever see larger giants, the delay will increase. Again, I am not saying that this causes your choppiness issue, but it will introduce additional variance... We often see giant packets on ingress, and then disabling GRO on the ethernet interface below ifb4pppoe-wan would be sufficient, but for egress you would need to disable GRO on the LAN and wifi interfaces. And that is somewhat unfortunate, as it will incur more processing cost on your router's CPU...
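Disabling GRO is done per interface with ethtool, something like the following (the interface names are just examples based on your qdisc dump; repeat for every relevant interface, and note the setting does not survive a reboot unless you put it into a hotplug or startup script):

ethtool -K eth0 gro off   # WAN-side ethernet device carrying pppoe-wan (ingress giants)
ethtool -K eth1 gro off   # LAN-side ethernet device (egress giants)
ethtool -k eth0 | grep generic-receive-offload   # verify the new setting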
As stated in my delayed response, it is highly unlikely that your link uses ATM/AAL5, and ATM_overhead_detector relies on ATM cell quantization to estimate the per-packet overhead, so: no ATM cells, no useful overhead estimate. Hence the ATM in the name. But please post the two figures ATM_overhead_detector creates for each run somewhere and I can look into the details (and maybe improve the output).
Ah, okay, so that is not the staircase you have been looking for. The likely/unlikely test simply compares the "residuals" between the data and the two fitted functions and declares the function with the lower residuals as likely; this admittedly is rather coarse. So in your case, as expected for a VDSL2 link, do not assume that you have an ATM carrier (and if you truly believe you have one, you should at least select atm instead of ethernet as the link layer). BTW, there is another quick and dirty test for an ATM carrier: if you do a speedtest without a shaper and get a goodput no slower than 90% of the sync rate, you do not have an ATM link (as ATM's 48/53 encoding alone has roughly 10% overhead).
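A worked example of that quick test (the numbers here are made up; plug in your own sync rate and speedtest result):

SYNC_KBPS=50000        # sync rate reported by the modem (example value)
GOODPUT_KBPS=47000     # goodput from a speedtest run without the shaper (example value)
awk -v s="$SYNC_KBPS" -v g="$GOODPUT_KBPS" 'BEGIN {
    printf "goodput = %.1f%% of sync\n", 100 * g / s
    if (g >= 0.9 * s) print "-> ATM/AAL5 very unlikely"
    else print "-> ATM/AAL5 still possible"
}'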
At this point I would really recommend setting SQM to 25/5 and seeing whether the choppiness is still there, and then iteratively relaxing the shaper settings, first for egress and then for ingress, to figure out how high you can go while the choppiness stays acceptable.
So, since you are using PPPoE and instantiate SQM on pppoe-wan, you will need to add at least the 8 bytes for PPPoE, but due to the "oE" I assume you will also need the ethernet header plus frame check sequence, so add 6+6+2+4 = 18; since teksavvy (according to your screenshot above) uses a VLAN, add 4 more bytes for a total of 8+18+4 = 30 for the full applicable ethernet frame, and then add the missing 4 bytes for VDSL2's PTM:
VDSL2 (IEEE 802.3-2012 61.3 VDSL2, with PPPoE and VLAN tag):
2 bytes PPP + 6 bytes PPPoE + 4 bytes VLAN + 1 byte Start of Frame (S) + 1 byte End of Frame (Ck) + 2 bytes TC-CRC (PTM-FCS) = 16 bytes
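Summing that up together with the ethernet header and frame check sequence gives the 34 bytes again (just the arithmetic, as a sanity check):

# 18 bytes ethernet (6 dst MAC + 6 src MAC + 2 ethertype + 4 FCS)
# + 16 bytes PPP/PPPoE/VLAN/PTM framing from the breakdown above
echo $(( (6 + 6 + 2 + 4) + (2 + 6 + 4 + 1 + 1 + 2) ))   # prints 34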
Again, I am not saying that wifi is the cause of the choppiness, but it certainly will make tests more repeatable if you connect all machines via wired ports. Not only does wifi have variable-delay issues depending on your RF surroundings (which, since the "ether" is a shared medium, are not fully under your control), but it can also put a considerable load on your router's CPU, which can likewise lead to intermittent latency increases that could manifest as choppiness.
Ok, so I was able to run a few more tests with overhead = 34 and the download/upload limits at 25/5 and 47.5/9.5, with the same results as before. I stopped collectd just in case, but that did not help either. While performing those tests I was downloading a lot of Ubuntu images via BitTorrent, which saturated the link. Interestingly enough, that made no noticeable difference to the quality of the voice/video call: just as choppy, but still quite tolerable.
The next thing I need to do is to get this laptop on a wired connection and wait for another voice/video call, but that will take a while.
Thank you for your comments and suggestions so far.
Hmm, I have just noticed that max_len is > 50K, so I disabled GRO on all interfaces as per your suggestion. Now max_len is constantly at 1526, but I have not yet had a chance to test a voice/video call.
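(For reference, the max_len value I am watching comes from cake's per-qdisc statistics, e.g.:

tc -s qdisc show dev pppoe-wan       # egress cake stats, including max_len
tc -s qdisc show dev ifb4pppoe-wan   # ingress cake stats
)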
What I am testing is "ping google.com" while downloading lots of Ubuntu torrents. Normally the pings are 11 ms, but with the torrents running they are mostly 15..35 ms, very rarely jumping to 70 ms or so. Without SQM the pings would be 150 ms and higher.
The connections are wired and both are connected to different ports.
Okay, so in theory I would expect, at full saturation, an added delay on average equal to the sum of the target values from cake's statistics, which would add up to 10 ms. But in practice I often see more like double the target sum, so that would be 2 * (5 + 5) = 20 ms in your case, and then the observed 15 to 35 ms seems in the right ballpark. So that seems not great, but okay (especially since cake, when the CPU is overburdened, will keep the bandwidth up at the cost of a little added latency under load, while HTB+fq_codel as in simplest.qos will keep the latency low at the cost of reduced bandwidth under load). So you could test this by trying simplest.qos+fq_codel...
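Switching over for such a test could look like this (again assuming the sqm section is the first queue, @queue[0]):

uci set sqm.@queue[0].qdisc='fq_codel'
uci set sqm.@queue[0].script='simplest.qos'
uci commit sqm
/etc/init.d/sqm restart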
In addition, you might want to log into your router via SSH while you run your tests and look at the output of "top -d 1", which will give you a snapshot of your router's load every second. If idle hits zero or constantly hovers near zero, you might be CPU-cycle limited (in that case I would also expect the sirq value to be relatively high). But at 50/10 most not-too-old routers should cope, one would hope...
Thx for the hint. Unfortunately simplest.qos+fq_codel provides worse latency: at 25/5 it is over 20ms while with cake it is 12..13 ms.
This router has a dual-core CPU at 1.7 GHz, and at 25/5 "top" reports >70% idle. At 45/9 it is ~50% idle and 35..50% sirq. It starts being single-core bound at around 40/8, or maybe a bit lower: I guess SQM is running on a single core?
At 35/7 latency drops to ~15..20ms again (with torrents and GRO disabled) with >60% CPU idle.
At 30/6 ping latency is < 15 ms and 70% CPU idle, 25% sirq.
I did not realize SQM would be so CPU intensive and this router has one of the most powerful CPUs...
50% idle probably means that one core is maxed out and the other completely idle (in top, hit '1' to have it show each core separately); even 60% idle is pretty tight.
It's very possible that you are running out of CPU here.