I have set up SQM, and dslreports' speed test consistently gives me an A+ for bufferbloat and everything else. Today I accidentally ended up on a voice/video call while a single download was running on another computer, which pretty much saturated my download link.
That resulted in very choppy audio and video during the entire call, which suggests I may have made a configuration mistake. I would appreciate some input on my settings below (my link is VDSL2 50M/10M).
Interface name: pppoe-wan
Download speed: 47500
Upload speed: 9500
cake/piece_of_cake.qos
Ethernet with overhead
Per Packet Overhead: 34
@moeller0, here is the data you requested. Since I made that post I have found out that I am missing 'nat dual-dsthost' and 'nat dual-srchost'. I have also run "ATM_overhead_detector" twice, and it came back with 36 and 47 bytes of overhead. I will run it a few more times to see if there is a pattern, but I guess my overhead is actually 36+8=44 and not 34.
qdisc noqueue 0: dev lo root refcnt 2
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc noqueue 0: dev br-lan root refcnt 2
qdisc noqueue 0: dev eth1.3 root refcnt 2
qdisc noqueue 0: dev eth1.4 root refcnt 2
qdisc cake 8015: dev pppoe-wan root refcnt 2 bandwidth 9500Kbit besteffort triple-isolate rtt 100.0ms raw
linklayer ethernet overhead 34
qdisc ingress ffff: dev pppoe-wan parent ffff:fff1 ----------------
qdisc noqueue 0: dev wifi5 root refcnt 2
qdisc noqueue 0: dev wifi2.1 root refcnt 2
qdisc noqueue 0: dev wifi2.2 root refcnt 2
qdisc cake 8016: dev ifb4pppoe-wan root refcnt 2 bandwidth 47500Kbit besteffort triple-isolate wash rtt 100.0ms raw
linklayer ethernet overhead 34
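A minimal sketch of how I would add the two missing keywords from the shell, assuming the sqm-scripts UCI config and that my section is the first queue (@queue[0]):

# enable the advanced option fields and set per-internal-host fairness
uci set sqm.@queue[0].qdisc_advanced='1'
uci set sqm.@queue[0].qdisc_really_really_advanced='1'
uci set sqm.@queue[0].iqdisc_opts='nat dual-dsthost'   # ingress: fair share per internal destination host
uci set sqm.@queue[0].eqdisc_opts='nat dual-srchost'   # egress: fair share per internal source host
uci commit sqm
/etc/init.d/sqm restart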
Two more runs of ATM_overhead_detector gave me 33 and 48 bytes of overhead (in addition to the 36 and 47 from the first two runs). Now I am not sure which number to use here.
My modem (in bridge mode) shows "Mode: VDSL2" and the following for WAN:
Not really missing; it would just be a bit more predictable than the default triple-isolate (which works quite well in normal situations, but under speedtest conditions its behavior is not easily explained or predicted).
this together with:
makes sense, in that your link in all likelihood is not using ATM but rather PTM, as is customary for VDSL2 ("likelihood" because it is still permissible to use ATM on VDSL2 links; I just hope no ISP will ever do this, and I rather hope ISPs will start to do the also-permitted opposite of using PTM on ADSL links, but I digress). The detector will only ever report trustworthy numbers for real ATM links (actually not ATM per se, but ATM/AAL5). But now you have piqued my curiosity: could you maybe post the result pictures somewhere?
But if your carrier happens to be DTAG (Deutsche Telekom), I can guarantee 1526-byte frames over PTM, so on the pppoe-wan interface you will need to specify 34 bytes of overhead: 8 bytes for the PPP/PPPoE overhead and 26 bytes for the ethernet (plus VLAN) and PTM headers.
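For illustration, the egress cake instance that such a setting should produce would look roughly like this (a sketch only; sqm-scripts generates the real commands, and the bandwidth value here is just an example):

# sketch: cake on the PPPoE interface with explicit DTAG-style PTM overhead
tc qdisc replace dev pppoe-wan root cake bandwidth 9500kbit besteffort overhead 34
# 34 = 8 bytes PPP/PPPoE + 26 bytes ethernet(+VLAN)/PTM framing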
Also, DTAG actually limits the bandwidth at the BRAS/BNG level, so you need to set the shaper bandwidth relative to that unknown limit and not relative to the pure sync rate reported by the VDSL2 modem. The one set of numbers I got from an online discussion at https://www.onlinekosten.de/forum/showthread.php?p=2380059#post2380059 is:
Download: 48,400 kbit/s (contractual: 51,300 kbit/s)
Upload: 9,470 kbit/s (contractual: 10,000 kbit/s)
I would not put my hand into the fire for these numbers, but they pretty much agree with my internal tests. Here is my current config (for a DTAG 50/10 link, limited at the BRAS):
config queue
option debug_logging '0'
option verbosity '5'
option enabled '1'
option interface 'pppoe-wan'
option download '46246'
option upload '9545'
option linklayer 'ethernet'
option overhead '34'
option linklayer_advanced '1'
option tcMTU '2047'
option tcTSIZE '128'
option tcMPU '64'
option linklayer_adaptation_mechanism 'default'
option qdisc 'cake'
option script 'layer_cake.qos'
option qdisc_advanced '1'
option ingress_ecn 'ECN'
option egress_ecn 'NOECN'
option qdisc_really_really_advanced '1'
option iqdisc_opts 'nat dual-dsthost'
option eqdisc_opts 'nat dual-srchost'
option squash_dscp '0'
option squash_ingress '0'
I probably should add "mpu 64" to both i- and eqdisc_opts...
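That would then look something like this (keeping the existing keywords):

option iqdisc_opts 'nat dual-dsthost mpu 64'
option eqdisc_opts 'nat dual-srchost mpu 64'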
Regarding your initial problem: could you try setting SQM to, say, 25000/5000 and retest whether the choppiness is still there? If yes, this would indicate too-small safety margins in your shaper settings; if not, it would indicate additional issues. BTW, for SQM testing it would be best if all test computers (the downloader and the VoIP device) are connected via wired LAN ports; choppiness can easily be created by wifi links, so it would be great if you could rule this out (I am not saying that wifi is the culprit in any way, only that wifi links add loads of variability that makes finding the root cause no easier)...
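Switching to those test rates could be done from the shell roughly like this (a sketch; I am assuming your sqm section is the first queue, @queue[0]):

uci set sqm.@queue[0].download='25000'   # ingress shaper rate in kbit/s
uci set sqm.@queue[0].upload='5000'      # egress shaper rate in kbit/s
uci commit sqm
/etc/init.d/sqm restart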
That means that somehow your system assembles giant packets (basically meta-packets consisting of a bunch of normal packets belonging to the same flow, which the kernel treats as one big packet to ameliorate the cost of routing lookups). This generally is great, as it allows the kernel to deal with higher speeds, but in your case it seems less than ideal, as it will cause roughly (4423 * 8) / (9500 * 1000) ≈ 0.0037 seconds, or about 3.7 milliseconds, of serialization delay. If you ever see larger giants, the delay will increase. Again, I am not saying that this causes your choppiness issue, but it will introduce additional variance... We often see giant packets on ingress, and then disabling GRO on the ethernet interface below ifb4pppoe-wan would be sufficient, but for egress you would need to disable GRO on the LAN and wifi interfaces. And that is somewhat unfortunate, as it will incur more processing cost on your router's CPU...
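Disabling GRO is done per interface with ethtool, something like the following (the interface names are just examples based on your qdisc dump; repeat for every relevant interface, and note the setting does not survive a reboot unless you put it into a hotplug or startup script):

ethtool -K eth0 gro off   # WAN-side ethernet device carrying pppoe-wan (ingress giants)
ethtool -K eth1 gro off   # LAN-side ethernet device (egress giants)
ethtool -k eth0 | grep generic-receive-offload   # verify the new setting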
As stated in my delayed response, it is highly unlikely that your link uses ATM/AAL5, and ATM_overhead_detector relies on ATM cell quantization to estimate the per-packet overhead, so: no ATM cells, no useful overhead estimate. Hence the ATM in the name. But please post the two figures ATM_overhead_detector creates for each run somewhere and I can look into the details (and maybe improve the output).
Ah, okay, so that is not the staircase you have been looking for. The likely/unlikely test simply compares the "residuals" between the data and the two fitted functions and declares the function with the lower residuals as likely; this admittedly is rather coarse. So in your case, as expected for a VDSL2 link, do not assume that you have an ATM carrier (and if you truly believe you have one, you should at least select atm instead of ethernet as the link layer). BTW, there is another quick and dirty test for an ATM carrier: if you do a speedtest without a shaper and get a goodput no slower than 90% of the sync rate, you do not have an ATM link (as ATM's 48/53 encoding alone has roughly 10% overhead).
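A worked example of that quick test (the numbers here are made up; plug in your own sync rate and speedtest result):

SYNC_KBPS=50000        # sync rate reported by the modem (example value)
GOODPUT_KBPS=47000     # goodput from a speedtest run without the shaper (example value)
awk -v s="$SYNC_KBPS" -v g="$GOODPUT_KBPS" 'BEGIN {
    printf "goodput = %.1f%% of sync\n", 100 * g / s
    if (g >= 0.9 * s) print "-> ATM/AAL5 very unlikely"
    else print "-> ATM/AAL5 still possible"
}'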
At this point I would really recommend setting SQM to 25/5 and seeing whether the choppiness is still there, and then iteratively relaxing the shaper settings, first for egress and then for ingress, to figure out how high you can go while the choppiness stays acceptable.
So, since you are using PPPoE and instantiate SQM on pppoe-wan, you will need to add at least the 8 bytes for PPPoE, but due to the "oE" I assume you will also need the ethernet header plus frame check sequence, so add 6+6+2+4 = 18; since teksavvy (according to your screenshot above) uses a VLAN, add 4 more bytes for a total of 8+18+4 = 30 for the full applicable ethernet frame, and then add the missing 4 bytes for VDSL2's PTM:
VDSL2 (IEEE 802.3-2012 61.3 VDSL2, with PPPoE and VLAN tag):
2 bytes PPP + 6 bytes PPPoE + 4 bytes VLAN + 1 byte Start of Frame (S) + 1 byte End of Frame (Ck) + 2 bytes TC-CRC (PTM-FCS) = 16 bytes
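Summing that up together with the ethernet header and frame check sequence gives the 34 bytes again (just the arithmetic, as a sanity check):

# 18 bytes ethernet (6 dst MAC + 6 src MAC + 2 ethertype + 4 FCS)
# + 16 bytes PPP/PPPoE/VLAN/PTM framing from the breakdown above
echo $(( (6 + 6 + 2 + 4) + (2 + 6 + 4 + 1 + 1 + 2) ))   # prints 34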
Again, I am not saying that wifi is the cause of the choppiness, but it certainly will make tests more repeatable if you connect all machines via wired ports. Not only does wifi have variable-delay issues depending on your RF surroundings (which, since the "ether" is a shared medium, are not fully under your control), but it can also put a considerable load on your router's CPU, which can likewise lead to intermittent latency increases that could manifest as choppiness.
Ok, so I was able to run a few more tests with overhead = 34 and the download/upload limits at 25/5 and 47.5/9.5, with the same results as before. I stopped collectd just in case, but that did not help either. While performing those tests I was downloading a lot of Ubuntu images via BitTorrent, which saturated the link. Interestingly enough, that made no noticeable difference to the quality of the voice/video call: just as choppy, but still quite tolerable.
The next thing I need to do is to get this laptop on a wired connection and wait for another voice/video call, but that will take a while.
Thank you for your comments and suggestions so far.
Hmm, I have just noticed that max_len is > 50K, so I disabled GRO on all interfaces as per your suggestion. Now max_len is constantly at 1526, but I have not yet had a chance to test a voice/video call.
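(For reference, the max_len value I am watching comes from cake's per-qdisc statistics, e.g.:

tc -s qdisc show dev pppoe-wan       # egress cake stats, including max_len
tc -s qdisc show dev ifb4pppoe-wan   # ingress cake stats
)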
What I am testing is "ping google.com" while downloading lots of Ubuntu torrents. Normally the pings are 11 ms, but with the torrents running they are mostly 15..35 ms, very rarely jumping to 70 ms or so. Without SQM the pings would be 150 ms and higher.
The connections are wired and both are connected to different ports.
Okay, so in theory I would expect, at full saturation, an added delay on average equal to the sum of the target values from cake's statistics, which would add up to 10 ms. But in practice I often see more like double the target sum, so that would be 2 * (5 + 5) = 20 ms in your case, and then the observed 15 to 35 ms seems in the right ballpark. So that seems not great, but okay (especially since cake, when the CPU is overburdened, will keep the bandwidth up at the cost of a little added latency under load, while HTB+fq_codel as in simplest.qos will keep the latency low at the cost of reduced bandwidth under load). So you could test this by trying simplest.qos+fq_codel...
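Switching over for such a test could look like this (again assuming the sqm section is the first queue, @queue[0]):

uci set sqm.@queue[0].qdisc='fq_codel'
uci set sqm.@queue[0].script='simplest.qos'
uci commit sqm
/etc/init.d/sqm restart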
In addition, you might want to log into your router via SSH while you run your tests and look at the output of "top -d 1", which will give you a snapshot of your router's load every second. If idle hits zero or constantly hovers near zero, you might be CPU-cycle limited (in that case I would also expect the sirq value to be relatively high). But at 50/10 most not-too-old routers should cope, one would hope...
Thx for the hint. Unfortunately simplest.qos+fq_codel provides worse latency: at 25/5 it is over 20ms while with cake it is 12..13 ms.
This router has a dual-core CPU at 1.7 GHz, and at 25/5 "top" reports >70% idle. At 45/9 it is ~50% idle and 35..50% sirq. It starts being single-core bound at around 40/8, or maybe a bit lower: I guess SQM is running on a single core?
At 35/7 latency drops to ~15..20ms again (with torrents and GRO disabled) with >60% CPU idle.
At 30/6 ping latency is < 15 ms and 70% CPU idle, 25% sirq.
I did not realize SQM would be so CPU intensive and this router has one of the most powerful CPUs...
50% idle probably means that one core is maxed out and the other completely idle (in top, hit '1' to have it show each core separately); even 60% idle is pretty tight.
It's very possible that you are running out of CPU here.