CAKE w/ Adaptive Bandwidth [August 2022 to March 2024]

@bmork I wonder whether you can offer any insight into how, on 4G/5G, download and/or upload saturation is expected to affect one-way delays (measurable using ICMP type 13 or irtt) in the download and upload directions?

On my connection I am seeing a noticeable increase in upload one-way delay when saturating the download, whereas on saturating the upload there is not a similar increase in download one-way delay.

Does this make sense to you?

This seems odd because ordinarily download saturation primarily increases download OWD, and upload saturation primarily increases upload OWD. Or at least that was my understanding up until now. @tievolu is that what you see on your connection?

Here is download followed by upload:

Here is download followed by upload followed by a mixture of download and upload:

Yes, this is what I normally see - i.e. exactly what you would expect to see. If you're not seeing that I suspect there's some other issue with your connection.

Right now there's a problem with my cable connection that is impacting the upstream signal, which in turn is restricting my upstream bandwidth to about 4-6Mbps, so the acks associated with the ~300Mbps downstream are sometimes enough to choke the upstream.

Definitely set the ack filter on your cake shaper. This should help a bunch.

Yeah, I use ack-filter on the upstream already, so it would presumably be even worse without it :disappointed:

I'm hoping to ditch Virgin Media soon. Cityfibre should be digging up my road to lay fibre later this month :+1:

Thanks @tievolu I was also wondering about the possible effect of the ack filter in my situation. What does the ack filter do again / how does it work?

@moeller0 any clue why I'm seeing video stutter on download in Teams? I made a log export during the call. I'll run that through your plotter and post the results in this post.

cake's ACK filter will treat ACKs like all other packets and hash them into its (by default) 1024 buckets, but if an ACK is added to a bucket that already contains an ACK to the same target IP/port combination, it will check whether the ACKs are mergeable; if they are, the earlier compatible ACKs in the queue will be replaced by the new ACK. This works because ACKs are cumulative, that is, the ACK with the highest sequence number is the more important one: the other end will behave essentially the same whether ACK(n) and ACK(n+1) are delivered back-to-back or ACK(n) is not delivered at all.

IMHO ACK filtering is a bit of a crutch, and the better solution for over-verbose ACKs would be to fix the end-point TCPs and to use less asymmetric links, but that often isn't a real option, so having ACK filtering as an option in cake is really great!
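For reference, a minimal sketch of how the filter can be switched on directly with tc (the interface name and the 20 Mbit/s rate are placeholders; with SQM the equivalent is adding ack-filter to the egress qdisc options):

# Hypothetical example: enable cake's ACK filter on the upload (egress)
# side of interface "wan" while shaping to 20 Mbit/s; there is also an
# ack-filter-aggressive variant that merges ACKs more eagerly.
tc qdisc replace dev wan root cake bandwidth 20Mbit ack-filter
# The per-tin statistics should then show dropped/merged ACKs (ack_drop):
tc -s qdisc show dev wan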

So it actually means dropping some acks to free up space for other stuff? Would it help in my situation?

@moeller0 does this reveal anything about why Teams video download was stuttery (call began at 17:00)?

Timecourse:

Raw CDFs (not sure why bottom gets clipped off):

Delta CDFs (ditto):

Samples per reflector:
ReflectorID: 8.8.4.4; N: 2111
ReflectorID: 8.8.8.8; N: 2138
ReflectorID: 94.140.14.140; N: 2133
ReflectorID: 94.140.14.141; N: 2113
ReflectorID: 94.140.15.15; N: 2162
ReflectorID: 94.140.15.16; N: 2112
DL: maximum 95.000%-ile delta delay over all 6 reflectors: 9.820 ms.
DL: maximum 99.000%-ile delta delay over all 6 reflectors: 12.570 ms.
DL: maximum 99.500%-ile delta delay over all 6 reflectors: 14.305 ms.
DL: maximum 99.900%-ile delta delay over all 6 reflectors: 18.050 ms.
DL: maximum 99.950%-ile delta delay over all 6 reflectors: 20.605 ms.
DL: maximum 99.990%-ile delta delay over all 6 reflectors: 68.630 ms.
DL: maximum 99.999%-ile delta delay over all 6 reflectors: 68.630 ms.
UL: maximum 95.000%-ile delta delay over all 6 reflectors: 9.820 ms.
UL: maximum 99.000%-ile delta delay over all 6 reflectors: 12.570 ms.
UL: maximum 99.500%-ile delta delay over all 6 reflectors: 14.305 ms.
UL: maximum 99.900%-ile delta delay over all 6 reflectors: 18.050 ms.
UL: maximum 99.950%-ile delta delay over all 6 reflectors: 20.605 ms.
UL: maximum 99.990%-ile delta delay over all 6 reflectors: 68.630 ms.
UL: maximum 99.999%-ile delta delay over all 6 reflectors: 68.630 ms.
INFO: Writing plot as: ./output.timecourse.pdf
INFO: fn_parse_autorate_log took: 81.6908 seconds.

Is the issue that I can't safely cruise along at 20Mbit/s base rate because Teams can actually generate quite bursty data?

The data looks fairly unremarkable, right? The bit in the middle is the call start. But the issue was just general video stutter throughout the call. Audio seemed fine, and there were no complaints about my uploaded audio/video (although I didn't ask).

Here is a zoom on the data after the call:

I'm guessing the cake rate is too high but the loading is not sufficient to generate discernible bufferbloat in the pings, although temporary bursts of data can still result in bufferbloat? But then wouldn't I see that in the pings? How does that work?

My limited understanding is that it effectively merges/combines acks when possible, so that fewer have to be sent. I'm probably oversimplifying it though.

Well, it has two consequences:
a) there is the potential for a little less ACK traffic. For MTU ~1500 TCP Reno will cause ~1/40 of the forward traffic as reverse ACK traffic (on wildly asymmetric links the thinning of the ACK stream that the ACK filter produces can relieve some pressure)
b) the resulting ACK stream at the receiving end will be smoother, which results in better overall TCP performance.

Whether that helps in a specific situation is harder to predict than to test :wink:
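As a rough sanity check of that ~1/40 figure (back-of-the-envelope assumptions on my part: delayed ACKs, i.e. one roughly 75-byte ACK frame returned per two full-size 1500-byte segments):

# reverse ACK fraction is roughly ack_frame_bytes / (2 * MTU), sizes approximate
echo 'scale=4; 75 / (2 * 1500)' | bc
# prints .0250, i.e. about 1/40 of the forward traffic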

This is a pretty good description!

I just checked and I have:

cake_ul_options="diffserv4 dual-srchost nonat wash no-ack-filter noatm overhead 0"
cake_dl_options="diffserv4 dual-dsthost nonat nowash ingress no-ack-filter noatm overhead 0"

I'm very curious about your thoughts on my Teams call stutter given my post above.

I'm trying to get a handle on a situation where the cake rate may be too high, producing stutter on bursty live video data, yet without this manifesting in pings spread evenly across time. How does the connection's capacity to handle video data relate to what the ICMPs show? What I'm trying to wrap my head around is why the ICMPs look fine even though the cake rate was probably too high.

Could this be related:

It seems that setting the CleanBrowsing family filter DNS results in the Microsoft Exchange "front door" ending up in Berlin. Can I direct lookups for certain domains to a specific DNS server (presumably my ISP's DNS would be better for the Microsoft domain lookups)? This may or may not be a red herring.
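Perhaps dnsmasq's per-domain server entries would do it? A hypothetical sketch for /etc/config/dhcp on OpenWrt (lines added to the existing 'config dnsmasq' section; 203.0.113.53 stands in for my ISP's resolver):

# Forward only the Microsoft domains to a specific upstream resolver,
# leaving everything else on CleanBrowsing (203.0.113.53 is a placeholder)
        list server '/office365.com/203.0.113.53'
        list server '/office.com/203.0.113.53'
# then apply with: /etc/init.d/dnsmasq restart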

The DNS lookups flip flop between London and Hamburg/Berlin:

nslookup  outlook.office365.com
Server:  OpenWrt-1.lan
Address:  192.168.1.1

Non-authoritative answer:
Name:    LHR-efz.ms-acdc.office.com
Addresses:  2603:1026:c06:1400::2
          2603:1026:c06:1401::2
          2603:1026:c06:2b::2
          2603:1026:c06:6d::2
          52.97.146.162
          52.97.211.66
          52.98.207.146
          52.97.211.82
Aliases:  outlook.office365.com
          outlook.ha.office365.com
          outlook.ms-acdc.office.com


nslookup  outlook.office365.com
Server:  OpenWrt-1.lan
Address:  192.168.1.1

Non-authoritative answer:
Name:    SXF-efz.ms-acdc.office.com
Addresses:  2603:1026:c0e:865::2
          2603:1026:c0e:870::2
          2603:1026:c0e:2a::2
          40.99.148.82
          52.98.229.146
          52.98.240.114
Aliases:  outlook.office365.com
          outlook.ha.office365.com
          outlook.ms-acdc.office.com

Or perhaps the issue relates to packet loss? So many variables.

Your ISP might treat ICMP probes differently?

I agree the autorate timecourse looks pretty innocent, nothing in there to predict stutter in the video... How bursty Teams traffic is I do not know, but that is something you should be able to look at in a packet capture, no?
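Something along these lines might do for a capture during the next call (the WAN interface name and the UDP port range are assumptions on my part; Teams media is commonly carried over UDP 3478-3481):

# Capture Teams media packets on the WAN interface (here assumed wwan0)
# into a pcap for later burstiness analysis in Wireshark; -s 128 keeps
# only headers so the file stays small.
tcpdump -i wwan0 -s 128 -w teams_call.pcap 'udp portrange 3478-3481'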

Yes, that looks puzzling... having your "front door" in Berlin certainly does not help, but video conferencing should work across and between continents, so just "over to the continent" should not cause big issues :wink:

Or the video sender had a shitty path and the feed came in already choppy... occasionally it is somebody else's ISP that is at fault :wink:

I honestly do not know; ever since I switched to my Turris Omnia under TurrisOS, I opted to operate DNS as a non-forwarding resolver (using DNSSEC) and never looked back at other DNS approaches. (Well, my ISP only resolves its own SIP servers via its own DNS servers, so I configured my VoIP base station to use O2's DNS servers and disabled the "force local DNS" option in my adblocker, which with the advent of DoT and DoH lost part of its teeth anyway, but I digress.)

Well packet loss certainly can cause choppy video, but again this does not need to happen at your end (it is likely though that a variable-rate link like yours is part of the issue).

Thanks for your thoughts. You posted here:

with instructions to use mtr. I tried mtr from my downstream router to the Microsoft Teams server (I don't have credentials for my upstream NR7101 handy), but I just get:

root@OpenWrt-1:~# mtr -ezb4w -c 100 52.112.139.57
Start: 2023-04-10T20:22:39+0100
HOST: OpenWrt-1          Loss%   Snt   Last   Avg  Best  Wrst StDev

Any idea why? Is it that the server doesn't respond to these types of packets?

I found one that seemed associated with Teams that does respond:

root@OpenWrt-1:~# mtr -ezb4w -c 100 52.97.211.242
Start: 2023-04-10T20:30:13+0100
HOST: OpenWrt-1                              Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS8075   52.97.211.242 (52.97.211.242)  17.0%   100   37.8  48.6  37.8  60.7   4.3

17% seems crazy high?

And even worse to Cloudflare:

root@OpenWrt-1:~# mtr -ezb4w -c 100 1.1.1.1
Start: 2023-04-10T20:33:36+0100
HOST: OpenWrt-1                          Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS13335  one.one.one.one (1.1.1.1)  25.0%   100   41.8  41.4  32.6  51.3   4.6

1 in 4 packets lost doesn't seem good at all. Maybe this is a glitch that a modem reset would resolve.

123-1234567:flent smoeller$ sudo mtr -ezb4w -c 100  52.112.139.57
Start: 2023-04-10T21:31:31+0200
HOST: 123-1234567.local                                                          Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS???    192.168.42.1                                                        0.0%   100    0.9   0.8   0.4   1.1   0.1
  2. AS6805   loopback1.0001.acln.06.ham.de.net.telefonica.de (62.52.201.198)     0.0%   100   10.4  20.1  10.4 135.8  20.5
  3. AS6805   bundle-ether8.0002.dbrx.06.ham.de.net.telefonica.de (62.53.1.234)   0.0%   100   12.5  12.0  10.3  13.7   0.7
  4. AS6805   ae7-0.0001.corx.06.ham.de.net.telefonica.de (62.53.15.0)            0.0%   100   19.4  20.4  18.0  45.4   4.1
       [MPLS: Lbl 16736 TC 0 S u TTL 1]
  5. AS6805   ae6-0.0002.corx.02.fra.de.net.telefonica.de (62.53.0.49)            0.0%   100   18.1  20.8  17.4  48.6   5.8
       [MPLS: Lbl 16736 TC 0 S u TTL 1]
  6. AS6805   bundle-ether2.0002.dbrx.01.off.de.net.telefonica.de (62.53.0.209)   0.0%   100   19.7  19.4  17.8  24.4   0.7
       [MPLS: Lbl 16736 TC 0 S u TTL 1]
  7. AS6805   bundle-ether2.0002.prrx.09.fra.de.net.telefonica.de (62.53.3.191)   0.0%   100   19.5  20.0  18.6  30.6   1.3
  8. AS???    ???                                                                100.0   100    0.0   0.0   0.0   0.0   0.0
  9. AS8075   ae29-0.icr02.fra21.ntwk.msn.net (104.44.235.194)                    0.0%   100   26.3  25.3  21.5  51.8   5.0
 10. AS8075   be-122-0.ibr02.fra21.ntwk.msn.net (104.44.23.109)                   0.0%   100   40.8  42.0  39.4 133.5  10.0
       [MPLS: Lbl 24160 TC 0 S u TTL 1]
       [MPLS: Lbl 22476 TC 0 S u TTL 1]
 11. AS8075   be-13-0.ibr02.ber20.ntwk.msn.net (104.44.29.185)                    1.0%   100   41.1  42.2  39.3 123.5   9.1
       [MPLS: Lbl 69087 TC 0 S u TTL 1]
       [MPLS: Lbl 22476 TC 0 S u TTL 2]
 12. AS8075   be-14-0.ibr02.mma01.ntwk.msn.net (104.44.30.116)                    0.0%   100   41.1  41.7  39.3  83.0   5.0
       [MPLS: Lbl 69087 TC 0 S u TTL 1]
       [MPLS: Lbl 22476 TC 0 S u TTL 3]
 13. AS8075   be-7-0.ibr02.sto30.ntwk.msn.net (104.44.19.222)                     0.0%   100   46.0  46.0  44.1  64.2   2.2
       [MPLS: Lbl 69087 TC 0 S u TTL 1]
       [MPLS: Lbl 22476 TC 0 S u TTL 4]
 14. AS8075   be-6-0.ibr02.gvx01.ntwk.msn.net (104.44.19.221)                     0.0%   100   39.0  39.2  37.1  97.5   5.9
       [MPLS: Lbl 22476 TC 0 S u TTL 1]
 15. AS8075   ae120-0.rwa01.gvx01.ntwk.msn.net (104.44.23.30)                     0.0%   100   38.7  40.1  36.6  57.2   3.9
 16. AS???    ???                                                                100.0   100    0.0   0.0   0.0   0.0   0.0

So I am getting close, but no response from the end host. However I have no SLA contract with Teams that requires them to respond to pings, so not sure I have standing to complain :wink:

Yes, as I joked above, responding to ICMP echo requests is mostly voluntary and something busy servers are often configured not to do (or there is a firewall that filters the requests out before they hit the server); this also seems true for at least some online game servers.

Not for intermediary hops, though: these owe us nothing and often use rate-limiting and de-prioritization so that ICMP handling does not interfere with their primary job of forwarding/routing data packets.

A high loss rate to the final host is never good, though!

A good test is to run the irtt test for, say, 10-30 minutes on an otherwise quiescent network and have a look at the observed packet losses per direction...

I'll try that - I run irtt as per:

irtt client -i10ms -d5m de.starlink.taht.net -o irtt_LTE_$(date +%Y%m%d_%H%M%S) > irtt_LTE_$(date +%Y%m%d_%H%M%S)_debug.out

when the network is quiet, for ten minutes? With cake + cake-autorate running?

Stupid question, but if I get a VPS can I set up a tunnel with it that duplicates every packet so it is sent, say, ten times, sacrificing bandwidth to ensure I get next to no packet loss? Or is that rendered moot by existing protocols - the use of ACKs, etc.? Could my ACK filtering as per:

cake_ul_options="diffserv4 dual-srchost nonat wash no-ack-filter noatm overhead 0"
cake_dl_options="diffserv4 dual-dsthost nonat nowash ingress no-ack-filter noatm overhead 0"

actually be hurting? Update - oh no, I see 'no-ack-filter'. Doh, my mind is shattered. Probably this massive restructuring of cake-autorate (897 additions and 837 deletions) is a factor.

I think this says "5 minutes" test duration...

If the network is truly quiescent it should not matter that cake is running, but since this is easy to test, try both with and without cake active?
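So something like the run below, left going for half an hour while nothing else uses the link (same server as in your command above; the file name is just an example):

# 30-minute quiescent-network run at 10 ms probe intervals; the output
# file can afterwards be checked for per-direction packet losses.
irtt client -i10ms -d30m de.starlink.taht.net -o irtt_quiet_$(date +%Y%m%d_%H%M%S).json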

That idea is generically called Forward Error Correction (FEC for short); maybe something like:

would work?

No, not really - TCP does not tolerate random losses all that well, so any method that avoids such losses can help. But it does not come for free: FEC typically increases the size of the data (since it needs to add redundancy) and often also introduces additional delay (e.g. if interleaving is used).

You are not enabling cake's ACK filter at all ;), so I believe the answer to whether it could be hurting is "no".

Could well be, large changes tend to bring in their own stabilization periods...

Does setting a fixed cake bandwidth at 5Mbit/s in both directions reduce bufferbloat for you as compared to having no cake set at all (as measurable using e.g. https://www.waveform.com/tools/bufferbloat)?
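If it helps, this is roughly how a fixed rate can be forced from the command line for a quick comparison (egress side only; "wan" is a placeholder interface name, and the download direction is easier done via the SQM settings):

# Hard-code cake to 5 Mbit/s on the upload side for a test run...
tc qdisc replace dev wan root cake bandwidth 5Mbit
# ...and remove it again afterwards
tc qdisc del dev wan root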

@Lynx This would be my waveform test at 5mbps https://www.waveform.com/tools/bufferbloat?test-id=2671bc11-b896-427e-9b0c-a547597e39a3

This is it without any "shaping"

The results remain inconsistent... but you can definitely feel a clear difference with and without cake-autorate, as I tend not to get massive amounts of jitter and packet loss.


config queue 'eth1'
        option interface 'eth1'
        option qdisc 'cake'
        option script 'piece_of_cake.qos'
        option debug_logging '0'
        option verbosity '5'
        option qdisc_advanced '1'
        option squash_dscp '1'
        option squash_ingress '1'
        option qdisc_really_really_advanced '1'
        option linklayer 'ethernet'
        option linklayer_advanced '1'
        option tcMTU '2047'
        option tcTSIZE '128'
        option linklayer_adaptation_mechanism 'cake'
        option ingress_ecn 'NOECN'
        option egress_ecn 'NOECN'
        option tcMPU '96'
        option overhead '58'
        option enabled '1'
        option download '1200000'
        option upload '1200000'
        option iqdisc_opts 'nat diffserv4 dual-dsthost ack-filter'
        option eqdisc_opts 'nat diffserv4 dual-srchost ack-filter'

These are my cake-autorate settings as of now:

adjust_dl_shaper_rate=1 # enable (1) or disable (0) actually changing the dl shaper rate
adjust_ul_shaper_rate=1 # enable (1) or disable (0) actually changing the ul shaper rate

min_dl_shaper_rate_kbps=5000  # minimum bandwidth for download (Kbit/s)
base_dl_shaper_rate_kbps=525000 # steady state bandwidth for download (Kbit/s)
max_dl_shaper_rate_kbps=1200000  # maximum bandwidth for download (Kbit/s)

min_ul_shaper_rate_kbps=5000  # minimum bandwidth for upload (Kbit/s)
base_ul_shaper_rate_kbps=525000 # steady state bandwidth for upload (Kbit/s)
max_ul_shaper_rate_kbps=1200000  # maximum bandwidth for upload (Kbit/s)

I'll defer to @moeller0 here as he's more familiar with your connection type (could you elaborate on that?) and how to assess it and set cake up properly on it. Are you sure the connection capacity varies with time, and that it's not just a question of working out a fixed cake bandwidth to set? Also, even the result you post without cake doesn't look so bad. But then my standards are pretty low since I am familiar with 4G - circa 40 ms to 100 ms RTT - and keeping RTT under 80 ms is the goal! What sort of hardware router do you have, I wonder, because shaping at this level is computationally expensive.

Even if @moeller0 thinks cake-autorate isn't suited to your use case, could you post some data showing a saturating download and then upload (even if just using a speed test like the waveform cycle)?
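For example, if you have access to an iperf3 server, something like the following would give a clean saturating download followed by a saturating upload while cake-autorate logs (iperf.example.net is just a placeholder):

# Saturate the download for 60 s (-R = reverse mode, server sends)...
iperf3 -c iperf.example.net -t 60 -R
# ...then saturate the upload for 60 s
iperf3 -c iperf.example.net -t 60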

Almost all of Japan's endpoints are using cellular-like links, so the latency can be up to 1000 ms for an extended period of time, and during those times they will make everyone's connection in my area near 128 kbps... and even if you complain to their FCC they will just say too bad, so sad @_@

I have almost exactly the same problems moeller has, but I'm going to assume the UK has things handled a bit better.

I will attempt to saturate the link and will post the results ASAP.
