Tested with OpenWrt 23.05.2 and an x86 router (Ryzen 5800) on 100 Mbit fiber, against a remote server about 60 ms away as well as other sites.
This is also quite noticeable with DNS queries, git, Google Maps, and loading large sites, although not a deal breaker.
QUIC and single-stream TCP such as HTTP/2 are not impacted.
Tested with Safari on macOS, and with Chrome and Firefox on Windows.
It's interesting to see how wildly the browsers varied, with Safari showing the largest difference.
SQM is set up without any extras, using cake + layer_cake; changing the limiter doesn't have any impact, including setting it to 0.
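For reference, the section in /etc/config/sqm is essentially the stock layer_cake setup, roughly like this (interface name and rates below are placeholders, not my exact values):

config queue 'eth1'
        option enabled '1'
        option interface 'eth1'
        option qdisc 'cake'
        option script 'layer_cake.qos'
        # shaper rates in kbit/s; 0 disables shaping for that direction
        option download '95000'
        option upload '95000'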
Is this an expected side effect, and are there any areas to improve?
Weird. Hopefully @moeller0 has an idea or two about what’s going on here.
Is this with an otherwise unloaded network?
I don’t see any difference when enabling or disabling cake (using my own cake-qos-simple implementation - but surely this isn’t relevant) when viewing that test site through Safari on my iPhone. Perhaps wired might show a difference.
This seems to be the expected result, as the test result is just dominated by the total average transfer throughput.... and AQMs will tend to reduce the average throughput somewhat...
Especially with the small images there is not enough data in flight to noticeably self-congest, so the AQM is not helping by keeping the queue reasonably small.
What results do you get if you:
a) switch to high resolution images?
b) run a concurrent load test of sufficient duration?
But to fully understand what is going on here I would propose getting and comparing packet captures for the different tests....
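Something like this on the router would do for the captures (a rough sketch; eth1 is just a placeholder for the actual WAN device):

# capture only headers (first 128 bytes of each packet) on the WAN side while the test runs
tcpdump -i eth1 -s 128 -w imagetest-sqm-on.pcap
# repeat the run with SQM disabled, save as imagetest-sqm-off.pcap, and compare the two in wireshark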
Do you mean that the difference can be explained by the slight loss of bandwidth associated with avoiding bufferbloat? So say for a 100 Mbit/s connection, the max bandwidth with bufferbloat may be 115 Mbit/s, and we lose some of that when we throttle. But surely that wouldn't be enough to account for such large differences in the total times as reported?
Well, the tests seem to daisy chain loading of 100 images, so each new image is only loaded if the previous was fully transferred. If we drop a packet (to indicate 'slow down a bit, will you') that packet will need to be retransmitted before we can load the next...
With the small images each will fit into a packet already, so dropping a packet means we are missing a full image and we likely have to wait for a retransmit timeout (as there are not necessarily more packets queued releasing an ACK).
Personally I would try to see whether I can get ECN signalling working for the test... (cake supports that out of the box, as do most Linux servers, so you likely only need to configure it on the end-point).
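On a Linux end-point that usually just means the standard sysctl knob (the values below are the stock Linux meanings, nothing specific to this test, and it only helps if the other side negotiates ECN too):

# 1 = request ECN on outgoing TCP connections and accept it on incoming ones (0 = off, 2 = accept only)
sysctl -w net.ipv4.tcp_ecn=1
# persist across reboots
echo 'net.ipv4.tcp_ecn=1' >> /etc/sysctl.conf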
This is pretty much tailored to be a worst-case scenario for AQMs, as the total run time here is shortest if utilisation is highest and AQMs cost a bit of utilisation... the test also does not measure the transient bufferbloat caused by not having an AQM... hence my questions above about how this performs with additional loading of the link...
I might have misunderstood the OP, but to me this looks like completion time with SQM is noticeably higher than without... Now I would like to see:
0) disable SQM: /etc/init.d/sqm stop
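(To double-check it is really off, cake should disappear from the qdisc list after the stop:)

# lists any cake instances; should print nothing once SQM is stopped
tc qdisc show | grep cake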
Seems helpful, and the results from those tests will surely be informative, and I'm not wanting to hijack the thread, but I meant to ask about the technology: AQM. Does this potentially apply here because the OP is using fibre?
Nah, at the core cake and fq_codel employ a variant of CoDel, which itself is an active queue management (AQM) method. The link technology has nothing to do with it; it is just that his tests marked 'SQM cake' will employ an AQM and his tests without likely will just employ his ISP's FIFO (or unmanaged buffers). But note that without more information from the OP all I can do is guess (or maybe get my own packet captures tonight, assuming I am bored enough).
Here is what I see with my variable rate 4G circa 5-80Mbit/s connection (with cell tower heavily saturated at worst possible conditions) and circa 50ms to host:
root@OpenWrt-1:~# ping http://www.http2demo.io/
ping: http://www.http2demo.io/: Name does not resolve
root@OpenWrt-1:~# ping www.http2demo.io
PING 1906714720.rsc.cdn77.org (195.181.164.14) 56(84) bytes of data.
64 bytes from 263888592.lon.cdn77.com (195.181.164.14): icmp_seq=1 ttl=54 time=46.7 ms
64 bytes from 263888592.lon.cdn77.com (195.181.164.14): icmp_seq=2 ttl=54 time=45.6 ms
64 bytes from 263888592.lon.cdn77.com (195.181.164.14): icmp_seq=3 ttl=54 time=54.6 ms
64 bytes from 263888592.lon.cdn77.com (195.181.164.14): icmp_seq=4 ttl=54 time=52.9 ms
64 bytes from 263888592.lon.cdn77.com (195.181.164.14): icmp_seq=5 ttl=54 time=51.0 ms
^C
--- 1906714720.rsc.cdn77.org ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 45.639/50.182/54.607/3.470 ms
Basically, running it many times, I don't see a significant difference unless I seriously throttle with cake down to around 5 Mbit/s and there is other traffic on the network.
You're correct that the speed limit rather than the AQM itself is the issue; I just had the same image-loading result time with or without cake enabled when ingress/egress was set to 0.
With cake enabled, the performance was significantly better with saturated load compared to without.
I just tested with 4G and noticed similar results to yours, but 4G is never as smooth as fiber, and it was one second slower than fiber, with identical latency with or without SQM, when testing the 100-image 64 kB run.
Fiber vs 4G at 0.15ms interval:
(loss is from switching networks)
One of the reasons why setting my fiber at 20Mbit is still snappier at browsing compared to my LTE at ~60Mbit.
I guess AQM/SQM isn't an issue for anyone living in a country with many CDNs, since the latency is low enough that it has negligible impact.
I'm still using it for per-host isolation fairness.
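For context, that per-host fairness bit is just cake's host-isolation keywords passed through sqm-scripts' advanced option strings, roughly like this in /etc/config/sqm (rest of the queue section omitted):

        option qdisc 'cake'
        option script 'layer_cake.qos'
        # download/ingress: fair share per internal destination host ("nat" makes cake look up the internal IP)
        option iqdisc_opts 'nat dual-dsthost'
        # upload/egress: fair share per internal source host
        option eqdisc_opts 'nat dual-srchost'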
I think that is part of the issue here... From the HAR file I see the test transfers 13.2 MiB of data
with your shaper setting you can maximally expect the following shaped goodput (PPPoE, IPv4, TCP, no TCP options):
107 * ((1500 - 8 - 20 - 20) / (1500 - 8 + 28)) = 102.2 Mbps (but you only measure 88.9, so something is odd here)
the unshaped goodput sits at 116 Mbps
That is, transfer time alone will be 13%-30% slower with the traffic shaper:
100 - 100 * 1.08558456471 / 0.954565737931 = -13.7% (102 Mbps shaped vs. 116 Mbps unshaped)
100 - 100 * 1.24555259393 / 0.954565737931 = -30.5% (88.9 Mbps measured vs. 116 Mbps unshaped)
so that already explains part of the issue... but this also shows that the 0.3 seconds would require a boosted rate of:
(0.3 / (8 * 13.2 * 1024^2))^-1 / 1000^2 = 369 Mbps... and that is purely looking at the transfer rates, ignoring all accumulative delays from having to do multiple round trips...
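If anyone wants to re-check those numbers, this reproduces the back-of-the-envelope arithmetic above (it assumes PPPoE/IPv4/TCP without options and ignores all round-trip delays):

awk 'BEGIN {
    bits     = 13.2 * 1024 * 1024 * 8                 # test payload: 13.2 MiB in bits
    shaped   = 107 * (1500-8-20-20) / (1500-8+28)     # shaped goodput, ~102.2 Mbps
    unshaped = 116                                    # unshaped goodput in Mbps
    measured = 88.9                                   # goodput the test actually reported, in Mbps
    printf "shaped goodput: %.1f Mbps\n", shaped
    printf "transfer time: %.3f s unshaped, %.3f s shaped, %.3f s measured\n", bits/(unshaped*1e6), bits/(shaped*1e6), bits/(measured*1e6)
    printf "rate needed to move it all in 0.3 s: %.0f Mbps\n", bits/0.3/1e6
}'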
That likely is the effect of latency; your test is quite sensitive to latency because the loading actions run in sequence, where the next only starts after the previous one finished...
No, that is not a generally true statement... AQM will only engage if your load saturates your link (only then does a queue build up that can be actively managed)...
No surprise there; the tc -s qdisc output of the test shows no packet drops or markings, so there is nothing ECN could improve on (it also invalidates my hypothesis that we might be seeing effects of RTO, i.e. retransmission timeouts)
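(For anyone reproducing this, the counters in question come from something like the following, with eth1 standing in for the WAN device and ifb4eth1 for the ingress ifb that sqm-scripts creates:)

# per-qdisc statistics; for cake, check the overall "dropped" count and the per-tin drops/marks columns
tc -s qdisc show dev eth1
tc -s qdisc show dev ifb4eth1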
As I said, the question is not about internet access capacity and latency, but really only 'is the link going to be congested often enough to care' or not.
There was around a 5-10 ms difference in latency, yet the tests showed a difference of a few seconds (1.8 vs 6 seconds) with http2demo, but that's another topic I guess.
Seems like some browsers and protocols load multiple photos ahead of time, while others wait sequentially: https://imgur.com/a/b8P3XvB