Tested with OpenWrt 23.05.2 and an x86 router (Ryzen 5800) on 100 Mbit fiber, against a remote server about 60 ms away as well as other sites.
This is also quite noticeable with DNS queries, git, Google Maps, and loading large sites, although not a deal breaker.
QUIC and single-stream TCP such as HTTP/2 are not impacted.
Tested with Safari on macOS, and with Chrome and Firefox on Windows.
It's interesting to see how wildly the browsers varied, with Safari showing the largest difference.
SQM is set up without any extras, using cake + layer_cake; changing the limiter doesn't have any impact, including setting it to 0.
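For reference, the section in /etc/config/sqm is essentially the stock layer_cake setup, roughly like this (interface name and rates below are placeholders, not my exact values):

config queue 'eth1'
        option enabled '1'
        option interface 'eth1'
        option qdisc 'cake'
        option script 'layer_cake.qos'
        # shaper rates in kbit/s; 0 disables shaping for that direction
        option download '95000'
        option upload '95000'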
Is this an expected side effect, and are there any areas to improve?
Weird. Hopefully @moeller0 has an idea or two about what’s going on here.
Is this with an otherwise unloaded network?
I don’t see any difference when enabling or disabling cake (using my own cake-qos-simple implementation - but surely this isn’t relevant) when viewing that test site through Safari on my iPhone. Perhaps wired might show a difference.
This seems to be the expected result, as the test result is just dominated by the total average transfer throughput.... and AQMs will tend to reduce the average throughput somewhat...
Especially with the small images there is not enough data in flight to noticeably self-congest, so the AQM is not helping by keeping the queue reasonably small.
What results do you get if you:
a) switch to high resolution images?
b) run a concurrent load test of sufficient duration?
But to fully understand what is going on here I would propose getting and comparing packet captures for the different tests....
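Something like this on the router would do for the captures (a rough sketch; eth1 is just a placeholder for the actual WAN device):

# capture only headers (first 128 bytes of each packet) on the WAN side while the test runs
tcpdump -i eth1 -s 128 -w imagetest-sqm-on.pcap
# repeat the run with SQM disabled, save as imagetest-sqm-off.pcap, and compare the two in wireshark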
Do you mean that the difference can be explained by the slight loss of bandwidth associated with avoiding bufferbloat? So say for a 100 Mbit/s connection, the max bandwidth with bufferbloat may be 115 Mbit/s, and we lose some of that when we throttle. But surely that wouldn't be enough to account for such large differences in the total times as reported?
Well, the tests seem to daisy chain loading of 100 images, so each new image is only loaded if the previous was fully transferred. If we drop a packet (to indicate 'slow down a bit, will you') that packet will need to be retransmitted before we can load the next...
With the small images each will fit into a packet already, so dropping a packet means we are missing a full image and we likely have to wait for a retransmit timeout (as there are not necessarily more packets queued releasing an ACK).
Personally I would try to see whether I can get ECN signalling working for the test... (cake supports that out of the box, as do most Linux servers, so you likely only need to configure it on the end-point).
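On a Linux end-point that usually just means the standard sysctl knob (the values below are the stock Linux meanings, nothing specific to this test, and it only helps if the other side negotiates ECN too):

# 1 = request ECN on outgoing TCP connections and accept it on incoming ones (0 = off, 2 = accept only)
sysctl -w net.ipv4.tcp_ecn=1
# persist across reboots
echo 'net.ipv4.tcp_ecn=1' >> /etc/sysctl.conf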
This is pretty much tailored to be a worst-case scenario for AQMs, as the total run time here is shortest if utilisation is highest and AQMs cost a bit of utilisation... the test also does not measure the transient bufferbloat caused by not having an AQM... hence my questions above about how this performs with additional loading of the link...
I might have misunderstood the OP, but to me this looks like completion time with SQM is noticeably higher than without... Now I would like to see:
0) disable SQM: /etc/init.d/sqm stop
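(To double-check it is really off, cake should disappear from the qdisc list after the stop:)

# lists any cake instances; should print nothing once SQM is stopped
tc qdisc show | grep cake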
Seems helpful, and the results from those tests will surely be informative, and I'm not wanting to hijack the thread, but I meant to ask about the technology: AQM. Does this potentially apply here because the OP is using fibre?
Nah, at the core cake and fq_codel employ a variant of CoDel, which itself is an active queue management (AQM) method. The link technology has nothing to do with it; it is just that his tests marked 'SQM cake' will employ an AQM and his tests without likely will just employ his ISP's FIFO (or unmanaged buffers). But note that without more information from the OP all I can do is guess (or maybe get my own packet captures tonight, assuming I am bored enough).
Here is what I see with my variable rate 4G circa 5-80Mbit/s connection (with cell tower heavily saturated at worst possible conditions) and circa 50ms to host:
root@OpenWrt-1:~# ping http://www.http2demo.io/
ping: http://www.http2demo.io/: Name does not resolve
root@OpenWrt-1:~# ping www.http2demo.io
PING 1906714720.rsc.cdn77.org (195.181.164.14) 56(84) bytes of data.
64 bytes from 263888592.lon.cdn77.com (195.181.164.14): icmp_seq=1 ttl=54 time=46.7 ms
64 bytes from 263888592.lon.cdn77.com (195.181.164.14): icmp_seq=2 ttl=54 time=45.6 ms
64 bytes from 263888592.lon.cdn77.com (195.181.164.14): icmp_seq=3 ttl=54 time=54.6 ms
64 bytes from 263888592.lon.cdn77.com (195.181.164.14): icmp_seq=4 ttl=54 time=52.9 ms
64 bytes from 263888592.lon.cdn77.com (195.181.164.14): icmp_seq=5 ttl=54 time=51.0 ms
^C
--- 1906714720.rsc.cdn77.org ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 45.639/50.182/54.607/3.470 ms
Basically, running it many times, I don't see a significant difference unless I seriously throttle with cake down to around 5 Mbit/s and there is other traffic on the network.
You're correct that the speed limit rather than the AQM itself is the issue; I just had the same image-loading result time with or without cake enabled when ingress/egress was set to 0.
With cake enabled, the performance was significantly better with saturated load compared to without.
I just tested with 4G and noticed similar results to yours, but 4G is never as smooth as fiber, and it was one second slower than fiber, with identical latency with or without SQM, when testing the 100-image 64 kB run.
Fiber vs 4G at 0.15ms interval:
(loss is from switching networks)
One of the reasons why setting my fiber at 20Mbit is still snappier at browsing compared to my LTE at ~60Mbit.
I guess AQM/SQM isn't an issue for anyone living in a country with many CDNs, since the latency is low enough that it has negligible impact.
I'm still using it for per-host isolation fairness.
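For context, that per-host fairness bit is just cake's host-isolation keywords passed through sqm-scripts' advanced option strings, roughly like this in /etc/config/sqm (rest of the queue section omitted):

        option qdisc 'cake'
        option script 'layer_cake.qos'
        # download/ingress: fair share per internal destination host ("nat" makes cake look up the internal IP)
        option iqdisc_opts 'nat dual-dsthost'
        # upload/egress: fair share per internal source host
        option eqdisc_opts 'nat dual-srchost'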
I think that is part of the issue here... From the HAR file I see the test transfers 13.2 MiB of data
with your shaper setting you can maximally expect the following shaped goodput (PPPoE, IPv4, TCP, no TCP options):
107 * ((1500 - 8 - 20 - 20) / (1500 - 8 + 28)) = 102.2 Mbps (but you only measure 88.9, so something is odd here)
the unshaped goodput sits at 116 Mbps
That is, transfer time alone will be 13%-30% slower with the traffic shaper:
100 - 100 * 1.08558456471 / 0.954565737931 = -13.7% (102 Mbps shaped vs. 116 Mbps unshaped)
100 - 100 * 1.24555259393 / 0.954565737931 = -30.5% (88.9 Mbps measured vs. 116 Mbps unshaped)
so that already explains part of the issue... but this also shows that the 0.3 seconds would require a boosted rate of:
(0.3 / (8 * 13.2 * 1024^2))^-1 / 1000^2 = 369 Mbps... and that is purely looking at the transfer rates, ignoring all accumulative delays from having to do multiple round trips...
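If anyone wants to re-check those numbers, this reproduces the back-of-the-envelope arithmetic above (it assumes PPPoE/IPv4/TCP without options and ignores all round-trip delays):

awk 'BEGIN {
    bits     = 13.2 * 1024 * 1024 * 8                 # test payload: 13.2 MiB in bits
    shaped   = 107 * (1500-8-20-20) / (1500-8+28)     # shaped goodput, ~102.2 Mbps
    unshaped = 116                                    # unshaped goodput in Mbps
    measured = 88.9                                   # goodput the test actually reported, in Mbps
    printf "shaped goodput: %.1f Mbps\n", shaped
    printf "transfer time: %.3f s unshaped, %.3f s shaped, %.3f s measured\n", bits/(unshaped*1e6), bits/(shaped*1e6), bits/(measured*1e6)
    printf "rate needed to move it all in 0.3 s: %.0f Mbps\n", bits/0.3/1e6
}'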
That likely is the effect of latency; your test is quite sensitive to latency because the loading actions run in sequence, where the next only starts after the previous one finished...
No, that is not a generally true statement... AQM will only engage if your load saturates your link (only then does a queue build up that can be actively managed)...
No surprise there; the tc -s qdisc output of the test shows no packet drops or markings, so there is nothing ECN could improve on (it also invalidates my hypothesis that we might be seeing effects of RTO, i.e. retransmission timeouts)
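(For anyone reproducing this, the counters in question come from something like the following, with eth1 standing in for the WAN device and ifb4eth1 for the ingress ifb that sqm-scripts creates:)

# per-qdisc statistics; for cake, check the overall "dropped" count and the per-tin drops/marks columns
tc -s qdisc show dev eth1
tc -s qdisc show dev ifb4eth1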
As I said, the question is not about internet access capacity and latency, but really only 'is the link going to be congested often enough to care' or not.
There was around a 5-10 ms difference in latency, yet the tests showed a difference of a few seconds (1.8 vs 6 seconds) with http2demo, but that's another topic I guess.
Seems like some browsers and protocols load multiple photos ahead of time, while others wait sequentially: https://imgur.com/a/b8P3XvB