Bufferbloat, it's not just for WAN connections anymore

Well, testing is hard.... The point is, sure, proper repeatable testing requires some care, but for WAN tests dslreports is a decent option, especially for people who casually get into networking by trying to debloat their home network. Its results will be dominated by the bottleneck link, whether that is a LAN or a WAN link, so it is often also useful for internal debloating*. What alternative do you propose instead? (It seems easier to point out the existing shortcomings of a procedure than to come up with a better alternative, especially if the problem has multiple dimensions that not everybody weighs equally.)

That said, WiFi over-queueing is a known source of bufferbloat, especially in leaf networks with increasing internet access rates. For some WiFi SoCs the make-wifi-fast folks have made a noticeable dent in WiFi bloat, and luckily all of those improvements are either already in OpenWrt or on the way.

Once WAN and WiFi are within an acceptable latency-under-load range, wired infrastructure will come up next (and for some users even before WiFi), so what is the reason for the pushback in this thread?

*) IMHO the popular "run iperf2/3 or netperf on the router" approach is at least as problematic as dslreports/waveform/fast.com plus ping....

@Lynx I very much enjoy your approach to bug reporting.

My own driving goal, for 30 years now, has been to make it possible for 4 musicians to collaborate in realtime with quality video and audio with a total latency of no more than 8ms, preferably 4, regardless of other traffic on the link.

I essentially achieved sub-2ms jitter and latency going p2p on Sonic's fiber network in San Francisco 5 years ago using cake. Having exhausted myself on the high-speed networking part of the problem, and having ensured that Opus 48kHz 2.7ms PLC would finally work correctly (back-to-back packet loss is a thing of the past in cake), I took a look at the video codec problem and backed away screaming. We have put so much work into reducing video frame size that we have made serious compromises on what latencies can be achieved, and the best solution there is to go to a very high frame rate (240 fps), blast largely un-encoded packets, and revert to good old scan lines - which will saturate a gbit network for a mere 4 users.
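A rough back-of-envelope illustrates the magnitude (the resolution and pixel format below are my assumptions, not figures from the post): raw 4:2:0 video at 240 fps blows past a gigabit per user even at modest resolutions, so "largely un-encoded" still implies light compression to squeeze 4 streams into 1 Gbit/s.

```python
# Back-of-envelope: raw video bandwidth at high frame rates.
# Assumed numbers, purely illustrative: 1280x720 frames, YUV 4:2:0.
width, height = 1280, 720
bits_per_pixel = 12        # YUV 4:2:0 averages 12 bits per pixel
fps = 240
users = 4

per_user = width * height * bits_per_pixel * fps   # bits per second
print(f"per user: {per_user / 1e9:.2f} Gbit/s")    # ~2.65 Gbit/s
print(f"{users} users: {users * per_user / 1e9:.1f} Gbit/s")

# To fit 4 streams into ~1 Gbit/s, each stream gets ~250 Mbit/s,
# i.e. roughly a 10:1 reduction from raw 720p240 - "light" compression.
print(f"budget per user: {1e9 / users / 1e6:.0f} Mbit/s")
```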

Building tools that can effectively move live video into packets at these rates and these low latencies is still hard, and I tend to boggle at the recent calls for the metaverse without much thought given to somehow improving the internet AND local LANs to have consistently low latency and jitter at high bandwidths.

Thanks, well I definitely enjoy the exchanges on this forum. Some very bright people here. I am a huge fan of visuals, and I struggle to resist the temptation to post images here and there. I used to like programming and studied Engineering (focused in the end on fluid dynamics and thermodynamics), but actually ended up in the patent profession (medical devices - and mostly invalidating patents), but I think OpenWrt is rekindling my interest in programming. I used to love C / C++.

That is super cool to hear about your motivation and success story, and then the frustration with the video codec issue. Is low-latency video something that has received much innovation of late, I wonder? What about the technology in Teams and Zoom?

It does seem to be the subject of some patent applications:

I wonder if any of those are of interest.

See, e.g. Qualcomm's US 10,270,823 B2:

In general, this disclosure describes techniques for low-latency video streaming based on, e.g., media content formatted according to the ISO base media file format (ISOBMFF) and dynamic adaptive streaming over HTTP (DASH). DASH is described in, e.g., 3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Transparent end-to-end packet switched streaming service (PSS); 3GPP file format (3GP) (Release 12) V12.2.0, December 2013. This disclosure describes various methods for defining and signalling of data which may conform to a new DASH profile (e.g., advanced live profile) and some new types of media segments that may enable low latency video streaming, including reduced channel acquisition and channel change times in broadcast and multicast, while potentially enabling high-efficiency video coding structures at the same time.

Most of his network is infrastructure to bring a cable to an AP and other devices, so yeah, everyone who does not live in a tiny house like mine, where a single AP sitting above the server rack covers the house and the whole garden, may face similar problems.

You are really missing the point by a country mile here. The issue is not the equipment but the cables.
Can you easily rip out walls to install ethernet cables all over the building? Do you have the money to do it properly? Can you actually do it (do you own the property, or are you renting)?

In many countries it's not as easy as in the US, where you can literally break "walls" with a strong kick because they are wooden frames with gypsum panels over them, nor do the houses have such short lifespans as in the US, where everything gets torn down every 50 years or so.

In many countries the walls are made of brick or concrete, and if your home's existing internal conduits for cables (if you even have them) are already filled with electrical wire, there is no other way: either you run external cables (and go for the "industrial look") or use powerline. Nobody is breaking open hundreds of meters of wall (and sometimes floors) to install a new cable conduit for ethernet.

And for some puzzling reason, I see a lot of very new homes that still lack any kind of provisioning for ethernet, so again people are stuck just like those in older homes. That's one of the reasons mesh WiFi setups are so popular with consumers nowadays: you don't need to work on the walls to pull wire all over.

Because if you can just throw gigabit cables all over the place, his point is indeed moot: you just rewire everything to reach a single large switch in the center of the house and boom, done. No more problems. But as I said, this is not easy in many cases.

I mean, one of the main reasons 10Gbit isn't more popular is that it requires ripping out and replacing A LOT of existing cables that were installed years and years ago.

+1 and credit for using the expression 'by a country mile'. I totally don't get why people are so dismissive about the use of WiFi extension means and blasé about installing cables. A ton of users will use WiFi extenders, WDS, mesh and the like because it is more practical.

Also, the best WiFi rates I can manage with my WDS setup are around 350 to 450 Mbit/s, but it varies between, say, 50 Mbit/s and that maximum depending on where I am located in the house, how many concurrent WiFi connections there are, etc. Say I am sitting on a bench in the garden with my laptop: perhaps it fluctuates between 50 Mbit/s and 250 Mbit/s. Then I move to another bench or the outdoor table, and it changes again.

I have only one question. Do you mean a US mile, a Swedish mile, or some other mile?

I don't understand how a test is decent if it's inconsistent and doesn't really test what you're trying to prove.
There are quite a few claims here but very little to actually back them up in terms of hard numbers. I'm sure there are some, but to the extent where it matters / makes a practical difference?

What I think flygarn12 is getting at is that some of these theoretical setups are quite far-fetched, at least if you're going to target your average user. They most likely exist, but are more likely a minority than a majority, and does bufferbloat matter in those cases?

Well, in my experience the dslreports speedtest, for example, is mostly self-consistent when used with the same browser in similar conditions, and will show a clear increase in latency under load when I disable my traffic shaper/AQM versus enabling it (shaped 100/36 versus unshaped 116/37). Since these increases are well reproducible I tend to trust that test, but I also do not put too much stock in its bufferbloat "grade"; instead I always look at the time-resolved latency plots and compare those between different conditions. I also took care to properly configure that test in the first place.

Well my question is still open:

Sorry, I do not understand that question, could you maybe rephrase that?

I am sure I have no clue about the average user's set-up, so I do not mind threads like these that are tailored to some specific settings. At least over here PLC adapters are (unfortunately) quite popular, so the painted scenario does not sound outlandish to me. Sure, I would assume (not know) that many more users are bottlenecked by their WiFi link or WiFi mesh, but that still leaves the true bottleneck inside their home network. Sometimes these bottlenecks are properly debloated already (OpenWrt with fq_codel in the WiFi stack and/or airtime fairness patches applied); sometimes they are stock proprietary solutions that range from competently done and bloat-free to painfully over-buffered and under-managed.

As I said in the second post, bufferbloat was detected in switches and WiFi quite early on, which is not too surprising, as queueing will happen wherever there are speed transitions in a network, and unless these queues are properly sized and managed, bufferbloat will show up whenever that link becomes the relevant bottleneck.
Whether bufferbloat matters or not is mostly a policy question each network admin needs to decide for herself, but I think it is important to inform folks about bufferbloat and its consequences so these can be educated decisions.
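To put a number on that, here is a short worked example; the buffer size and link rate are chosen purely for illustration, not taken from any particular device:

```python
# Worked example (assumed numbers): latency added by a full buffer at a
# 1 Gbit/s -> 100 Mbit/s speed transition, e.g. inside a switch or AP.
buffer_bytes = 256 * 1024      # assumed buffer at the bottleneck port
egress_bps = 100e6             # bottleneck egress rate (100 Mbit/s)

delay_ms = buffer_bytes * 8 / egress_bps * 1000
print(f"added queueing delay when full: {delay_ms:.1f} ms")  # ~21 ms
```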

@amteza can you post data from a script run on your mesh setup so we can actually graph what happens? Maybe this might help? I think it might be helpful to show the effect of bufferbloat originating from a local bottleneck, and ideally also the effect of addressing that with either fixed-bandwidth or variable-bandwidth CAKE. Incidentally, LTE connections are set to increase to 1 Gbit/s too. Everyone wants more bandwidth!

Latency is only slowly being picked up on. One of my friends has children that keep downloading stuff and gaming whilst he is trying to use Zoom. He imagines just getting more bandwidth will be the solution to the problem.

I think Joe Bloggs understands bandwidth, but not latency, and certainly not bufferbloat. This always makes me smile:

@moeller0 I am curious about the similarities and differences between bufferbloat originating say with multiple LTE towers serving simultaneous connections and multiple WiFi access points also serving simultaneous connections. Clearly in the home environment we have more control over what is going on.

Mmmh, LTE uses a central controller instance (in the basestation, I believe) that arbitrates/schedules transmissions to the remote user equipment, and that also grants these remote stations permission to send (so it is aware of and in control over who can transmit at what time), while WiFi in home environments typically has no central scheduling instance but relies on listen-before-you-send and collision avoidance. So at least in this dimension WiFi offers less control than LTE (from the LTE operator's point of view; for end users there is not much one can change on an LTE link as far as I know). How these differences translate into the individual rate allotted to a specific station, and how quickly that rate changes, I have no strong prediction. But this topic might be better discussed in another thread, since this one is more about "wired" connections (including re-/ab-using existing power lines), no?

It's still wrong to assume that the "average user" has a single router with WiFi and that that is all we should aim for, IMHO. There are plenty of users asking about setups with one or multiple APs or mesh in this forum.

In the pure consumer space I've seen a staggering number of non-tech-savvy people I know buying proprietary mesh WiFi systems, just because they are very easy to set up (in most cases they have buttons to push for pairing, or an app, or something like that) and work so much better than older-generation WiFi repeaters, both for WiFi roaming (for devices) and bandwidth/latency.

Powerline systems have not disappeared either; there are offerings from all major manufacturers, and they are all faster than older generations, so it's fair to assume they sell enough to justify their existence and the R&D to improve the next generation.

Is there a decent way to test this? IMHO the first step in these things is to create a good test suite or test procedure.
Without that, this is all hearsay and idle chat.

Well, in my experience tools like:

  1. a properly configured dslreports speedtest
  2. waveform's bufferbloat test
  3. or a manual combination of fast.com (set to at least 60 seconds test time) and concurrent mtr
  4. netperf/flent
  5. iperf2 and concurrent mtr
  6. iperf3 and concurrent mtr

will all work (the last three suffer from the issue that one needs to find proper servers on the internet that are publicly accessible and capable of saturating the bottleneck link). All of these obviously need some testing and control measurements to confirm that they allow one to differentiate between known bloated and known unbloated conditions. None of this is "rocket science", including the validation, but it is not a single-click-and-forget kind of affair either. A minimal sketch of the "load plus concurrent ping" idea is shown below.
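For illustration, here is a minimal sketch of option 3's pattern in Python: sample RTTs idle, then again while a bulk download runs. The target host and download URL are placeholders you would substitute; any of the listed tools does this more rigorously.

```python
#!/usr/bin/env python3
"""Minimal latency-under-load sketch: compare idle RTTs against RTTs
sampled while a bulk download saturates the link."""

import re
import statistics
import subprocess
import threading
import urllib.request

TARGET = "192.168.1.1"    # assumed gateway/AP to ping; adjust as needed
LOAD_URL = "https://example.com/largefile.bin"  # placeholder bulk URL
SAMPLES = 20

def sample_rtts(n):
    """Run n pings and return the parsed RTTs in milliseconds."""
    out = subprocess.run(["ping", "-c", str(n), TARGET],
                         capture_output=True, text=True).stdout
    return [float(m) for m in re.findall(r"time=([\d.]+)", out)]

def generate_load():
    """Pull the placeholder URL to saturate the downlink."""
    try:
        urllib.request.urlopen(LOAD_URL, timeout=60).read()
    except OSError:
        pass  # a failed or short download just means less load

idle = sample_rtts(SAMPLES)

threading.Thread(target=generate_load, daemon=True).start()
loaded = sample_rtts(SAMPLES)

print(f"idle   median RTT: {statistics.median(idle):6.1f} ms")
print(f"loaded median RTT: {statistics.median(loaded):6.1f} ms")
```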

If concerned about a local bottleneck, can't you just ping the main router from the local access point and observe the increase associated with saturation of the local connection between the main router and the access point? So everything stays local? I thought that is what @amteza did on his mesh with the script to vary CAKE bandwidth.

Does it need to be >1 WiFi access point / powerline, etc.? I am still getting to grips with much of this, but say you have a 1Gbit internet connection and just one router whose WiFi can offer 400 Mbit/s. If one client downloads a file whilst another client seeks to use Zoom, will the saturation of that 400 Mbit/s link lead to bufferbloat? If not, why not? I must be missing something pretty fundamental, because this seems to be taken as a given above. So I am keen to learn!

I read that an RJ45 ethernet cable can also cause high latency because it slows things down. Is this true?

Probably, if you measure it enough times you will get a result worse than you want.

But there are a lot of different cables too.
Almost all of them that you will find today are a lot faster than the fastest WiFi.

And you can only run it for 150m.

Yes, but you still need to saturate the bottleneck link. Depending on your networking chops and the devices in the home network, one can do bufferbloat testing purely within one's own network, but the entry hurdles are IMHO higher than directing a browser towards dslreports' or waveform's test sites.

Yes, purely local is possible, but not necessarily trivial enough to have uncle Herbert or other lay relatives perform it on their own without guidance.
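As an illustration of the purely local variant (host addresses are assumptions, and an iperf3 server must already be running on a wired LAN host, e.g. via `iperf3 -s`):

```python
# Hedged sketch: saturate the WiFi/LAN hop with iperf3 while pinging
# the AP, then print ping's min/avg/max/mdev summary line.
import subprocess

AP = "192.168.1.1"             # assumed access point address
IPERF_SERVER = "192.168.1.2"   # assumed wired iperf3 server on the LAN

load = subprocess.Popen(["iperf3", "-c", IPERF_SERVER, "-t", "20"],
                        stdout=subprocess.DEVNULL)
ping = subprocess.run(["ping", "-c", "20", AP],
                      capture_output=True, text=True)
load.wait()
print(ping.stdout.strip().splitlines()[-1])  # rtt min/avg/max/mdev = ...
```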

If both share that link, yes, it can result in latency issues for the videoconference (but it does not have to). This is exactly where competent AQM can shine, in this case in the AP. But if our hypothetical bulk transfer goes in the other direction, i.e. a 1 Gbps upload, things become trickier, because now each client that might upload data needs to be considerate about airtime access. (This is where careful prioritization of the VC data can help, as the higher WiFi ACs have a better chance of acquiring a tx-opportunity than the lower ones; the devil, however, is in the details.)
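For what it's worth, an application can at least nudge its traffic towards a higher access category by setting the DSCP field. A minimal sketch follows; the target address and port are placeholders, and whether the stack's DSCP-to-UP mapping actually lands EF in AC_VI or AC_VO depends on driver and configuration:

```python
# Mark a UDP socket's packets with DSCP EF (46); many WiFi stacks map
# the resulting IP precedence to a higher WMM access category.
import socket

DSCP_EF = 46                   # Expedited Forwarding
tos = DSCP_EF << 2             # DSCP sits in the upper 6 bits of TOS

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
sock.sendto(b"probe", ("192.168.1.1", 9))  # placeholder target/port
```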

For example, if the AP is a recent OpenWrt device with either an ath9k, ath10k or mt76 radio, it might already employ fq_codel in the WiFi stack, and that (in the case of ath10k, together with AQL) can already help the bulk download not clobber the VC. Note that for uploads this is much harder, as the stations have no real information about potential other uploads (this is where LTE has an advantage: the central controller, in the basestation I believe, also assigns upload slots to the stations, so it can arbitrate uploads too; plus LTE typically uses different frequencies for uploads and downloads, unlike WiFi, where an overly aggressive upload can pummel concurrent downloads noticeably).

Well, queueing delay can happen whenever there are transitions from fast to slow and that can happen with wires or wireless.

Mmmh, I was under the impression that the limit for an individual ethernet segment was <= 100m (depending on the combination of speed and cable type). What am I missing?

I agree. AP manufacturers often hide this fact by advertising the maximum gross rates a device can achieve (even though with WiFi the actual throughput is often in the range of 50-70% of the gross rate), and to add insult to injury, they often also simply add up the maximum rates of all radios, so a dual-band device is advertised with the sum of its highest 2.4 GHz rate and its highest 5 GHz rate, even though it is pretty unlikely that a single client will ever see this aggregate rate in reality.
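A quick illustration with made-up but typical numbers (the link rates below are assumptions, not any specific product's spec):

```python
# Why the box label overstates what one client sees: the label sums
# both radios' gross link rates, while a single client uses one radio
# and typically achieves ~50-70% of its gross rate as throughput.
rate_2g4 = 574    # Mbit/s, assumed 2.4 GHz gross rate
rate_5g = 2402    # Mbit/s, assumed 5 GHz gross rate

label = rate_2g4 + rate_5g          # marketing-style sum of both radios
single_client = 0.6 * rate_5g       # one radio at ~60% efficiency

print(f"advertised: {label} Mbit/s")                            # 2976
print(f"realistic single client: ~{single_client:.0f} Mbit/s")  # ~1441
```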