Bufferbloat, it's not just for WAN connections anymore

Now that 500Mbps+ fiber and DOCSIS connections are becoming more common, the issues with bufferbloat and uncontrolled latency are no longer just affecting people's internet connections. Now the slowest point in the link between your device and a server on the internet can easily be one of your LAN links.

For example, power line Ethernet, MoCA, and G.hn devices can easily operate in the 50-200 Mbps range; some wifi APs have only a 100Mbps Ethernet port; an old switch might be 100Mbps; an old or damaged Ethernet cable might force a 100Mbps link; and wifi connections can easily modulate at anywhere from 6Mbps to 800Mbps. All of these will be slower than a 900+ Mbps fiber connection to your ISP.

If you experience latency issues it can be because of buffering in one of these choke points. What can you do?

  1. smart switch policers. If you have a link like a power line device connected to a smart switch, set the smart switch to limit the ingress and egress on this port to below the level reliably provided by the power line device. This will give a more consistent speed and latency than otherwise. My experience is maybe 20ms of delay under load is fairly achievable.

  2. judiciously use DSCP tags on your latency sensitive wifi traffic. For example, video conferencing can be tagged to go through the WMM video queue. CS4, AF41, and AF42 will normally go through the video queue. Even if there is buffering of bulk streams, this may hold your conference call's latency down. For audio-only connections you can try CS6 to send through the voice queue, but limit this to audio streams or games that use less than 1 Mbps, preferably much less.

  3. Replace old switches and old Ethernet cables. For LAN use I can unreservedly recommend the TP-Link sg108e smart switch. It's typically around $30-$35 and is far more useful than an unmanaged switch, especially a 100Mbps one.

  4. Segment your network. Move heavy traffic devices so the traffic doesn't go over the same wire and cause congestion within your network. For example don't daisy chain multiple desktop machines that all need to access a file server.

  5. Utilize Link Aggregation on smart switches to relieve congestion. For example file servers can utilize a pair of ports on a switch, or two switches can be connected via a pair of ports.
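
For tip 2, here is a minimal Python sketch of how an application might tag its own traffic, and which WMM queue a given DSCP value would usually land in. The helper names (`tos_byte`, `dscp_to_ac`, `open_tagged_udp_socket`) are made up for illustration. The mapping assumes the common default where the 802.11 user priority is taken from the top three bits of the DSCP value; some drivers and APs map differently, so treat this as a sketch and check on your own hardware.

```python
import socket

# DSCP code points mentioned in the tip above (standard values).
DSCP = {"CS0": 0, "CS1": 8, "CS4": 32, "AF41": 34, "AF42": 36, "CS6": 48}

# Assumed default WMM mapping: user priority (0-7) -> access category.
UP_TO_AC = {
    0: "AC_BE", 3: "AC_BE",
    1: "AC_BK", 2: "AC_BK",
    4: "AC_VI", 5: "AC_VI",
    6: "AC_VO", 7: "AC_VO",
}


def tos_byte(dscp: int) -> int:
    """DSCP occupies the upper six bits of the IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    return dscp << 2


def dscp_to_ac(dscp: int) -> str:
    """Assumed mapping: the top three DSCP bits select the user priority."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    return UP_TO_AC[dscp >> 3]


def open_tagged_udp_socket(dscp: int) -> socket.socket:
    """Create a UDP socket whose outgoing IPv4 packets request this DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(dscp))
    return sock


if __name__ == "__main__":
    for name, value in DSCP.items():
        print(f"{name:5s} dscp={value:2d} tos=0x{tos_byte(value):02x} "
              f"-> {dscp_to_ac(value)}")
```

Some operating systems ignore or restrict TOS values set by applications, so it is worth confirming with a packet capture that the tag actually appears on the wire.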

Do you have a gigabit internet connection and bufferbloat on your LAN? Leave comments below. I'm interested in how common this is now that gigabit fiber is no longer niche.

Yes, Jim Gettys has been on this since the beginning, on switches and wifi.

Also keep in mind that PLC adapters can be absolutely atrocious towards DSL (especially fast PLC standards versus faster DSL standards, like VDSL2 profile 35b), not only to your own line, but also to your neighbors'. Sometimes PLC is the best available option; just keep the downsides in mind. E.g. a WiFi-based mesh can often be an acceptable alternative. (People in pure fiber/FTTH or DOCSIS/cable neighborhoods can obviously ignore this.)

+1; on WiFi this is important as AC_VO can effectively starve AC_BE and even AC_VI can severely depress throughput in AC_BE, so the trick is to only elevate those packets to higher ACs that really profit from that higher priority. As you say:

+1; as I tried to spell out explicitly the art is to gain the advantage of AC_VO here without dragging in all the downsides.

Another thing I should have mentioned is if possible put all devices that need to send and receive large quantities of information between them on the same switch. Switches have a switching "fabric" where they can send data at full bandwidth between any two ports on the switch without it affecting the other ports. On the other hand, if two switches are linked by an ethernet cable, the total traffic across that ethernet cable can not be more than the speed of a single port. So for example if you have 3 desktop machines connected to a file server, put all those machines on the same switch if you can.

Another useful technique is to enable IGMP and MLD snooping, and broadcast, multicast, and unknown unicast storm protection on switches. Misbehaving devices can use up a lot of bandwidth, which slows your network and also congests it, making interactive use nearly impossible. On the TP-Link sg108e for example you can set this up:

Which will prevent any port from utilizing more than 8Mbps of broadcast, multicast, or unknown unicast (I assume combined). You can set the amount to whatever is reasonable for your network. For example, if you have an IP camera network using multicast it might make sense to set this to 100Mbps or more. But if you don't utilize multicast, then setting this as low as 1Mbps might make good sense.

You forgot the only real tip that has any meaningful point and it works. Buy gigabit equipment.

Just having Gigabit equipment isn't enough. Even with a gigabit connection between two switches it's easily possible for one device on one switch to talk to another device on the other switch and saturate the link between the two.

A,B -----> C -----> Router ----> WAN

If A and B are on the same switch and C is on another switch, and A talks to C at gigabit, then B will experience congestion talking to the WAN. If B is sending and receiving 2Mbps of video conference traffic while A is copying files to and from file server C, the Zoom meeting on B will suck.

Or suppose you have D which talks to the WAN through the wifi access point at B... again video conference on D will suck.
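
To make the A/B/C example concrete, here is a rough Python sketch that adds up the offered load per link and flags links that are oversubscribed. The gigabit capacities and the 2Mbps conference flow follow the example above; the link names are invented for illustration. Treating each flow as a fixed demand along a fixed path, in one direction only, is a simplifying assumption (real TCP flows back off and share the link), so this only shows where queues are likely to build, not what throughput each flow gets.

```python
from collections import defaultdict

# Capacity of each link in Mbps (all gigabit, as in the example).
LINKS = {
    ("switch1", "switch2"): 1000,
    ("switch2", "router"): 1000,
    ("router", "wan"): 1000,
}

# Each flow: (name, demand in Mbps, links it crosses).
FLOWS = [
    ("A <-> C file copy", 1000, [("switch1", "switch2")]),
    ("B video conference", 2, [("switch1", "switch2"),
                               ("switch2", "router"),
                               ("router", "wan")]),
]


def link_load(flows):
    """Sum the demand of every flow on each link it crosses."""
    load = defaultdict(float)
    for _name, demand, path in flows:
        for link in path:
            load[link] += demand
    return dict(load)


def congested_links(links, flows):
    """Return the links whose offered load exceeds their capacity."""
    load = link_load(flows)
    return [link for link, cap in links.items() if load.get(link, 0) > cap]


if __name__ == "__main__":
    for link in congested_links(LINKS, FLOWS):
        print("likely queue build-up on", link)
```

The only link flagged is the one between the two switches, which is exactly the link B's conference traffic has to share with the file copy.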

Everything you just said is just a mind game because the real bottleneck is the max speed of the wan port.

What is the point of all this 500+ Mbit internet line if your wan port only handles 100Mbit?
You can have a 1trillion bit internet line but you will still get max 100Mbit in the entire network if the wan port isn’t more than that.

Sure, if your WAN port handles only 100Mbps, that needs an upgrade right away. But that's not what I'm talking about. I'm talking about LAN devices that may handle less than a gigabit, either because they are radio frequency links (WiFi), MoCA, or powerline, or because they have a gigabit port that is saturated by traffic.

Getting gigabit equipment is necessary but not sufficient to have good latency.

A wan port is the same as any port, wan or lan is only the text on the plastic.
So what are you going to upgrade to if 100Mbit capacity on the wan port isn’t enough on a 500+ Mbit ISP connection?

You seem to be under the misimpression that I'm talking about a network where every device on the network is connected to the same single router at the edge and if you just get a gigabit all-in-one router your problems are solved because there is just one device to upgrade. That's not what I'm discussing here. I'm discussing people who may have something like this:

WAN ---- Router ----- Switch --- Switch ---- Access Point
           |
      Switch ---- Access Point
           |
      Powerline Device -----  Powerline Device ---- Access Point

Etc

Impression, probably…
Usually we do actually talk about single home routers in this forum.

There are some here who have ER4 routers or equivalent and build home networks that look a little like your drawing.

But I highly doubt that many here have a home network like the one you drew?

I actually think this is much more common these days, particularly for people with 500Mbps+ WAN connections. Three years ago, yes this was niche. But as very fast WAN has become more common, this kind of architecture is the main one that makes sense. Also with more and more wifi devices people want more and more access points. Mesh access point systems are extremely common even at the local electronics stores, and they wind up looking like:

Router  )))) Mesh Node
|         //      |||
Mesh node   )))) Mesh Node

Where all the links are wireless, and that's a special case of the general issue. I think this is very common and people are not used to having their internal network be the bottleneck, since when they had 60 or 100Mbps WAN they never had an internal bottleneck before.

In this country fiber to the home with pretty much unlimited Rx/Tx speed has been the name of the game for 15-20 years, so welcome to the party.
You can’t even buy anything slower than Gbit equipment in the stores anyway.

And power line network is kind of old stuff.

Being lucky enough to live in a European country, like I was, does not mean that what @dlakelan is presenting won't become the new normal for many other countries in the next year or two (looking at you, AU). His point is a really good one: many people, like me, have 1Gbps and no way to use MoCA or Ethernet to connect APs, so we are left with scenarios (WiFi mesh networks) that clearly suffer from a bottleneck on the LAN side. As this is happening more and more, it might be the right time to start considering what our new solutions to this first world problem are going to be.

Yes. With 1GBit WAN a very large proportion of users will still rely on WiFi. And many of the latter will use WDS or mesh to extend their WiFi. And with WiFi the bottleneck is variable rate.

And for this and other such variable rate cases, setting a fixed bandwidth may well require compromising on available bandwidth.

And as @dlakelan points out above there are wired scenarios that present the same problem.

So @flygarn12 how about a little less poo-pooing this interesting and very real first world technical problem :smiley:.

Looking forward to constructive comments and ideas.

Ok for the wifi, can you do the same for the lan?

This applies to quite extreme setups rather than a "normal" home network?

Test setup:
Two Linksys MR8300 devices linked over one of the 5GHz radios using WDS
Practical linkspeed is around 100-150mbit

Connected to "client" WDS:
Laptop (wireless (11n), downloading at "full speed"), Amazon tablet (11ac, streaming at ~14mbit/s using wireless), HTPC (wired and streaming at 14mbit/s). Wireless devices are all connected to the same radio.

Pinging the HTPC

    Packets: Sent = 50, Received = 50, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 2ms, Maximum = 29ms, Average = 12ms

Pinging the laptop (wireless)

    Packets: Sent = 50, Received = 50, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 5ms, Maximum = 37ms, Average = 16ms

Disconnecting the laptop improved latency to the HTPC box (still streaming and also the tablet)

Ping statistics for 192.168.1.241:
    Packets: Sent = 50, Received = 50, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 1ms, Maximum = 19ms, Average = 6ms

The remote device is behind a switch so the traffic flow would be:
Device - WDS Client - WDS Master - Network switch - Device
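
The ping summaries above can be compared mechanically. Here is a small Python sketch, assuming the Windows `ping` summary format shown in this post; `parse_ping_summary` is a hypothetical helper, not part of any tool. Note that the second measurement is not a true idle baseline, since the HTPC and tablet were still streaming.

```python
import re

# Matches the summary line printed by Windows ping, as quoted above.
SUMMARY_RE = re.compile(
    r"Minimum = (\d+)ms, Maximum = (\d+)ms, Average = (\d+)ms"
)


def parse_ping_summary(text: str) -> dict:
    """Extract min/max/avg round trip times (ms) from a ping summary."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no ping summary line found")
    minimum, maximum, average = (int(group) for group in match.groups())
    return {"min": minimum, "max": maximum, "avg": average}


# HTPC results copied from the post: with the laptop downloading at full
# speed, and after the laptop was disconnected.
with_laptop = parse_ping_summary(
    "    Minimum = 2ms, Maximum = 29ms, Average = 12ms")
without_laptop = parse_ping_summary(
    "    Minimum = 1ms, Maximum = 19ms, Average = 6ms")

# Latency added by the laptop's download, per statistic.
added = {key: with_laptop[key] - without_laptop[key] for key in with_laptop}
print(added)  # {'min': 1, 'max': 10, 'avg': 6}
```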

Conclusion: will maxing out available bandwidth increase latency? Yes
Is this a frequent issue for most people? Probably not
Does mixing 11n and 11ax devices affect performance in general? Yes
Will you still get latency spikes over wifi regardless of whether "bufferbloat" is present? Most likely
Does it matter in the end? Probably not, as most latency issues are related to how wifi works in general

It's really not that hard to get 100ms of latency on a WiFi link because a laptop that was connected at 300Mbps moves and changes modulation down to, say, 150Mbps. You can reproduce this quite easily by running a dslreports test. It's also not hard to get a few hundred ms on a power line link running a similar test. All of this happens despite having SQM on the WAN, because the bottleneck is not the WAN anymore.
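
As a back-of-the-envelope check on those numbers, queueing delay is just the backlog divided by the rate at which the link drains it. Here is a Python sketch using the 300Mbps and 150Mbps rates from the paragraph above; the function names are illustrative only, and the rates are treated as actual throughput, which on real WiFi is lower than the modulation rate.

```python
def backlog_bytes_for_delay(delay_ms: float, rate_mbps: float) -> float:
    """Bytes that must be queued to cause delay_ms at rate_mbps.

    1 ms at 1 Mbps is 1000 bits, hence the factor of 1000.
    """
    return delay_ms * rate_mbps * 1000 / 8


def queue_delay_ms(backlog_bytes: float, rate_mbps: float) -> float:
    """Time in ms needed to drain backlog_bytes at rate_mbps."""
    return backlog_bytes * 8 / (rate_mbps * 1000)


# Backlog that corresponds to 100 ms of delay at 150 Mbps.
backlog = backlog_bytes_for_delay(100, 150)
print(backlog)                       # 1875000.0 bytes, a little under 2 MB
print(queue_delay_ms(backlog, 300))  # 50.0 ms at the original rate
print(queue_delay_ms(backlog, 150))  # 100.0 ms after the rate halves
```

The same backlog costs twice as much delay once the rate halves, which is why a buffer that seems harmless at one modulation rate becomes noticeable at a lower one.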

Gigabit fiber is routine in many places now.

And also the point is surely that SQM is there and available to help, and it can be used on wwan just as effectively as on wan. Aside from the adaptive bandwidth aspect, the work has been done.

Or are there more problems to solve beyond that?

I think @amteza has shown that CAKE can be applied to sort out wwan bufferbloat on his wwan setup.

Variable bandwidth is definitely an issue, you've encouraged a bunch of work on that, and it'll be beneficial for people with WWAN. But within modern home LANs there are all kinds of latency pitfalls. Consider something like this, based on some real life examples I know of:

The AP/Router handles the WAN, which is 600/60Mbps DOCSIS. A NAS is connected to a smart switch, which is connected to a powerline adapter that feeds the AP in the south east corner, an office PC and an IP camera in the north east corner, and an AP and HTPC in the north west corner. There are also cameras in the south central entranceway and on the north patio.

The desktop and HTPC may both need to hit the NAS for files and streaming videos, the cameras need to stream a few megabits continuously to be recorded to the NAS, and roaming around the house are phones, laptops, tablets, and gaming consoles for say 4 adults and 2 teenagers which roam between the available APs (or may be plugged into the AP switches or powerline adapters directly). The laptops may need to hit the NAS, and the phones, laptops, and tablets may be streaming multiple Netflix or YouTube streams.

This is by no means a "specialized" house layout, it's like a typical middle american house with work-from-home parents, guests, and teenage children.

Of course, running ethernet wires instead of PL equipment might well work a lot better, but we all know that people don't like wires strung around visibly, and also don't want to crawl under their house or in an attic.

With this setup it's easily possible to have latency issues say with the desktop (where a video conf may be going on) or with a roaming laptop on video conf due to congestion from NAS to desktop or NAS to HTPC, or two or three simultaneous video streamers, or someone downloading a file from the internet and saving it to the NAS etc.

Real world conditions being what they are, there are now plenty of opportunities for screwing up work-from-home video / audio conferencing due to any number of fast sources/sinks being bottlenecked by less than ideal conditions.

dslreports is a really bad and not very consistent way of testing; just using a different browser can yield quite different results. Yes, you can provoke results that are favourable, but using dslreports to test internal lan "latency" is just not the way to go, as you're not isolating the lan in any way. I'm going to say that when you hit higher latency it boils down more to the radio itself being overloaded or hitting its limits.
