Enabling SQM causes bufferbloat to go from B to C on a cable connection

Hi David,
according to http://www.intel.com/content/www/us/en/wireless-products/centrino-advanced-n-6205.html?wapkw=6205 the wifi card should manage up to 300Mbps (two streams at 40MHz; two streams at 20MHz should still give >= 130 Mbps raw bandwidth). Now wifi is not terribly efficient and is half duplex in nature, but with his AP reporting MCS14 and MCS15 and his wifi card supporting those rates, 45Mbps still seems low (I thought the rule of thumb is that wifi loses 50%, not 100-100*45/130 = 65.38% :wink: ).
But I lack the real-world experience you have from designing, setting up and managing the wifi at SCALE, so I fully believe your description and would be quite thankful if you could elaborate a bit more how much measurable good-put one can expect from typical wifi.

On Fri, 10 Mar 2017, moeller0 wrote:

... would be quite thankful if you could elaborate a bit more how much
measurable good-put one can expect from typical wifi.

In this case, I didn't realize that he was using an AP and device that both
support two streams; I thought he was using basic 802.11n (which I believe only
supports one stream). So if it should have supported two, it was a little low,
as he saw when he switched to a different device, but not unreasonable.

The problem with predicting good-put is that there are just so many variables.

Just look at airtime usage at high data rates and you will see that the wifi
headers per timeslot overwhelm the data transport.

IIRC, you have to transmit somewhere around 8K of packets to equal the airtime
used by the overhead, so transmitting 8K takes only about twice the airtime of
transmitting 64 bytes.
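
A rough back-of-the-envelope sketch of that claim, not a measurement. It assumes a fixed per-transmission overhead that costs the same airtime as 8 KB of payload (the "around 8K" recollection above); `OVERHEAD_EQUIV_BYTES` and both function names are made up for illustration, and real overhead depends on PHY rate, preamble and contention.

```python
# Toy airtime model: every transmission pays a fixed overhead, expressed here
# as the number of payload bytes that would occupy the same airtime.
# 8192 comes from the "around 8K" recollection; it is an assumption.
OVERHEAD_EQUIV_BYTES = 8192


def relative_airtime(payload_bytes: int) -> float:
    """Airtime of one transmission, in units of the fixed overhead."""
    return (OVERHEAD_EQUIV_BYTES + payload_bytes) / OVERHEAD_EQUIV_BYTES


def airtime_efficiency(payload_bytes: int) -> float:
    """Fraction of the airtime that actually carries payload."""
    return payload_bytes / (OVERHEAD_EQUIV_BYTES + payload_bytes)


if __name__ == "__main__":
    for size in (64, 1500, 8192, 65536):
        print(f"{size:>6} B: airtime x{relative_airtime(size):.2f}, "
              f"efficiency {airtime_efficiency(size):.1%}")
```

Under this model a lone 64-byte ack costs almost as much airtime as the overhead itself, which is why batching acks and aggregating data matter so much.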

When you are doing a speed test, you are transmitting a lot of data in one
direction, and only acks in the other direction, each ack that gets transmitted
separately eats up a lot of airtime that could be used by data, so the ability
of the endpoint OS to thin out acks and batch them makes a huge difference. As
do details of the wifi drivers and buffering in the endpoint.

If you add bufferbloat monitoring to a raw speed test, you are sending
additional packets to measure the latency, and that can have a huge effect on
the number of transmit slots the endpoint needs to use.

Because of this, even a trickle of other data on the wifi network can have a
significant effect on the measured speed. If you have devices that are
generating any broadcast traffic (Alexa looking for devices to manage, DLNA
servers advertising themselves, many IoT things, etc), that can eat up valuable
transmission slots that are now not available for your speed tests[1].

One of the recent improvements in LEDE is the introduction of Airtime Fairness
into the ath10k drivers. This prevents any one station from hogging too much
airtime. This does wonders at limiting the damage that can be done by a single
slow station[2]. But a side effect of this is that it will tend to lower the
cap on the size of any single data transmission. I believe that I've seen that
some drivers would send up to 96K of data in a single transmission (up to 64
1500-byte packets), and this will be trimmed down a bit, slightly lowering
good-put, but greatly improving latency.
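
To illustrate the fairness part, here is a toy comparison of the two scheduling policies. It is only a sketch: the function names and the 300/3 Mbps link rates are hypothetical, and per-transmission overhead is ignored entirely.

```python
def equal_data(rates_mbps):
    """Every station receives the same throughput; slow stations eat airtime.

    Returns a list of (throughput_mbps, airtime_share), one per station.
    """
    per_station = 1.0 / sum(1.0 / r for r in rates_mbps)
    return [(per_station, per_station / r) for r in rates_mbps]


def equal_airtime(rates_mbps):
    """Every station receives the same airtime share; throughput follows rate."""
    share = 1.0 / len(rates_mbps)
    return [(r * share, share) for r in rates_mbps]


if __name__ == "__main__":
    rates = [300.0, 3.0]  # hypothetical fast and slow station, 100x apart
    for name, policy in (("equal data", equal_data),
                         ("equal airtime", equal_airtime)):
        print(name)
        for rate, (tput, share) in zip(rates, policy(rates)):
            print(f"  {rate:>5.0f} Mbps link: {tput:7.2f} Mbps, "
                  f"{share:.1%} of airtime")
```

With equal data, the slow station takes about 99% of the airtime and drags both stations down to roughly 3 Mbps; with equal airtime, the fast station keeps half the channel and only the slow station pays for its low rate.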

An additional factor is that wifi tries really hard to play nice with other
stations. When a station is preparing to transmit, it checks to see if the
channel is clear or if something else is transmitting. As a result of this, even
a weak station that's transmitting on a channel you are using can eat an airtime
slot, even if you are right next to the AP and have an overwhelmingly powerful
signal[3].

The delays that you can suffer on wifi will look like bufferbloat to something
like dslreports. As a result, you should really do your bufferbloat tests
separately from wifi tests.

Because of all these things, wifi good-put can easily swing by far more than the
15% difference noted here without you having any idea why (unless you are doing
detailed captures of the RF environment at both the AP and the endpoint and then
analyzing them carefully), unless you happen to be out in the boonies where you
don't have anyone else around.

Doing wifi speed tests is good for comparing different wifi configs (different
channels, different antennas, etc.). But unless your connection is very bad, it's
not really going to tell you much about the bufferbloat in your wired devices.

I really wish I had a good way of predicting good-put, but mostly it boils down
to 'wifi is worse than you ever imagined'. Most of the numbers that people are
concerned about are not that unreasonable. They may be improvable, but short of
an RF analysis, or evidence that a different combination works much better in
that area, and unless you are using multiple channels or -ac with multiple
streams, good-put in the ballpark of 40Mb/s in a speed test isn't shabby.

David Lang

[1] This is a really good argument for routing between 2.4G, 5G, and wired
subnets instead of bridging them. Most IoT things still tend to be 2.4G only

[2] Prior to this, Linux would try to send the same amount of data to each
station, and if one is 1000x slower than the others, it would get 1000x the
airtime. With these changes, they all get roughly equal airtime so the station
that's 1000x slower will get 1000x less data throughput instead. And in a
high-end -ac environment, the difference between the fastest and the slowest
station can literally be 1000x, but even in 802.11n setups you can get to 100x.

[3] This has the perverse effect that if you have better antennas on your AP,
your throughput will decrease because it is better at hearing unrelated stations
in your area. Sometimes just putting something metal between your AP and the
direction of an interfering station can greatly improve conditions for both
groups. Similarly, going to wider channels means that there are more things out
there that can clobber you, so unless you are in a fairly quiet RF environment,
and have only a small number of stations, you are probably better off using more
APs, each on a single channel, than using wider channels.

Hi David,

thank you very much for this detailed description. It sounds like it is actually a wonder that wifi works as well as it typically does :wink:

Best Regards

thank you very much for this detailed description. It sounds like it is actually a wonder that wifi works as well as it typically does :wink:

In many ways you are right. The basic design was put in place when it was
expected that wifi would be rather rare (APs >$1000, laptop adapters $700 each).
New generations of standards have tweaked things, but there has been the
requirement to be backwards compatible and to work in a completely unregulated
environment with zero coordination between different people's devices in the
same area, and the results are not conducive to maxing out the capabilities
efficiently.

In high density environments it gets even uglier, because the assumption is that
if the transmission didn't get through, it must be due to a weak signal, not
interference, so the default is to slow down. That just makes it likely to
generate more interference, causing more stations to slow down (this is why
crowded wifi environments seem to work for a while and then performance falls
off a cliff). The make-wifi-fast project is doing wonders to improve this
behavior from the AP side.
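
A small sketch of why slowing down adds interference: the same frame simply occupies the channel for longer at a lower rate. The frame size and the selection of rates are illustrative assumptions, `payload_airtime_us` is a made-up helper, and per-frame overhead (preamble, acks, contention) is deliberately left out.

```python
# Payload-only airtime of one frame at several 802.11n PHY rates.
# FRAME_BYTES and the choice of rates are illustrative assumptions.
FRAME_BYTES = 1500
RATES_MBPS = [130.0, 65.0, 26.0, 6.5]


def payload_airtime_us(frame_bytes: int, rate_mbps: float) -> float:
    """Microseconds needed to send frame_bytes of payload at rate_mbps."""
    return frame_bytes * 8 / rate_mbps  # bits / (Mbit/s) == microseconds


if __name__ == "__main__":
    fastest = payload_airtime_us(FRAME_BYTES, RATES_MBPS[0])
    for rate in RATES_MBPS:
        t = payload_airtime_us(FRAME_BYTES, rate)
        print(f"{rate:>6.1f} Mbps: {t:7.1f} us ({t / fastest:.0f}x the airtime)")
```

A station that falls back from 130 Mbps to 6.5 Mbps keeps the channel busy 20 times longer per frame, giving every neighbor more chances to collide with it.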

David Lang

Thanks for the enlightening discussion! It explains why I am getting only about 20 mbits now, since there is an old iPhone slowing things down. However, the weird thing is that normally (as in all the previous tests) I am the only one transmitting on my channel in the 5 GHz band (I scanned from my laptop and the AP and found only one other weak station using a channel very far away from mine).

I dug up this on the OpenWRT forum: https://forum.openwrt.org/viewtopic.php?pid=322894#p322894 - they say something about OpenWRT 15.0.1 not being able to handle congestion (unlike DDWrt) but I was not able to reproduce their problem so far.

For what it's worth, bufferbloat scores on my (wired) desktop are A+ while on my (wireless) laptop it doesn't get any 'better' than A. This is with an Intel 8260 AC card.

Thanks for the enlightening discussion! It explains why I am getting only
about 20 mbits now, since there is an old iPhone slowing things down.
However, the weird thing is that normally (as in all the previous tests) I am
the only one transmitting on my channel in the 5 GHz band (I scanned from my
laptop and the AP and found only one other weak station using a channel very
far away from mine).

In that case, you should be able to get much closer to the rated speed.

I dug up this on the OpenWRT forum:
https://forum.openwrt.org/viewtopic.php?pid=322894#p322894 - they say
something about OpenWRT 15.0.1 not being able to handle congestion (unlike
DDWrt) but I was not able to reproduce their problem so far.

15.0.1 is old enough to not be worth troubleshooting any longer :slight_smile: There are
lots of tunables that could be different, and since openwrt, lede, and ddwrt
all run backported drivers, there is a very real chance of the drivers being
different.

No, I am getting 20mbits when the iPhone is in the apartment. When I am alone here and only my laptop is connected, I get those 60 mbits I was reporting above. But at least now I know to tell the iPhone user to connect to the 2.4 GHz network so I am not slowed down.

Actually, after a while of having SQM turned on, I had to turn it off because it would prevent clients from connecting. It was hit or miss, but it would happen several times a day that somebody would come home and their device would not get connected (and the router had to be restarted, which mostly helped, but only for a short while). So now it is turned off again:-(.

I fear you are not the only one having problems with traffic shaping on the 860-L. Unfortunately it is not clear where exactly the problem lies, but it can be triggered by instantiating traffic shaping. It is not only sqm-scripts but also qos-scripts that can trigger the router's susceptibility. Not that that helps you, but see:
https://forum.openwrt.org/t/optimized-build-for-the-d-link-dir-860l/948
for more details...

Best Regards

Just some information about 6205 since it was mentioned in this thread. This is a terrible WiFi card, at least for linux. In recent kernels TX AMPDU aggregation is disabled which causes a severe drop in performance. You'll see much better results with "options iwlwifi 11n_disable=8" in /etc/modprobe.d/iwlwifi.conf.

Hm... I tried that with no improvements in performance (my N connection was around 55mbit/s, as in posts above).

I see, thanks for the link!

You reloaded the driver/rebooted your laptop after the change, right? I used to have a 6205, and I was able to get slightly better speeds than the 55 Mbps that you are seeing, but not by much. I think 100 is your absolute max.

My bad! I thought it was related to the router in question:-). I got 85 mbit/s now so it really does make a difference, thanks! Now I only need the router itself to get fixed:-).

Blogic recently uploaded a commit to his staging tree which could solve the problem with mt7621 devices like the DIR-860L B1. If you want to test a build which incorporates the commit and SQM QoS try my updated build here.

Since this thread diverged from its original topic quite a lot, I will just note for posterity my latest struggles with wifi speed: turns out it can have quite a huge impact if you have a 20cm thick concrete wall in your apartment! Also, if your router has external antennas, it pays to fiddle with them and try which positioning works best (I think they should ideally point in different directions, i.e. along the X, Y, and Z axes of three-dimensional space). I am now using a TP-Link Archer C7 v2, and facing the router I was able to achieve 150 mbits over N (the router was set to AC and an 80 MHz band, but I gather the wifi card in my laptop is only able to use N and 40 MHz). When the laptop is on my desk, I was able to get around 100mbits by correctly positioning the antennas, up from 60mbits when they were all pointing in one direction. Moving the router by one meter (so there are no plasterboard walls in the way, just the concrete) improved the speed by about 20 mbits. Wifi is really sensitive to positioning, and this is all in an 86 m2 apartment, not a big house!

Though, back to the original topic: since the speed of my connection varies with the position of my laptop in the apartment, the SQM limits should probably vary too. Right now I have it set to 142 mbit for my upstream link, which is 150 mbit. However, that means the wifi itself is not debloated:-(. Now I get a B on DSLreports, which I guess is still decent.

Was this ever resolved?

As I said, I started to use another router, so I do not know about the original issue with the D-Link router. I am also now on a newer laptop, and with wifi ac I comfortably reach the 150mbit I get from my ISP.