Enabling SQM causes bufferbloat to go from B to C on a cable connection

moeller0 · March 9, 2017, 9:55pm

Good, then the issue is not simply a lack of CPU resources.

[quote="felagund, post:18, topic:2081"]
But it sometimes reports the full 300 Mbit/s and sometimes drops to numbers around 130; it sometimes changes from second to second. In Network/Wireless, the router shows this in the TX/RX column:

300.0 Mbit/s, 40MHz, MCS 15, Short GI
270.0 Mbit/s, 40MHz, MCS 14, Short GI

or this (again, it changes frequently, and this reading is more common):

6.0 Mbit/s, 20MHz
270.0 Mbit/s, 40MHz, MCS 14, Short GI

I am sitting one metre from the router with only air between it and my laptop. Also, I downloaded LinSSID and it shows that I am one of only two stations transmitting in the 5 GHz band in my building; the other signal is weak and uses channels far from mine.
[/quote]

I believe it is quite normal for these values to fluctuate. For one thing, they depend on the actual "air quality"; also, I seem to recall that the Linux kernel will occasionally probe rates other than the maximum to get a better idea of which MCS mode offers the best throughput/latency trade-off (I am not exactly sure whether the kernel actually cares about latency).
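To illustrate why the reported rate can jump around even on a stable link, here is a toy rate-selection loop, loosely in the spirit of sampling-based rate control such as mac80211's Minstrel. It is not the kernel's actual algorithm: the per-rate delivery probabilities and the 10% sampling share are made-up assumptions. The station mostly transmits at the rate with the best expected throughput and occasionally probes the others.

```python
import random

# Hypothetical per-rate statistics: PHY rate in Mbit/s -> estimated
# probability that a frame sent at that rate is delivered (made up).
RATE_STATS = {
    130.0: 0.95,
    270.0: 0.80,
    300.0: 0.55,
}

SAMPLE_FRACTION = 0.1  # assumed share of frames used to probe other rates


def expected_throughput(rate: float, success_prob: float) -> float:
    """Very rough figure of merit: PHY rate scaled by delivery probability."""
    return rate * success_prob


def best_rate(stats: dict[float, float]) -> float:
    """Rate with the highest expected throughput."""
    return max(stats, key=lambda r: expected_throughput(r, stats[r]))


def pick_rate(stats: dict[float, float], rng: random.Random) -> float:
    """Usually use the best-looking rate, but occasionally probe another one."""
    best = best_rate(stats)
    if rng.random() < SAMPLE_FRACTION:
        others = [r for r in stats if r != best]
        return rng.choice(others)
    return best


rng = random.Random(42)
picks = [pick_rate(RATE_STATS, rng) for _ in range(1000)]
print("best rate:", best_rate(RATE_STATS))
print("share of frames at best rate:", picks.count(best_rate(RATE_STATS)) / len(picks))
```

With these numbers 270 Mbit/s wins (270 x 0.80 = 216 beats 300 x 0.55 = 165), so most frames go out at 270 even though 300 is available, and roughly one frame in ten goes out at some other rate.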
The 1 m distance, however, might not be ideal; maybe try 2 to 3 metres away (still with line of sight). That might not help, but it could if, say, the ADC gets saturated at the short distance (I am not an engineer myself, so this might be wrong) or the antennas have an unfortunate radiation pattern at short distances.

According to the data rates table in https://en.wikipedia.org/wiki/IEEE_802.11n-2009, MCS 14 and MCS 15 are clear indicators that two spatial streams are used, which, if I recall correctly, require two antennas at the receiver?
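The rates in that table follow from a simple product, which also shows why MCS 14 and 15 imply two streams: for equal-modulation HT MCS 0-31, the stream count is MCS // 8 + 1, and the rate is streams x data subcarriers x bits per subcarrier x coding rate / symbol duration. A minimal sketch, assuming the standard 802.11n values (52 data subcarriers at 20 MHz, 108 at 40 MHz, 4 us symbols with the long guard interval, 3.6 us with the short one):

```python
from fractions import Fraction

# 802.11n (HT) modulation and coding for MCS index modulo 8:
# (coded bits per subcarrier, coding rate)
HT_MODULATION = [
    (1, Fraction(1, 2)),  # BPSK 1/2
    (2, Fraction(1, 2)),  # QPSK 1/2
    (2, Fraction(3, 4)),  # QPSK 3/4
    (4, Fraction(1, 2)),  # 16-QAM 1/2
    (4, Fraction(3, 4)),  # 16-QAM 3/4
    (6, Fraction(2, 3)),  # 64-QAM 2/3
    (6, Fraction(3, 4)),  # 64-QAM 3/4
    (6, Fraction(5, 6)),  # 64-QAM 5/6
]

DATA_SUBCARRIERS = {20: 52, 40: 108}  # per channel width in MHz


def spatial_streams(mcs: int) -> int:
    """MCS 0-7 use one stream, 8-15 two, 16-23 three, 24-31 four."""
    return mcs // 8 + 1


def ht_rate_mbps(mcs: int, width_mhz: int, short_gi: bool) -> float:
    """PHY data rate for equal-modulation HT MCS 0-31."""
    if not 0 <= mcs <= 31:
        raise ValueError("only equal-modulation MCS 0-31 handled here")
    bits, coding = HT_MODULATION[mcs % 8]
    symbol_us = Fraction(36, 10) if short_gi else Fraction(4)
    bits_per_symbol = spatial_streams(mcs) * DATA_SUBCARRIERS[width_mhz] * bits * coding
    return float(bits_per_symbol / symbol_us)  # bits per microsecond == Mbit/s


for mcs in (14, 15):
    print(f"MCS {mcs}: {spatial_streams(mcs)} streams, "
          f"{ht_rate_mbps(mcs, 40, short_gi=True):.1f} Mbit/s at 40 MHz, short GI")
```

This reproduces the 270.0 and 300.0 Mbit/s figures the router reported for MCS 14 and MCS 15 at 40 MHz with short GI.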

I would think so, but I have never used MediaTek wifi chips, so I have no idea what to expect.

felagund · March 10, 2017, 12:09pm

I just tried from about three metres away from the router and it is still around 60 Mbit/s for seconds at a time.

Well, somebody on the wiki here https://wiki.openwrt.org/toh/d-link/dir-860l claims this: "A quick test using iperf3 and an Atheros AR9382 WLAN-card showed about 170 mbit/s using the 5GHz radio (11n)."

I will try to get another laptop with a newer card to check whether the speed issue is specific to my laptop.

felagund · March 10, 2017, 12:15pm

I just tried a newer Windows laptop, which got around 85 Mbit/s over wifi, so I guess the 50-60 Mbit/s I am getting on mine is due to its age or something like that.

moeller0 · March 10, 2017, 12:36pm

Ah, okay, you had the right hypothesis then. The bufferbloat you observe seems to be wifi-induced bufferbloat. At the moment ath9k seems to have had the most thorough de-bloating job done, followed by ath10k(?). I always thought that MediaTek wifi was a close third, but it seems that at least yours is not as debloated as it should be. Especially for downstream/downloading tests, it is the AP that needs to be debloated, and your results seem to indicate that the 860-L needs more de-bloating... I guess that also means I am officially out of my league; I will holler if I get a better understanding of the sqm-scripts issue with switched WAN ports...

psyborg · March 10, 2017, 12:57pm

It should handle >100 Mbit/s easily; however, there could still be some problems due to bugs like this one: https://github.com/openwrt/mt76/issues/84

Also keep in mind that I got much better speeds with iperf on a patched Ubuntu 10.04 than with Ubuntu 16.

dlang · March 10, 2017, 5:35pm

50 Mb/s is quite reasonable for 802.11n using a single 20 MHz channel (the
theoretical maximum is something like 64 Mb/s, combining both sides of the
conversation).

Newer hardware can support 40 MHz channels on n, and -ac can go much higher.

JonP · March 10, 2017, 6:56pm

It's worth mentioning that in crowded neighborhoods, using 40 MHz might be SLOWER than 20 MHz, because you are operating on two wifi channels instead of one. In a busy airspace, you might run into more interference using two and end up with lower total throughput. Checking which channels are clear and staying away from the busy ones is always worth the effort.
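A toy model of that argument, assuming a 40 MHz transmission can only go out when both 20 MHz halves are idle (no fallback to 20 MHz), that the two channels' occupancies are independent, and using made-up idle fractions. The PHY rates are single-stream MCS 7 with long GI (65 Mbit/s at 20 MHz, 135 Mbit/s at 40 MHz):

```python
def effective_rate(phy_rate_mbps: float, idle_fractions: list[float]) -> float:
    """Toy model: a transmission needs every channel it spans to be idle.

    Assumes independent channel occupancies and no fallback to a narrower
    channel when the secondary channel is busy.
    """
    usable = 1.0
    for idle in idle_fractions:
        usable *= idle
    return phy_rate_mbps * usable


# Hypothetical numbers: primary channel idle 80% of the time,
# neighbouring (secondary) channel idle only 40% of the time.
rate_20 = effective_rate(65.0, [0.8])
rate_40 = effective_rate(135.0, [0.8, 0.4])
print(f"20 MHz: {rate_20:.1f} Mbit/s, 40 MHz: {rate_40:.1f} Mbit/s")
```

Under these assumptions, doubling the width stops paying off once the second channel is idle less than about half the time.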

moeller0 · March 10, 2017, 8:53pm

Hi David,
according to http://www.intel.com/content/www/us/en/wireless-products/centrino-advanced-n-6205.html?wapkw=6205 the wifi card should manage up to 300 Mbit/s (two streams at 40 MHz; two streams at 20 MHz should still give >= 130 Mbit/s raw bandwidth). Now, wifi is not terribly efficient and is half duplex in nature, but with his AP reporting MCS 14 and MCS 15 and his wifi card supporting those rates, 45 Mbit/s still seems low (I thought the rule of thumb is that wifi loses 50%, not 100-100*45/130 = 65.38%).
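The arithmetic behind that comparison, as a quick sketch (the 130 Mbit/s raw rate and 45 Mbit/s goodput are the figures from this post; the 50% rule of thumb is the heuristic mentioned in it):

```python
def goodput_loss_percent(raw_mbps: float, measured_mbps: float) -> float:
    """Share of the raw PHY rate (in %) that did not show up as measured goodput."""
    return 100.0 - 100.0 * measured_mbps / raw_mbps


RAW_MBPS = 130.0       # two streams, 20 MHz, long guard interval (MCS 15)
MEASURED_MBPS = 45.0   # goodput figure discussed above
RULE_OF_THUMB = 0.5    # "wifi loses about 50%"

print(f"observed loss: {goodput_loss_percent(RAW_MBPS, MEASURED_MBPS):.2f}%")
print(f"goodput expected from the rule of thumb: {RAW_MBPS * RULE_OF_THUMB:.0f} Mbit/s")
```

A 50% loss would mean about 65 Mbit/s of goodput, so 45 Mbit/s is roughly 30% below what the rule of thumb suggests.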
But unlike you, I lack the real-world experience of designing, setting up and managing the wifi at SCALE, so I fully believe your description and would be quite thankful if you could elaborate a bit more on how much measurable good-put one can expect from typical wifi.

dlang · March 10, 2017, 10:57pm

On Fri, 10 Mar 2017, moeller0 wrote:

... would be quite thankful if you could elaborate a bit more on how much
measurable good-put one can expect from typical wifi.

In this case, I didn't realize that he was using an AP and a device that both
supported two streams; I thought he was using basic 802.11n (which I believe
only supports one stream). So if it should have supported two streams, it was a
little low, as he saw when he switched to a different device, but not
unreasonable.

The problem with predicting good-put is that there are just so many variables.

Just look at airtime usage at high data rates and you will see that the
per-timeslot wifi headers overwhelm the data transport.

IIRC, you have to transmit somewhere around 8K of packet data to equal the
airtime used by the overhead, so transmitting 8K takes roughly twice the airtime
of transmitting 64 bytes.
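A back-of-the-envelope model of that overhead, assuming each transmission costs a fixed overhead equal to the airtime of about 8 KiB of payload at the current rate (the IIRC figure above; the real value depends on PHY rate, preamble, contention and acknowledgements). Airtime is counted in "byte-times" so no particular PHY rate is needed:

```python
OVERHEAD_BYTES = 8 * 1024  # assumed: per-transmission overhead ~ airtime of 8 KiB of data


def airtime_units(payload_bytes: int, overhead_bytes: int = OVERHEAD_BYTES) -> int:
    """Airtime of one transmission, in 'byte-times' at the current PHY rate."""
    return overhead_bytes + payload_bytes


def efficiency(payload_bytes: int, overhead_bytes: int = OVERHEAD_BYTES) -> float:
    """Fraction of the airtime that carries payload."""
    return payload_bytes / airtime_units(payload_bytes, overhead_bytes)


ratio = airtime_units(8 * 1024) / airtime_units(64)
print(f"8 KiB vs 64 B airtime ratio: {ratio:.2f}")
for size in (64, 1500, 8 * 1024, 64 * 1500):
    print(f"{size:>6} bytes -> {efficiency(size):.1%} of airtime is payload")
```

Under this assumption a single 1500-byte packet uses only about 15% of its airtime for payload, and only large aggregates get close to the PHY rate.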

When you are doing a speed test, you are transmitting a lot of data in one
direction and only ACKs in the other direction. Each ACK that gets transmitted
separately eats up a lot of airtime that could be used for data, so the ability
of the endpoint OS to thin out ACKs and batch them makes a huge difference. So
do the details of the wifi drivers and the buffering in the endpoint.
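Using the same kind of fixed-overhead assumption (about 8 KiB worth of airtime per separate transmission), a toy calculation shows why batching ACKs matters for a download test. Every separate transmission, whether a data aggregate or an ACK burst, pays that overhead; the transfer size, aggregate size, ACK size and delayed-ACK ratio are all hypothetical:

```python
import math

OVERHEAD_BYTES = 8 * 1024  # assumed per-transmission overhead (the ~8K figure)
SEGMENT_BYTES = 1500
ACK_BYTES = 64             # assumed on-air size of one TCP ACK


def download_efficiency(segments: int, segments_per_aggregate: int,
                        segments_per_ack: int, acks_per_txop: int) -> float:
    """Payload share of total airtime for a one-way bulk transfer.

    Toy model: both directions share one channel at the same PHY rate, and
    every separate transmission (data aggregate or ACK burst) pays the same
    fixed overhead.
    """
    data_txops = math.ceil(segments / segments_per_aggregate)
    acks = math.ceil(segments / segments_per_ack)
    ack_txops = math.ceil(acks / acks_per_txop)
    payload = segments * SEGMENT_BYTES
    airtime = (data_txops * OVERHEAD_BYTES + payload
               + ack_txops * OVERHEAD_BYTES + acks * ACK_BYTES)
    return payload / airtime


# Hypothetical transfer: 6400 segments, 32 segments per aggregate,
# one ACK per two segments (delayed ACK).
separate = download_efficiency(6400, 32, 2, acks_per_txop=1)
batched = download_efficiency(6400, 32, 2, acks_per_txop=32)
print(f"each ACK sent on its own: {separate:.0%} of airtime is payload")
print(f"32 ACKs per transmission: {batched:.0%} of airtime is payload")
```

In this made-up scenario, sending every ACK in its own transmission roughly triples the total airtime compared with batching them.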

If you add bufferbloat monitoring to a raw speed test, you are sending
additional packets to measure the latency, and that can have a huge effect on
the number of transmit slots the endpoint needs to use.

Because of this, even a trickle of other data on the wifi network can have a
significant effect on the measured speed. If you have devices that are
generating any broadcast traffic (Alexa looking for devices to manage, DLNA
servers advertising themselves, many IoT things, etc.), that can eat up valuable
transmission slots that are then not available for your speed tests[1].

One of the recent improvements in LEDE is the introduction of Airtime Fairness
into the ath10k drivers. This prevents any one station from hogging too much
airtime, and it does wonders at limiting the damage that can be done by a single
slow station[2]. A side effect, however, is that it tends to lower the maximum
size of any single data transmission. I believe I've seen some drivers send up
to 96K of data in a single transmission (up to 64 packets of 1500 bytes); this
gets trimmed down a bit, slightly lowering good-put but greatly improving
latency.
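The trade-off of trimming aggregates can be sketched with the same assumed fixed overhead of about 8 KiB worth of airtime per transmission. Shorter aggregates carry a smaller payload share but hold the channel for less time, and that hold time is what every other station waits through. The 300 Mbit/s rate is from earlier in the thread; the 32-packet trimmed size is a hypothetical value, not what the airtime fairness code actually uses:

```python
OVERHEAD_BYTES = 8 * 1024  # assumed fixed per-transmission overhead
PACKET_BYTES = 1500


def aggregate_tradeoff(packets: int, phy_rate_mbps: float) -> tuple[float, float]:
    """Return (payload share of airtime, channel hold time in ms) for one aggregate."""
    payload = packets * PACKET_BYTES
    on_air_bytes = payload + OVERHEAD_BYTES
    hold_ms = on_air_bytes * 8 / (phy_rate_mbps * 1e6) * 1e3
    return payload / on_air_bytes, hold_ms


for packets in (64, 32):
    eff, hold = aggregate_tradeoff(packets, phy_rate_mbps=300.0)
    print(f"{packets:>2} packets: {eff:.0%} payload, channel held {hold:.2f} ms")
```

Halving the aggregate here costs several percentage points of efficiency but nearly halves how long everyone else waits per transmission.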

An additional factor is that wifi tries really hard to play nice with other
stations. When a station is preparing to transmit, it checks whether the
channel is clear or something else is transmitting. As a result, even a weak
station that's transmitting on a channel you are using can eat an airtime slot,
even if you are right next to the AP and have an overwhelmingly powerful
signal[3].
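A toy model of that deferral behaviour, which also covers footnote [3] about better antennas. It assumes a fixed clear-channel threshold (-82 dBm is used here as a typical level for detecting a 20 MHz OFDM transmission; treat it as an assumption), non-overlapping neighbour transmissions, and made-up neighbour signal levels and duty cycles:

```python
CCA_THRESHOLD_DBM = -82.0  # assumed typical detect level for a 20 MHz OFDM channel


def available_airtime(neighbours: list[tuple[float, float]],
                      antenna_gain_db: float = 0.0) -> float:
    """Fraction of time the channel looks clear to us.

    neighbours: (received power in dBm, fraction of time that station transmits).
    Toy model: any neighbour heard at or above the threshold forces us to defer,
    no matter how strong our own link to the AP is; neighbour transmissions are
    assumed not to overlap each other.
    """
    busy = sum(duty for power_dbm, duty in neighbours
               if power_dbm + antenna_gain_db >= CCA_THRESHOLD_DBM)
    return max(0.0, 1.0 - busy)


# Hypothetical neighbourhood: one weak station just above the threshold,
# one just below it.
neighbours = [(-80.0, 0.2), (-85.0, 0.3)]
print(available_airtime(neighbours))                       # only the -80 dBm station counts
print(available_airtime(neighbours, antenna_gain_db=5.0))  # better antennas hear both
```

The strength of our own link to the AP does not appear anywhere in the calculation; with 5 dB more antenna gain, a second station crosses the threshold and the clear airtime drops from 80% to 50%.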

The delays that you can suffer on wifi will look like bufferbloat to something
like the DSLReports speed test. As a result, you should really do your
bufferbloat tests separately from your wifi tests.

Because of all these things, wifi good-put can easily swing by far more than the
15% difference noted here without you having any idea why, unless you are doing
detailed captures of the RF environment at both the AP and the endpoint and then
analyzing them carefully, or you happen to be out in the boonies where you don't
have anyone else around.

Doing wifi speed tests is good for comparing different wifi configs (different
channels, different antennas, etc.), but unless your connection is very bad, it's
not really going to tell you much about the bufferbloat in your wired devices.

I really wish I had a good way of predicting good-put, but mostly it boils down
to 'wifi is worse than you ever imagined'. Most of the numbers that people are
concerned about are not that unreasonable; they may be able to be improved, but
unless you do an RF analysis or show that a different combination works much
better in that area, and unless you are using multiple channels or -ac with
multiple streams, good-put in the ballpark of 40 Mb/s in a speed test isn't
shabby.

David Lang

[1] This is a really good argument for routing between 2.4G, 5G, and wired
subnets instead of bridging them. Most IoT things still tend to be 2.4G only.

[2] Prior to this, Linux would try to send the same amount of data to each
station, and if one was 1000x slower than the others, it would get 1000x the
airtime. With these changes, they all get roughly equal airtime, so the station
that's 1000x slower gets 1000x less data throughput instead. In a high-end
-ac environment, the difference between the fastest and the slowest station can
literally be 1000x, but even in 802.11n setups you can get to 100x.

[3] This has the perverse effect that if you have better antennas on your AP,
your throughput will decrease because the AP is better at hearing unrelated
stations in your area. Sometimes just putting something metal between your AP
and the direction of an interfering station can greatly improve conditions for
both groups. Similarly, going to wider channels means that there are more things
out there that can clobber you, so unless you are in a fairly quiet RF
environment and have only a small number of stations, you are probably better
off using more APs, each on a single channel, than using wider channels.
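The effect described in footnote [2] can be shown with a two-station example. This is idealised arithmetic (no overhead, hypothetical rates of 1000 and 1 units) comparing equal data per station with equal airtime per station:

```python
def equal_data(rates: list[float]) -> list[float]:
    """Throughput per station if every station gets the same amount of data."""
    # Each station sends b units; total time b * sum(1/r) must equal 1.
    b = 1.0 / sum(1.0 / r for r in rates)
    return [b for _ in rates]


def equal_airtime(rates: list[float]) -> list[float]:
    """Throughput per station if every station gets the same share of airtime."""
    share = 1.0 / len(rates)
    return [r * share for r in rates]


rates = [1000.0, 1.0]  # hypothetical stations, one 1000x slower than the other
print("equal data:   ", [round(x, 3) for x in equal_data(rates)])
print("equal airtime:", [round(x, 3) for x in equal_airtime(rates)])
```

With equal data, both stations crawl along at just under 1 unit because the slow station occupies almost all the airtime; with equal airtime, the fast station gets 500 units and the slow one 0.5, i.e. 1000x less.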

moeller0 · March 11, 2017, 12:15pm

Hi David,

Thank you very much for this detailed description. It sounds like it is actually a wonder that wifi works as well as it does, typically.

Best Regards

dlang · March 11, 2017, 12:34pm

Thank you very much for this detailed description. It sounds like it is actually a wonder that wifi works as well as it does, typically.

In many ways you are right. The basic design was put in place when it was
expected that wifi would be rather rare (APs >$1000, laptop adapters $700 each).
New generations of standards have tweaked things, but there has been the
requirement to be backwards compatible and to work in a completely unregulated
environment with zero coordination between different people's devices in the
same area, and the results are not conducive to using the capabilities
efficiently.

In high-density environments it gets even uglier, because the assumption is
that if a transmission didn't get through, it must be due to a weak signal, not
interference, so the default is to slow down. That just makes it more likely to
generate more interference, causing more stations to slow down (this is why
crowded wifi environments seem to work for a while and then performance falls
off a cliff). The make-wifi-fast project is doing wonders to improve this
behavior from the AP side.
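A toy model of that feedback loop. The rate ladder is the single-stream 20 MHz long-GI MCS 0-7 set; the occupancy threshold at which losses start, the station count and the per-station loads are hypothetical, and real drivers do not all step down in lockstep like this:

```python
# 802.11n single-stream, 20 MHz, long-GI rates (MCS 0-7), in Mbit/s.
RATE_LADDER = [6.5, 13.0, 19.5, 26.0, 39.0, 52.0, 58.5, 65.0]
LOSS_THRESHOLD = 0.5  # assumed: above this channel occupancy, frames start colliding


def settle(stations: int, load_mbps: float) -> tuple[float, float]:
    """Toy model of rate backoff under contention.

    Every station offers load_mbps. While total occupancy exceeds the
    threshold, all stations treat the resulting losses as a weak signal and
    step down one rate, which makes their transmissions take even longer.
    Returns (final rate, final occupancy); occupancy > 1 means saturated.
    """
    level = len(RATE_LADDER) - 1
    while True:
        occupancy = stations * load_mbps / RATE_LADDER[level]
        if occupancy <= LOSS_THRESHOLD or level == 0:
            return RATE_LADDER[level], occupancy
        level -= 1


print(settle(10, 3.0))  # stays at the top rate
print(settle(10, 3.5))  # slides all the way down
```

In this sketch, raising every station's load from 3.0 to 3.5 Mbit/s pushes occupancy just over the threshold, and the stations step all the way down to 6.5 Mbit/s, leaving the channel heavily oversubscribed: the cliff described above.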

David Lang

felagund · March 11, 2017, 1:24pm

Thanks for the enlightening discussion! It explains why I am now getting only about 20 Mbit/s: there is an old iPhone slowing things down. However, the weird thing is that normally (as in all the previous tests) I am the only one transmitting on my channel in the 5 GHz band (I scanned from my laptop and the AP and found only one other weak station, using a channel very far away from mine).

I dug up this thread on the OpenWRT forum: https://forum.openwrt.org/viewtopic.php?pid=322894#p322894. They say something about OpenWRT 15.0.1 not being able to handle congestion (unlike DD-WRT), but I have not been able to reproduce their problem so far.

Borromini · March 11, 2017, 1:39pm

For what it's worth, bufferbloat scores on my (wired) desktop are A+, while on my (wireless) laptop it never gets 'better' than A. This is with an Intel 8260 AC card.

dlang · March 11, 2017, 1:47pm

Thanks for the enlightening discussion! It explains why I am now getting only
about 20 Mbit/s: there is an old iPhone slowing things down. However, the weird
thing is that normally (as in all the previous tests) I am the only one
transmitting on my channel in the 5 GHz band (I scanned from my laptop and the
AP and found only one other weak station, using a channel very far away from
mine).

in that case, you should be able to be getting much closer to the rated speed.

I dug up this on the OpenWRT forum:
https://forum.openwrt.org/viewtopic.php?pid=322894#p322894 - they say
something about OpenWRT 15.0.1 not being able to handle congestion (unlike
DDWrt) but I was not able to reproduce their problem so far.

15.0.1 is old enough that it's not worth troubleshooting any longer. There are
lots of tunables that could be different, and since OpenWrt, LEDE, and DD-WRT
all run backported drivers, there is a very real chance of the drivers being
different.

felagund · March 11, 2017, 2:31pm

No, I am getting 20 Mbit/s when the iPhone is in the apartment. When I am alone here and only my laptop is connected, I get the 60 Mbit/s I reported above. But at least now I know to tell the iPhone user to connect to the 2.4 GHz network so I am not slowed down.

felagund · June 28, 2017, 9:10am

Actually, after a while of having SQM turned on, I had to turn it off because it would prevent clients from connecting. It was hit or miss, but several times a day somebody would come home and their device would not connect, and the router had to be restarted (which mostly helped, but only for a short while). So now it is turned off again :-(.

moeller0 · June 28, 2017, 9:31am

I fear you are not the only one having problems with traffic shaping on the 860-L. Unfortunately, it is not clear where exactly the problem lies, but it can be triggered by instantiating traffic shaping. It is not only sqm-scripts but also qos-scripts that can trigger the router's susceptibility. Not that that helps you, but see
https://forum.openwrt.org/t/optimized-build-for-the-d-link-dir-860l/948
for more details...

Best Regards

buckaroo · June 28, 2017, 9:53am

Just some information about the 6205, since it was mentioned in this thread. This is a terrible WiFi card, at least for Linux. In recent kernels TX AMPDU aggregation is disabled, which causes a severe drop in performance. You'll see much better results with "options iwlwifi 11n_disable=8" in /etc/modprobe.d/iwlwifi.conf.
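The 11n_disable module parameter is a bitmask. The sketch below decodes it using the bit meanings from the parameter description that `modinfo iwlwifi` prints (1: disable 11n entirely, 2: disable TX aggregation, 4: disable RX aggregation, 8: enable TX aggregation); treat that mapping as an assumption and check it against `modinfo iwlwifi` on your own kernel, since module parameters can change between versions. The helper names are hypothetical:

```python
# Bit meanings as described by `modinfo iwlwifi` for 11n_disable
# (assumption: verify against your own kernel's modinfo output).
IWLWIFI_11N_DISABLE_BITS = {
    1: "disable 802.11n entirely",
    2: "disable TX aggregation",
    4: "disable RX aggregation",
    8: "enable TX aggregation",
}


def decode_11n_disable(value: int) -> list[str]:
    """List the effects encoded in an iwlwifi 11n_disable value."""
    return [desc for bit, desc in IWLWIFI_11N_DISABLE_BITS.items() if value & bit]


def modprobe_line(value: int) -> str:
    """Line to place in /etc/modprobe.d/iwlwifi.conf."""
    return f"options iwlwifi 11n_disable={value}"


print(decode_11n_disable(8))
print(modprobe_line(8))
```

So the suggested value 8 only turns TX aggregation back on; it does not disable 802.11n.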

felagund · June 28, 2017, 2:37pm

Hm... I tried that with no improvement in performance (my N connection was around 55 Mbit/s, as in the posts above).

felagund · June 28, 2017, 2:37pm

I see, thanks for the link!