SQM for WLAN, need separate settings

Here's my test result.

@dlakelan, still, applying SQM to the wlan interfaces speeds up surfing and FaceTime quite a bit. I am going to keep it this way; at least it cannot make things worse.

For bufferbloat, I might be confused... Is the advice here NOT to apply SQM to "WAN/WAN6" but to "wlan0" instead? I currently have SQM applied to "WAN/WAN6" and nothing else.

Not exactly. What I am doing is applying SQM to WAN/WAN6 in the egress direction, and applying extra SQM to both wlan0 and wlan1. Testing shows the wlan connections get bloated without the extra SQM.
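
As a rough illustration of what "extra SQM on wlan0 and wlan1" could look like in /etc/config/sqm, here is a minimal sketch. The helper name sqm_stanza and the 200000 kbit/s rate are assumptions made for the example, not the settings used in the tests above; the option names are the ones sqm-scripts uses in its config file.

```shell
#!/bin/sh
# Print an /etc/config/sqm queue stanza for one interface.
# Arguments: interface name, ingress rate, egress rate
# (kbit/s; 0 leaves that direction unshaped).
sqm_stanza() {
    cat <<EOF
config queue
	option enabled '1'
	option interface '$1'
	option download '$2'
	option upload '$3'
	option qdisc 'fq_codel'
	option script 'simple.qos'
EOF
}

# Hypothetical rates. On a wlan interface "upload" is the egress direction,
# i.e. traffic sent from the router towards the WiFi clients.
sqm_stanza wlan0 0 200000
sqm_stanza wlan1 0 200000
```

The printed stanzas would be appended to the existing WAN section, and the egress rate has to be tuned per radio by testing.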

I am not surprised. I was able to shape a bit over 700 Mbit in my testing with iperf3 on my mt7621 device with one computer connected to WAN, and another on LAN. NAT and firewall were both enabled. It did gigabit speeds in the same configuration with SQM turned off, with CPU cycles to spare. That was way before hardware or software flow offload was even a thing. Even back then my tests were met with skepticism. Glad somebody managed to reproduce those :smiley:

As for your WiFi bloat issues, make sure you use 19.07.0 or a snapshot, since those are currently the only versions with airtime fairness enabled. Still, as the shaping bottleneck was a little bit over 700 Mbit in my testing, with WiFi thrown into the mix you are probably running out of CPU cycles if you have a 700 Mbit WAN connection.

Impressive, this test really looks decent...

This is a little bit confusing: since the test over the WiFi connection never exceeded much more than 200 Mbit, how could the CPU cycles be drained already?

Are there any bufferbloat test results for a WiFi connection on 19.07.0 or the main branch, with SQM applied only to WAN/WAN6?

For now I need more convincing facts before I consider upgrading.

Managing the wifi hardware requires a bunch of CPU cycles on top of the packet management. So I'd expect your CPU to saturate at a considerably lower rate over WiFi than over the wire... Still, if you can handle 700 Mbps wired, then I'd expect you to do 300 or 400 at least over WiFi...

Remember though that your WAN could receive 700 and then the queue on the wifi throws it away, but you still have to process the 700. That shouldn't happen for long periods, but it could spike like that.

That totally makes sense, since the remote server never knows you're accessing it through WiFi.

Well, airtime fairness is quite an interesting feature. By default WiFi does/did throughput fairness, so one slow link, say to an old device with only 802.11b or to a modern device quite far away, could severely decrease the aggregate throughput of the wifi network, as the slow devices got way more airtime. Since the contended resource in wifi is not so much throughput but rather transmit opportunities or "airtime", it makes sense not to try to dole out equal "throughput" to all devices but rather to share airtime fairly. In effect this causes less of an aggregate throughput loss AND will in general also decrease the latency (as slower devices will not hog the airtime for long durations any more if there are other stations trying to receive data).

In short, airtime fairness is quite an interesting feature, which might even make the sqm instance on your wlans superfluous (as the Linux airtime fairness implementation works by implementing fq_codel inside the wifi driver, and in that situation no additional traffic shaper is required). So if 19.07 actually offers working airtime fairness for the 860, that would for me be a strong reason to update (in addition to the fact that 18.06 will be on the way out). But that obviously is a policy decision everybody needs to take for their own network.
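
To put rough numbers on the difference, here is a small back-of-the-envelope sketch. It assumes two stations with idealized link rates of 100 and 1 Mbit/s and ignores all MAC overhead, so the figures are illustrative only; fairness_compare is a hypothetical helper name.

```shell
#!/bin/sh
# Compare aggregate throughput for two stations under both fairness models.
# Arguments: link rates of station 1 and station 2 in Mbit/s (illustrative).
fairness_compare() {
    awk -v r1="$1" -v r2="$2" 'BEGIN {
        # Throughput fairness: both stations get the same rate x, and
        # their airtime shares x/r1 + x/r2 have to add up to 1.
        x = 1 / (1 / r1 + 1 / r2)
        printf "throughput-fair aggregate: %.1f Mbit/s\n", 2 * x
        # Airtime fairness: each station gets half of the airtime.
        printf "airtime-fair aggregate: %.1f Mbit/s\n", r1 / 2 + r2 / 2
    }'
}

fairness_compare 100 1
# prints:
# throughput-fair aggregate: 2.0 Mbit/s
# airtime-fair aggregate: 50.5 Mbit/s
```

Under throughput fairness the slow station occupies about 99% of the airtime, which is why the fast station's latency suffers as well.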

If I remember correctly, airtime fairness only applies to 2.4G WiFi, or did I miss something?

Edit: OK, I did some research and found that the airtime fairness patch was moved to mac80211, to make it usable from other drivers, on Thu, 22 Feb 2018, see.

I guess this means airtime fairness is not restricted to ath9k now. It just needs to be proven that it works as well as it is supposed to.

I do not think that the airtime fairness patches are limited to the 2.4 GHz band, but I also believe that they were already added in OpenWrt 17. So we might be talking about the new code for ath10k here, which certainly does not apply to your router, BUT I only have first-hand experience with ath9k, so @Mushoz probably knows more about the mediatek wifi...

Airtime fairness was moved to mac80211, but it still requires drivers that are compatible with the functionality. For mt76, support for airtime fairness is only present in 19.07.0 or recent snapshots.

Are you still on the 18.06 branch? If so, you should definitely consider upgrading!

Sorry, I don't know about airtime fairness in drivers/mac80211.
But does it use an fq_codel-like implementation?
What are the settings used there?
target 35ms interval 150ms?

Currently I use:

tc qdisc replace dev wlan0 parent 1:1 handle 110 fq_codel limit 8192 quantum 512 target 35.0ms interval 150.0ms memory_limit 4Mb noecn
tc qdisc replace dev wlan0 parent 1:2 handle 120 fq_codel limit 8192 quantum 1024 target 35.0ms interval 150.0ms memory_limit 4Mb noecn
tc qdisc replace dev wlan0 parent 1:3 handle 130 fq_codel limit 8192 quantum 1024 target 35.0ms interval 150.0ms memory_limit 4Mb noecn
tc qdisc replace dev wlan0 parent 1:4 handle 140 fq_codel limit 8192 quantum 512 target 35.0ms interval 150.0ms memory_limit 4Mb noecn

For the limit setting, I'm not sure...
mwlwifi defines the limits as:
#define MAX_NUM_TX_DESC 1024
#define MAX_TX_RING_SEND_SIZE (4 * MAX_NUM_TX_DESC)
#define PCIE_MAX_NUM_TX_DESC 256
#define PCIE_TX_QUEUE_LIMIT (3 * PCIE_MAX_NUM_TX_DESC)

So 1024, 4096 or 768?

//edit
// settings taken from
http://flent-newark.bufferbloat.net/~d/Airtime%20based%20queue%20limit%20for%20FQ_CoDel%20in%20wireless%20interface.pdf

Currently, we have tuned the default value of FQ-CoDel’s target parameter to 35 ms and the scheduling interval to 150 ms. These are very conservative settings that behave well for all tested link conditions, and yet are still very effective.

I believe target is 20ms for wifi (due to the burstiness of the MAC and potential wifi aggregate sizes), no idea about the interval. But the trick is to integrate fq_codel into the wifi stack and make the driver itself only queue up ~2 aggregates worth of packets, so that fq_codel really controls the relevant queue...

See https://www.usenix.org/system/files/conference/atc17/atc17-hoiland-jorgensen.pdf for details....

The airtime fairness that got moved to mac80211 made it into mainline kernel 5.5 with, e.g., the following commits:
mac80211: Implement Airtime-based Queue Limit (AQL) and
mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue

Maybe AQL can be backported to the kernel for the next release (4.19 or 5.4).

hostap 2.9 supports airtime policy configuration (see hostapd: Add airtime policy configuration support). These configuration options are AFAIK not (yet) exposed to the OpenWrt configuration and by default hostap does not use airtime fairness.

Just tried 19.07.0, but I am quite disappointed in terms of WiFi bufferbloat. Firstly, the bufferbloat is still there. Furthermore, I cannot re-create the same solution as with 18.06.x anymore.

I managed to disable airtime fairness with the following commands (the default value was 3)

echo 0 > /sys/kernel/debug/ieee80211/phy0/airtime_flags
echo 0 > /sys/kernel/debug/ieee80211/phy1/airtime_flags

and then rebooted the router. After disabling airtime fairness and reapplying the same 18.06.x settings on 19.07.0, testing showed high ping times during the switch from the download test to the upload test, which did not happen with 18.06.x.
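
For reference, a minimal sketch for reading back what an airtime_flags value means. The bit assignment (bit 0 for TX accounting, bit 1 for RX accounting) is an assumption based on mac80211's AIRTIME_USE_TX and AIRTIME_USE_RX definitions, and decode_airtime_flags is a hypothetical helper name. Since debugfs values are normally not preserved across a reboot, it may be worth re-reading the file after the router comes back up.

```shell
#!/bin/sh
# Decode an airtime_flags value into the accounting directions it enables.
# Assumed bit layout: bit 0 = TX airtime, bit 1 = RX airtime;
# any other bits are ignored here.
decode_airtime_flags() {
    flags="$1"
    out=""
    [ $((flags & 1)) -ne 0 ] && out="tx"
    [ $((flags & 2)) -ne 0 ] && out="${out:+$out }rx"
    echo "${out:-none}"
}

decode_airtime_flags 3   # prints "tx rx" (the default reported above)
decode_airtime_flags 0   # prints "none"

# On the router the live value could be checked per radio, e.g.:
# for f in /sys/kernel/debug/ieee80211/phy*/airtime_flags; do
#     echo "$f: $(decode_airtime_flags "$(cat "$f")")"
# done
```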

I am not sure it is quite the right direction for OpenWrt to become more sophisticated while leaving users fewer and fewer opportunities to dig into the configuration and find more flexible solutions.

What does
cat /etc/hotplug.d/net/20-smp-tune
show, and what do
cat /sys/class/net/eth*/queues/rx-0/rps_cpus
and
cat /sys/class/net/eth*/queues/tx-0/xps_cpus
return?

20-smp-tune is as follows:

#!/bin/sh
[ "$ACTION" = add ] || exit

NPROCS="$(grep -c "^processor.*:" /proc/cpuinfo)"
[ "$NPROCS" -gt 1 ] || exit

PROC_MASK="$(( (1 << $NPROCS) - 1 ))"

find_irq_cpu() {
        local dev="$1"
        local match="$(grep -m 1 "$dev\$" /proc/interrupts)"
        local cpu=0

        [ -n "$match" ] && {
                set -- $match
                shift
                for cur in `seq 1 $NPROCS`; do
                        [ "$1" -gt 0 ] && {
                                cpu=$(($cur - 1))
                                break
                        }
                        shift
                done
        }

        echo "$cpu"
}

set_hex_val() {
        local file="$1"
        local val="$2"
        val="$(printf %x "$val")"
        [ -n "$DEBUG" ] && echo "$file = $val"
        echo "$val" > "$file"
}

default_ps="$(uci get "network.@globals[0].default_ps")"
[ -n "$default_ps" -a "$default_ps" != 1 ] && exit 0

exec 512>/var/lock/smp_tune.lock
flock 512 || exit 1

for dev in /sys/class/net/*; do
        [ -d "$dev" ] || continue

        # ignore virtual interfaces
        [ -n "$(ls "${dev}/" | grep '^lower_')" ] && continue
        [ -d "${dev}/device" ] || continue

        device="$(readlink "${dev}/device")"
        device="$(basename "$device")"
        irq_cpu="$(find_irq_cpu "$device")"
        irq_cpu_mask="$((1 << $irq_cpu))"

        for q in ${dev}/queues/rx-*; do
                set_hex_val "$q/rps_cpus" "$(($PROC_MASK & ~$irq_cpu_mask))"
        done

        ntxq="$(ls -d ${dev}/queues/tx-* | wc -l)"

        idx=$(($irq_cpu + 1))
        for q in ${dev}/queues/tx-*; do
                set_hex_val "$q/xps_cpus" "$((1 << $idx))"
                let "idx = idx + 1"
                [ "$idx" -ge "$NPROCS" ] && idx=0
        done
done

and

root@OpenWrt:~# cat /sys/class/net/eth0/queues/rx-0/rps_cpus
e

and

root@OpenWrt:~# cat /sys/class/net/eth0/queues/tx-0/xps_cpus
2
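
These values are hexadecimal CPU bitmasks. A minimal sketch for translating them into CPU numbers follows; mask_to_cpus is a hypothetical helper name, and the comma-separated masks that show up on systems with many CPUs are not handled.

```shell
#!/bin/sh
# Translate a hexadecimal CPU mask (as shown by rps_cpus/xps_cpus)
# into the list of CPU numbers whose bits are set.
mask_to_cpus() {
    mask=$((0x$1))
    cpu=0
    list=""
    while [ "$mask" -gt 0 ]; do
        [ $((mask & 1)) -eq 1 ] && list="${list:+$list }$cpu"
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "${list:-none}"
}

mask_to_cpus e   # prints "1 2 3" (rps_cpus of eth0 above)
mask_to_cpus 2   # prints "1"     (xps_cpus of eth0 above)
```

So receive steering for eth0 uses CPUs 1 to 3, and transmit steering for tx-0 is pinned to CPU 1.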

Thanks, that is still the version active in 19.07 and 18.06, which does not seem to be ideal for many multi-core CPUs...

That is not the output I asked for (but I did not really ask for what I wanted in the first place), so could you please add the following:

to get board id and OS version

egrep '"id":' /etc/board.json | sed -nE 's/(\s*"id":\s")(.*)(",)/\2/p'; egrep 'OPENWRT_RELEASE' /etc/os-release | sed -nE 's/(OPENWRT_RELEASE=")(.*)(")/\2/p'

to get the interrupt mapping

cat /proc/interrupts

to get the names of all network devices

ls /sys/class/net/

to get the receive side packet steering mapping for all network devices (please note the *, which the shell expands to all devices)

cat /sys/class/net/*/queues/rx-0/rps_cpus

to get the transmit packet steering mapping

cat /sys/class/net/*/queues/tx-0/xps_cpus

For the gritty details see https://www.kernel.org/doc/Documentation/networking/scaling.txt. 20-smp-tune does seem to follow that document's advice in excluding the CPU that processes the hardware interrupt, but that has issues for dual-core CPUs, as all non-IRQ processing ends up overloading the one remaining CPU, especially if the NIC interrupts themselves cannot be distributed among the CPUs.
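
The dual-core problem can be seen by redoing the RPS mask arithmetic from 20-smp-tune by hand. This is only an illustrative sketch: rps_mask is a hypothetical helper name, and the CPU counts and IRQ placement are assumed inputs, not values read from this router.

```shell
#!/bin/sh
# Reproduce the RPS mask arithmetic of 20-smp-tune:
# all CPUs except the one that services the device interrupt.
# Arguments: number of logical CPUs, CPU handling the IRQ.
rps_mask() {
    nprocs="$1"
    irq_cpu="$2"
    proc_mask=$(( (1 << nprocs) - 1 ))
    printf '%x\n' $(( proc_mask & ~(1 << irq_cpu) ))
}

rps_mask 4 0   # prints "e": CPUs 1-3 share the packet processing
rps_mask 2 0   # prints "2": CPU 1 alone gets all of it
```

With four logical CPUs the work is spread over three of them, while with two CPUs everything except the interrupt handling lands on a single core.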

@moeller0, I will try to get this info with 18.06.6 later. Is there anything I can do to mitigate the issues on a 2-core CPU like the one this router has?