How to write a meaningful mt76 lag bug report?

My friend's kids were complaining about WiFi issues with the MT7621-based R6220 I gave her, so I gave her my Archer C7 and took this router back. It worked flawlessly for me two years ago, so what could they be complaining about?

Now I get it. I'm experiencing Teradici problems working from home and gaming is a disaster. @lukasz92 is right, this is garbage and I want to take back my recommendations. Multiplayer gaming is no longer an option since switching to this router!

I don't want to just complain, I'd like to contribute. mt76 is the most open driver and the ath10k can handle my use case, so this should be fixable, right? I'm a Windows C++ developer by trade and would be comfortable diving into this if I knew what a developer would want to see in a bug report. Would anybody be willing to guide me in properly identifying the root cause of these issues? @nbd ?

Hi @dana44,
What OpenWrt version are you using on this device?

OpenWrt 19.07.5 r11257-5090152ae3 / LuCI openwrt-19.07 branch git-20.341.57626-51f55b5

(since the forum software told me my post of just 19.07.5 was too short)
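For a bug report it helps to quote the release, build number, and commit separately. A minimal sketch, assuming the banner always has the form shown above (`OpenWrt <release> r<build>-<commit> / LuCI ...`); the function name `parse_openwrt_banner` is hypothetical and not part of OpenWrt. On the router the same string should be available as DISTRIB_DESCRIPTION in /etc/openwrt_release.

```shell
#!/bin/sh
# Hypothetical helper: split an OpenWrt version banner into its fields.
# Assumes the form "OpenWrt <release> r<build>-<commit> / LuCI ...".
parse_openwrt_banner() {
    # Drop everything from " / LuCI" onward, if present.
    banner=${1%% / *}
    # Word-split into: OpenWrt <release> <revision>
    set -- $banner
    release=$2
    revision=$3
    # The revision looks like r11257-5090152ae3: build number, then git hash.
    printf 'release=%s build=%s commit=%s\n' \
        "$release" "${revision%%-*}" "${revision#*-}"
}

parse_openwrt_banner "OpenWrt 19.07.5 r11257-5090152ae3 / LuCI openwrt-19.07 branch git-20.341.57626-51f55b5"
```

The commit hash is the useful part for a driver developer, since it pins the exact source tree the image was built from.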

Can you please try if running the latest snapshot makes things work better for you? You may need to reset the config when upgrading, so you should make a backup first.

I will try https://downloads.openwrt.org/snapshots/targets/ramips/mt7621/openwrt-ramips-mt7621-netgear_r6220-squashfs-sysupgrade.bin after work tonight and post my findings.

Thank you for such quick responses!

Success! I just finished downloading, flashing, and configuring the current version. I'll post as soon as I can (maybe not until Monday) whether I can reproduce the problem while using
OpenWrt SNAPSHOT r15237-fca0eb2d92 / LuCI Master git-20.339.75073-e54708a

@nbd I had a chance to run flent rrul -p all_scaled -l 60 -H netperf.bufferbloat.net -o mt76.png four times, which produced four variations on this:
image

I did notice the load was high:
image
so I disabled cake (the only modification from the stock firmware) to see if that would leave more CPU time.

That definitely made things worse:

I had a few images from running this in the summer, albeit IIRC with the router in a different home at probably the current official OpenWrt release at the time (19.07.3?). They definitely looked better:

Is any of this intriguing or even just unexpected? Is there anything I can do to identify what is going wrong?

I just pushed a bigger mt76 update to OpenWrt master and went through the commits that had accumulated since the last one (from September). Turns out there's an important fix in there for a regression that might produce symptoms like this.
If you build latest OpenWrt master (or wait for the next snapshot to appear), it might work better in your tests.
This mt76 commit is the one that matters, I believe:

I'm on the snapshot from a few hours ago now, but netperf.bufferbloat.net isn't up to reproduce my findings. I'll try again this evening when I need a break from gift wrapping.

Running last night's snapshot r15241-3ab695368a, things are improved, but obviously still unusable for a gamer.

image
FWIW I had to use flent-newark.bufferbloat.net instead of netperf.bufferbloat.net for today's graph:
flent rrul -p all_scaled -l 60 -H flent-newark.bufferbloat.net -o mt76_dec19_6_nocake.png

@nbd What could I do next to be useful?

The fact that you're benchmarking over your internet connection to a third party server makes it hard for me to get anything useful out of the report.
Also, since turning cake on and off (which IIUC is only applied to the WAN connection) affects the results so much, it seems to me like the bottleneck is on the WAN side, and I don't know how much of this has anything to do with mt76.
Could you please run a netperf server on your LAN and repeat the tests?

I repeated the test with my D-Link DIR-860L:

  • netserver -D -d on a wired client
  • DIR-860L with OpenWrt SNAPSHOT, r15253-d6cb50c7ba
  • flent on a wireless client
for i in 1 2 3; do
    flent rrul -p all_scaled -l 60 -H 192.168.1.173 -o flent2g-0$i.png;
    sleep 3;
done

Tested on both 2G (CH 3, HT20, very noisy environment) and 5G (CH 100, VHT80).
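The loop above covers one band at a time. A minimal sketch of the same runs for both bands, which only prints the flent commands rather than running them (pipe the output to sh to execute). The helper name `flent_cmd` is hypothetical, and the output names are assumed to follow the flent2g-0N / flent5g-0N pattern of the result images.

```shell
#!/bin/sh
# Print (not run) the flent invocations for both bands, three runs each.
# SERVER is the wired netserver host used in the loop above.
SERVER=192.168.1.173

flent_cmd() {
    # $1 = band label (2g or 5g), $2 = run number
    printf 'flent rrul -p all_scaled -l 60 -H %s -o flent%s-0%s.png\n' \
        "$SERVER" "$1" "$2"
}

for band in 2g 5g; do
    # Moving the client to the matching band between passes is a manual step.
    for i in 1 2 3; do
        flent_cmd "$band" "$i"
    done
done
```

Keeping the band in the file name makes it harder to mix up plots when several runs are attached to one post.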

I tested it 3 times, but the results just don't correlate with each other. Is flent a good measurement for this? :confused:


2G results:
flent2g-03
flent2g-02
flent2g-01


5G results:

flent5g-03
flent5g-02
flent5g-01

What kind of client did you use?

I finally got this configured to run a good test. This is my laptop (Intel 9560 WiFi with current Windows 10 drivers) connected to the r6220, which now has a wrt3200acm wired in to run the netperf server to test against. I ran a wired test from the laptop to the wrt3200acm as a reference:
image

I did two tests only a few metres from the r6220 in the same room. The first run showed a notable dip in network traffic.
image
image

However, when I went upstairs to the bedroom above the router, where I work, I saw a less ideal graph:
image
image

I hope this demonstrates something useful, or makes somebody think of something else that could be causing the periods without traffic.

Also, since running OpenWrt SNAPSHOT r15241-3ab695368a / LuCI Master git-20.339.75073-e54708a my son's Google Pixel 1 has not been able to connect. I guess this is likely a general snapshot bug? Is this worth reporting elsewhere?

To work around the above Pixel problem I enabled the WiFi on the wrt3200acm running 19.07.5. Since it was up and running again I decided to repeat the same tests. Note: the 3200 was configured as an AP and also happened to have the netperf server running on it, which makes the test slightly different.

In the same room the latency and throughput were a sawtooth
image
image

In the upstairs bedroom it was the same story
image
image

I know the wrt3200acm is notorious for its terrible binary blob WiFi drivers, but this illustrates just how much better the mt76 driver is in these cases. Even in the same room a few metres away the ping can be over 1000ms! I wanted to share this measurement to celebrate the great work done on mt76. This ancient r6220 has no business outperforming the big wrt3200acm! This driver code is a small engineering marvel. You guys have done fantastic work. Thank you. I hope I'll be able to help.

Were those last tests with the r6220 using 5G or 2G? I'm asking because they're using different chipsets and different mt76 drivers.
Regarding the Pixel 1 - did you enable 802.11w or WPA3 in your wireless config? I'm just checking because those can cause compatibility issues with some devices.
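One way to check this from a shell on the router is to filter the output of uci show wireless. A minimal sketch, assuming the usual `section.option='value'` output format; the helper name `check_pmf_wpa3` is hypothetical, and it only looks at the ieee80211w option and SAE-based encryption values.

```shell
#!/bin/sh
# Hypothetical helper: read `uci show wireless` output on stdin and print
# the lines that enable 802.11w or WPA3 (SAE).
# On the router:  uci show wireless | check_pmf_wpa3
check_pmf_wpa3() {
    while IFS= read -r line; do
        case "$line" in
            # ieee80211w: 0 = disabled, 1 = optional, 2 = required
            *"ieee80211w='1'"*|*"ieee80211w='2'"*)
                echo "802.11w enabled: $line" ;;
            # sae and sae-mixed are the WPA3-Personal encryption modes
            *"encryption='sae"*)
                echo "WPA3 (SAE) enabled: $line" ;;
        esac
    done
}
```

No output means neither setting was found in the lines that were read, which would point away from 802.11w or WPA3 as the cause.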

Sorry, I missed that.

I used Intel Wireless-AC 9560 160 MHz with Linux 5.4.

I retested with Windows 10 + WSL.


WSL 2G results:
flent2g-wsl-01
flent2g-wsl-02
flent2g-wsl-03

WSL 5G results:
flent5g-wsl-01
flent5g-wsl-02
flent5g-wsl-03

They look better now.


I don't know what I messed up under Linux, maybe a throttling problem. It still doesn't look good. :confused:
lshw:

product: Wireless-AC 9560 [Jefferson Peak]
driver=iwlwifi driverversion=5.4.0-58-generic firmware=46.6bf1df06.0

/proc/version:

Linux version 5.4.0-58-generic (buildd@lgw01-amd64-040) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #64~18.04.1-Ubuntu SMP Wed Dec 9 17:11:11 UTC 2020

2G results:
flent2g-03
flent2g-02
flent2g-01

5G results:
flent5g-03
flent5g-02
flent5g-01

2.4GHz is terrible. My son could not finish a match of the game he was trying to play (his Win10 PC has an Intel AX200 device), so the testing needed to wait a bit. I was finally able to get time to do tests with the house WiFi. They reliably show that 2.4GHz is not good enough to enable.

Some 2.4GHz tests from a bedroom a bit away from the router:
image
image
image
vs. 5GHz in the same bedroom
image
image
image
During the last test I had to reconnect. But I shouldn't expect much from 5GHz when it needs to penetrate walls and ceilings, should I?

I then went and sat about three metres from the router:
image
image
image
Odd, eh? Being in another room seemed better. Maybe the devices were too close?

Nonetheless, 5GHz was much better than when I ran the 2.4GHz test in the same room.
image
image
image

I borrowed an Archer C7 to get through the holidays. If anybody wants to see how the ath10k on 19.07.5 compares, it does quite well:

image
image
image