[Resolved] D-Link DIR-860L - mt7621 - 5GHz WiFi issues - OpenWrt 19.07.5

Hi,

A few days ago I helped a friend install OpenWrt 19.07.5 on a D-Link DIR-860L B1 router and encouraged him to use the 5GHz network while close to the router. All of the 5GHz clients are Samsung smartphones (J7/A51/S9/etc.).
Today, after an uptime of 4 days, I noticed the following in the kernel log, repeating at random intervals at least 10 times over those 4 days:

[168941.557056] mt76x2e 0000:01:00.0: MCU message 31 (seq 3) timed out
[168941.618638] mt76x2e 0000:01:00.0: Firmware Version: 0.0.00
[168941.629762] mt76x2e 0000:01:00.0: Build: 1
[168941.638133] mt76x2e 0000:01:00.0: Build Time: 201507311614____
[168941.667043] mt76x2e 0000:01:00.0: Firmware running!
[168941.687162] ieee80211 phy0: Hardware restart was requested
[168981.556718] mt76x2e 0000:01:00.0: MCU message 31 (seq 8) timed out
[168981.618260] mt76x2e 0000:01:00.0: Firmware Version: 0.0.00
[168981.629403] mt76x2e 0000:01:00.0: Build: 1
[168981.637772] mt76x2e 0000:01:00.0: Build Time: 201507311614____
[168981.666716] mt76x2e 0000:01:00.0: Firmware running! 
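To count how often this happens and how far apart the events are, the kernel log can be scanned for the two relevant line types. Below is a minimal Python sketch, assuming the dmesg format shown above (uptime in brackets, then the driver and PCI address); the regexes and the helper names `scan_kernel_log` and `gaps` are illustrative only, not part of OpenWrt or mt76.

```python
import re

# Excerpt of the kernel log from the post above; firmware info lines are ignored.
SAMPLE = """\
[168941.557056] mt76x2e 0000:01:00.0: MCU message 31 (seq 3) timed out
[168941.618638] mt76x2e 0000:01:00.0: Firmware Version: 0.0.00
[168941.667043] mt76x2e 0000:01:00.0: Firmware running!
[168941.687162] ieee80211 phy0: Hardware restart was requested
[168981.556718] mt76x2e 0000:01:00.0: MCU message 31 (seq 8) timed out
"""

# "[<uptime>] mt76xxx <pci-addr>: MCU message <id> (seq <n>) timed out"
MCU_TIMEOUT = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+mt76\w*\s+\S+:\s+"
    r"MCU message (?P<msg>\d+) \(seq (?P<seq>\d+)\) timed out"
)
# "[<uptime>] ieee80211 phyN: Hardware restart was requested"
HW_RESTART = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+ieee80211 (?P<phy>phy\d+): "
    r"Hardware restart was requested"
)


def scan_kernel_log(text):
    """Return (timeouts, restarts) found in dmesg-style text.

    timeouts: list of (uptime_seconds, message_id, seq)
    restarts: list of (uptime_seconds, phy_name)
    """
    timeouts, restarts = [], []
    for line in text.splitlines():
        line = line.strip()
        m = MCU_TIMEOUT.match(line)
        if m:
            timeouts.append((float(m["ts"]), int(m["msg"]), int(m["seq"])))
            continue
        m = HW_RESTART.match(line)
        if m:
            restarts.append((float(m["ts"]), m["phy"]))
    return timeouts, restarts


def gaps(timestamps):
    """Seconds between consecutive events."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


if __name__ == "__main__":
    timeouts, restarts = scan_kernel_log(SAMPLE)
    print(f"{len(timeouts)} MCU timeouts, {len(restarts)} hardware restarts")
    print("gaps (s):", [round(g, 1) for g in gaps([t[0] for t in timeouts])])
```

On the excerpt above, the two MCU timeouts are about 40 s apart. Running this on the router itself assumes the python3 package is installed; otherwise the `dmesg` output can be copied to another machine.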

It apparently only affects phy0 (5GHz), to which a few 5GHz-capable clients connect. No one noticed any disconnects, but the "Hardware restarts" certainly cause a short disconnect.

I did some research and found the following related links (not really helpful):



Another link, which I can't find anymore, also described an issue with MCU messages timing out, but for an Intel driver (on a full-fledged Linux system with a 4.x kernel), and the workaround there was to disable 802.11n support.

The relevant OpenWrt wireless configuration section is:

config wifi-device 'radio0'
        option type 'mac80211'
        option hwmode '11a'
        option path 'pci0000:00/0000:00:00.0/0000:01:00.0'
        option htmode 'VHT80'
        option txpower '20'
        option country 'US'
        option channel '36'

I'm using the exact same settings on 3 other D-Link DIR-860L B1 routers, running various OpenWrt versions from 18.06.1 to 19.07.5, where I haven't noticed any issues. However, on those other routers there is usually only one 5GHz-capable client connected.
I don't know how to reproduce this or what causes it, but I suspect that either the adapter becomes unstable when many 5GHz clients generate traffic, or a single client "offends" the adapter with some unusual wireless traffic/packets (at the PHY level).
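One way to rule out a configuration difference between the routers is to copy each router's /etc/config/wireless to a PC and diff the option values. The sketch below is a deliberately minimal UCI reader (only `config` and `option` lines, no `list` entries); running `uci show wireless` on each router remains the authoritative view. The function names and file names are hypothetical.

```python
import shlex
import sys


def parse_uci(text):
    """Minimal UCI reader: {(section_type, section_name): {option: value}}.

    Only 'config' and 'option' lines are handled. 'list' entries are skipped,
    and anonymous sections of the same type are merged, so this is a sketch
    for comparing named sections such as wifi-device 'radio0'.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        tokens = shlex.split(line)  # handles the '...' quoting used by UCI
        if tokens[0] == "config" and len(tokens) >= 2:
            name = tokens[2] if len(tokens) > 2 else None
            current = sections.setdefault((tokens[1], name), {})
        elif tokens[0] == "option" and current is not None and len(tokens) >= 3:
            current[tokens[1]] = tokens[2]
    return sections


def diff_uci(a, b):
    """Return {(section, option): (value_in_a, value_in_b)} where they differ."""
    diffs = {}
    for section in sorted(set(a) | set(b), key=str):
        opts_a, opts_b = a.get(section, {}), b.get(section, {})
        for opt in sorted(set(opts_a) | set(opts_b)):
            if opts_a.get(opt) != opts_b.get(opt):
                diffs[(section, opt)] = (opts_a.get(opt), opts_b.get(opt))
    return diffs


if __name__ == "__main__":
    # Usage (hypothetical file names):
    #   python3 uci_diff.py wireless.router1 wireless.router2
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        diffs = diff_uci(parse_uci(fa.read()), parse_uci(fb.read()))
    for (section, opt), (va, vb) in diffs.items():
        print(f"{section[0]} {section[1]}.{opt}: {va!r} != {vb!r}")
```

An empty result for the wifi-device sections would at least confirm that the radio settings are not the difference between the affected router and the others.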

I did some more research and learned that I could enforce a minimum 802.11 mode for clients with require_mode:

  • added to the configuration section from above:
    option require_mode 'ac'
  • restarted the router and could no longer connect on either 5GHz or 2.4GHz (why?)

From this "defect" I learned that one can disable 802.11n simply by setting htmode to NOHT (not available in LuCI):

  • not sure whether it's actually supported or whether it'll help (the workaround was presented for the Intel driver, and it was really old).

I only have one 5GHz-capable client (my phone) at the moment, I haven't had any issues at all with it connecting to and using my D-Link DIR-860L B1, and I stay on 5GHz pretty much all the time.
Maybe other users with several 5GHz clients connected at the same time could look at their kernel logs and check whether they get such errors.

Any other helpful inputs/explanations appreciated!

Unfortunately, this hardware restart request is a recurring problem with the DIR-860L B1.

Some suggest that lowering the transmit power may help, but I haven't seen any conclusive reports about that.

See:


Thanks for the confirmation and the extra details provided.

I'll mark this thread as resolved, since it duplicates existing GitHub reports that are currently being tracked.

The only D-Link DIR-860L B1 where I observed the reported problem has been rebooted, and more than 24h have now passed without any issues. A reboot seems to help, so maybe I'll write a short workaround script (launched periodically by cron) that greps the kernel log and reboots the router if HW restarts are found.
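The cron workaround could look roughly like the following Python sketch. It assumes python3 is installed on the router (it is not part of the default image), that BusyBox `dmesg` and `reboot` are available, and that the 5GHz radio is phy0 as in the log above; the script name, the `--dry-run` switch and the cron schedule are hypothetical. A shell script using `grep` would do the same job with fewer dependencies.

```python
import re
import subprocess
import sys

# Example crontab entry for /etc/crontabs/root (hypothetical path and interval):
#   */30 * * * * /usr/bin/python3 /root/wifi_watchdog.py
RESTART_MSG = "Hardware restart was requested"


def count_hw_restarts(kernel_log, phy="phy0"):
    """Count 'ieee80211 <phy>: Hardware restart was requested' lines."""
    pattern = re.compile(r"ieee80211 " + re.escape(phy) + r": " + re.escape(RESTART_MSG))
    return sum(1 for line in kernel_log.splitlines() if pattern.search(line))


def should_reboot(kernel_log, threshold=1, phy="phy0"):
    """True once at least `threshold` restarts of `phy` are in the log."""
    return count_hw_restarts(kernel_log, phy) >= threshold


def main(argv):
    dry_run = "--dry-run" in argv
    log = subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
    if should_reboot(log):
        if dry_run:
            print("would reboot: hardware restart found in kernel log")
        else:
            subprocess.run(["reboot"])
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Because the kernel ring buffer starts empty after a reboot, the script does not trigger again right away. On a very long uptime the ring buffer may have wrapped, so older restart messages can be missed; running it every few minutes reduces that risk.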

I have access to 4 such D-Link DIR-860L B1 routers (identical: same manufacturing date and same stock firmware version), 3 directly and one belonging to a friend, and will definitely report back if I find anything that helps to understand or resolve the issue.


I dug a little deeper into this mt76 driver issue and learned that it all started with a driver update between 18.06.1 and 18.06.2, and that the last OpenWrt release that is actually OK is 18.06.1.
Also, although I reported the issue with the 5GHz WiFi, most of the other reports concern 2.4GHz, and my understanding now is that the issue occurs not because of the network traffic intensity, but because of the number of clients and the radio traffic (auth/handshake/noise) they generate.
Of the 4 D-Link DIR-860L B1 routers I mentioned, only 2 actually serve many wireless clients. The one for which I created this thread runs OpenWrt 19.07.5 and experiences the WiFi HW reset issue. The other one, which I installed in a small office some years ago and haven't had the chance to upgrade, still runs 18.06.1, usually has an uptime of weeks, and has absolutely no WiFi issues (its many simultaneous clients mainly use the 2.4GHz band).

These users pinpoint the start of the wireless driver issue to between 18.06.1 and 18.06.2:
https://www.gitmemory.com/issue/openwrt/mt76/246/466520975

And the actual OpenWrt GitHub commit, the mt76 driver update, is f34ad1a8f0d1ee4a3e4b966d58c8f3b8a523a417 (Jan 14, 2019):

This commit apparently caused the driver to flood the kernel log with "MCU message XX timed out" errors.
Then, in some more recent mt76 driver updates, a lame bugfix was added that resets the hardware when the MCU message timeouts start to appear.
18.06.1 was released on Aug 16, 2018

18.06.2 was released on Jan 30, 2019 (16 days after the first buggy mt76 driver commit)
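As a quick check of the timeline, the intervals work out as follows; this is plain date arithmetic on the dates quoted above, and the variable names are only illustrative.

```python
from datetime import date

release_18_06_1 = date(2018, 8, 16)
mt76_update = date(2019, 1, 14)  # commit f34ad1a8f0d1ee4a3e4b966d58c8f3b8a523a417
release_18_06_2 = date(2019, 1, 30)

days_after_18_06_1 = (mt76_update - release_18_06_1).days      # 151
days_before_18_06_2 = (release_18_06_2 - mt76_update).days     # 16
days_between_releases = (release_18_06_2 - release_18_06_1).days  # 167

print(days_after_18_06_1, days_before_18_06_2, days_between_releases)
```

So the mt76 update landed 151 days after 18.06.1 and 16 days before 18.06.2, which fits 18.06.1 being the last unaffected release.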

Mainline kernel developers discussing the mt76 driver MCU message timeouts, back in 2018:
https://patchwork.kernel.org/project/linux-wireless/patch/20180711122947.GA5309@redhat.com/
And the suspected bugfix patch that resets the HW (2019):
https://patchwork.kernel.org/project/linux-wireless/patch/20190226092945.54738-5-nbd@nbd.name/

I'd suggest reverting the mt76 driver to its state before commit f34ad1a8f0d1ee4a3e4b966d58c8f3b8a523a417 (Jan 14, 2019), but many other driver improvements have landed since then, and I'm not sure how important or beneficial they actually are.

Additionally, I'd suggest that the mt76 kernel developers test their driver changes in a busier WiFi scenario; maybe someone could point them to this thread (I'm neither an OpenWrt maintainer nor registered on GitHub/kernel.org).
https://wireless.wiki.kernel.org/en/users/drivers/mediatek
Developers & Support

Send patches to the people and mailing lists below:

To: Felix Fietkau nbd@nbd.name
To: Lorenzo Bianconi lorenzo@kernel.org
To: Ryder Lee ryder.lee@mediatek.com
Cc: linux-mediatek@lists.infradead.org - https://lists.infradead.org/mailman/listinfo/linux-mediatek
Cc: linux-wireless@vger.kernel.org - http://vger.kernel.org/vger-lists.html#linux-wireless

IRC channel: #mt76-devel
