23.05-rc2 Netgear WAX206 (mt76) sporadic no traffic but no DFS log entries

(Edit: See update post below. Not DFS related, reproduced on non-DFS channel, apparently fixed with a recent mt76 patch still present with master snapshot)

Running 23.05-rc2 on a Netgear WAX206. I had a 5GHz AP configured for channel 120, 40MHz, ax. Country is configured as US.

It had been performing without issues up until 1-2 days ago, possibly around the time I switched from a previous 23.05 snapshot to -rc2.

The sporadic issue since then is that I appear to lose connectivity on a client of this AP for about 1 minute, (very) occasionally. When I wait the ~1 minute, or when I manually force the client to reconnect to the same AP, connectivity is restored.

I thought of DFS, but the only related entries I can find are from hours before I saw this most recently - in fact this is from the time the unit was rebooted this morning:

# logread |grep -i dfs
Sat Jul  1 11:45:48 2023 daemon.notice hostapd: wl1-ap0: interface state HT_SCAN->DFS
Sat Jul  1 11:45:48 2023 daemon.notice hostapd: wl1-ap0: DFS-CAC-START freq=5600 chan=120 sec_chan=-1, width=0, seg0=118, seg1=0, cac_time=60s
Sat Jul  1 11:50:56 2023 daemon.notice hostapd: wl1-ap0: DFS-CAC-COMPLETED success=1 freq=5600 ht_enabled=1 chan_offset=-1 chan_width=2 cf1=5590 cf2=0
Sat Jul  1 11:50:57 2023 daemon.notice hostapd: wl1-ap0: interface state DFS->ENABLED

Seems reasonable at boot/AP start time.

Such a "silent traffic drop" of about 1 minute happened hours later. I can't see anything relevant in either the system log or dmesg.
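For anyone checking the same thing: a radar hit would normally show up in the hostapd log as a DFS-RADAR-DETECTED event, usually followed by DFS-NEW-CHANNEL, rather than only the CAC entries seen at startup. A minimal sketch of a filter for those events follows; the function name is just for illustration.

```shell
#!/bin/sh
# Sketch: keep only hostapd DFS events that point at a radar hit or a
# forced channel move. Reads log lines on stdin (for example from logread).
# Startup CAC events (DFS-CAC-START / DFS-CAC-COMPLETED) are filtered out.
dfs_radar_events() {
    grep -E 'DFS-(RADAR-DETECTED|NEW-CHANNEL|NOP-FINISHED)'
}
```

Usage would be `logread | dfs_radar_events`. Empty output (grep exits non-zero) suggests that hostapd did not log a radar event in the retained log window, which can be short on a device with a small log buffer.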

I have since (i.e. after this last "1 minute outage") switched the AP to channel 44. I haven't experienced the issue in the few hours since, but it was sporadic before anyway.

Am I right to be suspecting DFS here?

So, I had been facing this for a while and did some troubleshooting. I think I see a pattern and it may be related to the mt76 (removal of) 160MHz capability for MT7915, detailed in this long thread.

Motivated by that thread, I switched the main client device of this WAX206 to AC mode. It's a Linux laptop with an Intel AX210, so I disabled AX on it using modprobe iwlwifi disable_11ax=1 and verified that with that module argument the client connected in AC mode, even though the AP was still offering AX.
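One way to do that verification is to look at the MCS tag in the bitrate lines of `iw dev <interface> link`: HE-MCS indicates 11ax, VHT-MCS indicates 11ac. A rough sketch, with a made-up function name, that classifies the output piped into it:

```shell
#!/bin/sh
# Sketch: classify the PHY mode of the current link from the output of
# `iw dev <interface> link` on stdin, using the MCS tag on the bitrate lines.
# If either direction reports HE, the link is reported as HE.
link_phy_mode() {
    awk '
        /bitrate:/ {
            if ($0 ~ /HE-MCS/)       he = 1
            else if ($0 ~ /VHT-MCS/) vht = 1
            else if ($0 ~ / MCS /)   ht = 1
        }
        END {
            if (he)       print "HE"
            else if (vht) print "VHT"
            else if (ht)  print "HT"
            else          print "unknown"
        }'
}
```

For example, `iw dev wlp170s0 link | link_phy_mode` should print VHT once disable_11ax=1 has taken effect, and HE otherwise. Bitrate lines only appear after some traffic has passed, so "unknown" right after association is not conclusive.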

When the client connected in AC mode, I no longer experienced these no-traffic incidents. When I switched AX back on in the client, I could soon enough (it might take minutes, it might take hours) see the problem again.

Referencing the above thread again, once the patch to remove VHT160 from MT7915 was merged to master, I built an image for the WAX206 (local imagebuilder) and flashed it: r23566-37ff916af7.

I let the client connect without inhibiting AX any more, and for a day now it's been working without issues.

Hope the VHT160 patch makes it to 23.05 RCs/release.

Current state:

From the WAX206:

# iwinfo wl1-ap0 assoclist
nn:nn:nn:nn:nn:nn  -54 dBm / unknown (SNR -54)  0 ms ago
	RX: 573.5 MBit/s, HE-MCS 11, 40MHz, HE-NSS 2, HE-GI 0, HE-DCM 0     22372 Pkts.
	TX: 390.0 MBit/s, HE-MCS 8, 40MHz, HE-NSS 2, HE-GI 1, HE-DCM 0     40321 Pkts.
	expected throughput: unknown

From the linux AX210 STA:

$ iw dev wlp170s0 link
Connected to nn:nn:nn:nn:nn:nn (on wlp170s0)
	SSID: ...
	freq: 5600
	RX: 48230342 bytes (40885 packets)
	TX: 4032810 bytes (22025 packets)
	signal: -58 dBm
	rx bitrate: 275.2 MBit/s 40MHz HE-MCS 5 HE-NSS 2 HE-GI 0 HE-DCM 0
	tx bitrate: 573.5 MBit/s 40MHz HE-MCS 11 HE-NSS 2 HE-GI 0 HE-DCM 0

	bss flags:	short-slot-time
	dtim period:	2
	beacon int:	100

Update: I still got this a couple of times even with r23566-37ff916af7. I'm building a 23.05-snapshot-r23287-b28d74090f image to try next.

A similar sporadic connected-but-no-traffic problem bothered me for a long time.

My WiFi AP is a D-Link DIR-860L B1, also using the mt76 driver. For me it happened not just on 5GHz but also on 2.4GHz, so it was not DFS related.

Eventually I found the issue gone after I set these supposedly default values explicitly:

        option disassoc_low_ack '1'
        option skip_inactivity_poll '0'
        option max_inactivity '300'
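
These options belong in the relevant wifi-iface section of /etc/config/wireless. As a sketch, the same values could also be set with uci; the section name default_radio1 used in the usage line is an assumption (check `uci show wireless` for the real one), and the helper function is only illustrative.

```shell
#!/bin/sh
# Sketch: explicitly set the three station-inactivity options on one
# wifi-iface section and commit the change.
# $1 is the section name (assumed, e.g. default_radio1).
# Set UCI=echo to print the uci invocations instead of running them.
set_inactivity_opts() {
    section="$1"
    uci_cmd="${UCI:-uci}"
    "$uci_cmd" set "wireless.${section}.disassoc_low_ack=1" &&
    "$uci_cmd" set "wireless.${section}.skip_inactivity_poll=0" &&
    "$uci_cmd" set "wireless.${section}.max_inactivity=300" &&
    "$uci_cmd" commit wireless
}
```

On the router this would be run as `set_inactivity_opts default_radio1 && wifi reload`.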

Just hope you can try it and see what happens.

BTW, you should set the channel to a non-DFS one to rule out DFS as a possible cause.

Interesting, thanks for the heads up. I've added those config options manually.

Re: DFS, I had switched off DFS and still encountered this previously. Also DFS channels are too tempting at my location, they are empty, unlike the ones below and above them.

Well, no luck with 23.05-SNAPSHOT r23288-476bf135fc - had another incident of this a few hours after flashing that updated 23.05 snapshot.

I'm still test-driving/shaking down the WAX206 before it replaces the main house router, an R7800. So there's one other device using it, a MacBook Pro M1, on which so far I haven't noticed this.

So it's possible I'm seeing an issue that's somewhat specific to the combination of my Linux laptop and its Intel AX210 plus the WAX206/MT7915E.

Hi, I also had a WAX206 with a similar issue: soon after booting, the device loses all connectivity for 30s-2min. I fully disabled the firewall and the issue still persisted. The connectivity losses returned every 5min, so the device was unusable. I sent it back. The issue doesn't seem to happen on stock firmware though.

The pattern is different in my case. First of all it's far less frequent. Also, as far as I have been able to tell, layer 2 (WLAN) remains connected. For whatever reason, traffic is not being fully relayed to layer 3.

It's also important to note that my WAX206 is configured in a slightly less common way: It's a dumb AP, with 5GHz in AP mode, bridged to a WDS client on the 2GHz side. No DHCP service on the layer 3 interface (DHCP client) that runs on the bridge.

My most recent incident was seeing WLAN associations behaving "normally" (as far as it's possible to tell), but IP traffic, including DHCP, both for the AP itself and 5GHz clients, not being relayed through the bridge to the upstream-facing client interface. That same traffic was appearing on the 5GHz AP interface though.

I'm leaning towards something switch/DSA related but I just don't have enough data to tell for sure.
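
One cheap way to gather more data during an incident would be to compare kernel packet counters on both sides of the bridge over the same interval. A rough sketch, reading the standard per-interface counters under /sys/class/net; the function name and the SYSFS_NET override are made up for illustration.

```shell
#!/bin/sh
# Sketch: print how many packets an interface received and sent during an
# interval, using the counters in /sys/class/net/<if>/statistics.
# $1 = interface name, $2 = interval in seconds (default 5).
# SYSFS_NET can point at another directory, which is handy for dry runs.
pkt_delta() {
    ifname="$1"
    secs="${2:-5}"
    base="${SYSFS_NET:-/sys/class/net}/${ifname}/statistics"
    rx0=$(cat "$base/rx_packets") || return 1
    tx0=$(cat "$base/tx_packets") || return 1
    sleep "$secs"
    rx1=$(cat "$base/rx_packets") || return 1
    tx1=$(cat "$base/tx_packets") || return 1
    echo "$ifname rx=$((rx1 - rx0)) tx=$((tx1 - tx0))"
}
```

Running `pkt_delta wl1-ap0 10` and then the same for the upstream WDS client interface (its name depends on the setup) while the problem is occurring should show whether rx keeps climbing on the AP side while tx stays flat on the upstream side. That would be consistent with frames arriving but not being forwarded, though it would not say why.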

Well isn't this interesting, just happened right as I was typing my previous message.

I was able to ssh in and also access the luci interface through a separate, fallback layer 3 interface I have configured on the router (it runs on its own VLAN on the wired ports and on the 5GHz AP).

The router was still showing my laptop as associated (and hostapd logs show there was never a disassociation anywhere close to the incident). Once I restarted the DHCP client interface from luci (not the fallback one - this is layer 3 on the same bridge VLAN used for "default" traffic bridged with upstream), bridged traffic from my laptop (from the WLAN) started flowing normally again.