22.03.0 is killing wifi chips?

I have 3 ZyXEL P-2812HNU-F1 boxes, which I'm using alternately. Each time a new OpenWrt version is released, I install it on a fresh box, install the settings from the previous one, and switch the boxes. This way I can easily revert if something nasty happens.

Some months ago I installed 22.03.0-rc1 on a box and couldn't get the wireless to work. As far as the box knew it was working fine, but my devices didn't see the AP, nor could the box see other APs. Then I installed 21.02 on it, and it showed the same problem. So I assumed some chip had died. That can happen; the box is not the youngest anymore.

Now I have rc6 installed on another instance, and after an uptime of 19 days my wifi disappeared. Sometimes my client sees the AP but can't connect; most of the time the AP is not visible at all. When I connect by cable I don't see any significant messages in logread. 'iw scan' doesn't show anything (while my client sees lots of other APs). I tried a power cycle of the device, but that didn't cure the problem.

It could be coincidence: this box is about the same age as the one that died, easily 10 years. But now I'm hesitant to install 22.03 on my last box.

Can software kill a chip this way? Has there been any other example in the past?

You'd get more pointers from looking at the logs; right now it very much feels like you're playing a guessing game.

Which logs?

Like dmesg?

Well, as said, as far as the box knows nothing is wrong.
dmesg:

[   36.909544] pci 0000:00:0e.0: [1814:3062] type 00 class 0x028000
[   36.914241] pci 0000:00:0e.0: reg 0x10: [mem 0xffff0000-0xffffffff]
[   36.924138] pci 0000:00:0e.0: BAR 0: assigned [mem 0x18000000-0x1800ffff]
[   36.930115] rt2800pci 0000:00:0e.0: enabling device (0000 -> 0002)
[   36.935882] ieee80211 phy0: rt2x00lib_request_eeprom_file: Info - Loading EEPROM data from 'RT3062.eeprom'.
[   37.221036] ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 3572, rev 0223 detected
[   37.227423] ieee80211 phy0: rt2x00_set_rf: Info - RF chipset 0008 detected
[   37.234498] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[   52.859243] ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2860.bin'
[   53.178717] ieee80211 phy0: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.40
[   53.809195] br-lan: port 4(wlan0) entered blocking state
[   53.813125] br-lan: port 4(wlan0) entered disabled state
[   53.819272] device wlan0 entered promiscuous mode
[   62.336147] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[   62.341739] br-lan: port 4(wlan0) entered blocking state
[   62.346456] br-lan: port 4(wlan0) entered forwarding state
[   63.039780] br-guest: port 4(wlan0-1) entered blocking state
[   63.044026] br-guest: port 4(wlan0-1) entered disabled state
[   63.050575] device wlan0-1 entered promiscuous mode
[   63.100527] br-guest: port 4(wlan0-1) entered blocking state
[   63.104771] br-guest: port 4(wlan0-1) entered forwarding state
[   63.409693] br-guest: port 4(wlan0-1) entered disabled state
[   63.770050] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0-1: link becomes ready
[   63.775629] br-guest: port 4(wlan0-1) entered blocking state
[   63.780956] br-guest: port 4(wlan0-1) entered forwarding state

logread | grep hostapd

Sun Aug 28 14:19:40 2022 daemon.err hostapd: rmdir[ctrl_interface=/var/run/hostapd]: Permission denied
Sun Aug 28 14:19:40 2022 daemon.notice hostapd: nl80211: deinit ifname=wlan0 disabled_11b_rates=0
Sun Aug 28 14:19:42 2022 daemon.notice hostapd: Configuration file: /var/run/hostapd-phy0.conf (phy wlan0) --> new PHY
Sun Aug 28 14:19:42 2022 daemon.notice hostapd: wlan0: interface state UNINITIALIZED->COUNTRY_UPDATE
Sun Aug 28 14:19:42 2022 daemon.notice hostapd: ACS: Automatic channel selection started, this may take a bit
Sun Aug 28 14:19:42 2022 daemon.notice hostapd: wlan0: interface state COUNTRY_UPDATE->ACS
Sun Aug 28 14:19:42 2022 daemon.notice hostapd: wlan0: ACS-STARTED
Sun Aug 28 14:19:50 2022 daemon.notice hostapd: ACS: Survey is missing noise floor
  *repeated ~50 times*
Sun Aug 28 14:19:50 2022 daemon.notice hostapd: wlan0: ACS-COMPLETED freq=2462 channel=11
Sun Aug 28 14:19:50 2022 daemon.notice hostapd: wlan0: interface state ACS->ENABLED
Sun Aug 28 14:19:50 2022 daemon.notice hostapd: wlan0: AP-ENABLED
Sun Aug 28 14:20:13 2022 daemon.info hostapd: wlan0: STA 80:19:34:ac:a1:cf IEEE 802.11: authenticated
Sun Aug 28 14:20:13 2022 daemon.info hostapd: wlan0: STA 80:19:34:ac:a1:cf IEEE 802.11: associated (aid 1)
Sun Aug 28 14:20:13 2022 daemon.notice hostapd: wlan0: AP-STA-CONNECTED 80:19:34:ac:a1:cf
Sun Aug 28 14:20:13 2022 daemon.info hostapd: wlan0: STA 80:19:34:ac:a1:cf WPA: pairwise key handshake completed (RSN)
Sun Aug 28 14:20:13 2022 daemon.notice hostapd: wlan0: EAPOL-4WAY-HS-COMPLETED 80:19:34:ac:a1:cf
Sun Aug 28 14:21:08 2022 daemon.notice hostapd: handle_beacon - too short payload (len=26)
    *repeated ~20 times until 14:22:43*
Sun Aug 28 14:27:56 2022 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED 80:19:34:ac:a1:cf
Sun Aug 28 14:27:56 2022 daemon.info hostapd: wlan0: STA 80:19:34:ac:a1:cf IEEE 802.11: disassociated due to inactivity
Sun Aug 28 14:27:57 2022 daemon.info hostapd: wlan0: STA 80:19:34:ac:a1:cf IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)

80:19:34:ac:a1:cf is my laptop, which managed to connect once, but couldn't ping the gateway.
'iw list' shows the wireless interface as it should, but 'iw dev wlan0 scan' doesn't show anything.
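
Since most of the hostapd output above consists of the same two messages repeated, a small filter can make the unusual lines easier to spot. This is a minimal sketch, assuming the logread line format shown above; the function name `summarize_hostapd` is made up for illustration and is not part of OpenWrt.

```shell
# summarize_hostapd: hypothetical helper, not part of OpenWrt.
# Reads syslog lines on stdin, keeps only hostapd entries, strips the
# timestamp/facility prefix, and prints each distinct message with its
# count, most frequent first.
summarize_hostapd() {
    grep 'hostapd: ' \
        | sed 's/^.*hostapd: //' \
        | sort \
        | uniq -c \
        | sort -rn
}
```

On the router this would be run as `logread | summarize_hostapd`.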

Did you set your country code on the radio(s)? The missing noise floor messages are typically related to that.

Yes I have set the country code. The box has been running for 19 days without problems and without any settings change. This morning wifi was gone, and after a reboot and a full power cycle it didn't come back. I don't think that is a setting issue.
BTW, that 'missing noise floor' is not new, I've seen it before in the logs. I'm not so sure about 'handle_beacon - too short payload (len=26)'.

That looks like a more verbose warning introduced by newer hostapd releases. What does wifi status say?

# wifi status
{
	"radio0": {
		"up": true,
		"pending": false,
		"autostart": true,
		"disabled": false,
		"retry_setup_failed": false,
		"config": {
			"hwmode": "11g",
			"path": "pci0000:00/0000:00:0e.0",
			"txpower": 20,
			"country": "NL",
			"cell_density": 0,
			"htmode": "HT20",
			"channel": "5"
		},
		"interfaces": [
			{
				"section": "wifinet0",
				"ifname": "wlan0",
				"config": {
					"mode": "ap",
					"ssid": "XXX1",
					"encryption": "psk2",
					"key": "XXX",
					"disassoc_low_ack": false,
					"wds": true,
					"mode": "ap",
					"network": [
						"lan"
					]
				},
				"vlans": [
					
				],
				"stations": [
					
				]
			},
			{
				"section": "wifinet2",
				"ifname": "wlan0-1",
				"config": {
					"mode": "ap",
					"ssid": "XXX2",
					"encryption": "psk2",
					"wds": true,
					"disassoc_low_ack": false,
					"key": "XXX",
					"mode": "ap",
					"network": [
						"Guest"
					]
				},
				"vlans": [
					
				],
				"stations": [
					
				]
			}
		]
	}
}

That looks okay to me... And you can confirm rolling back to 21.02 works? Cause that might mean there's something in the wireless stack playing up.

The convention for the hardware path to the wifi chip sometimes changes between releases, so settings carried over from a previous version may no longer match the chip.
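
One way to check this is to compare the `path` option from the wireless config with what sysfs reports for each phy. The following is a rough sketch only: the helper name `path_matches_phy` is hypothetical, and it assumes the configured path is matched as a suffix of the resolved sysfs device path, which may not be exactly how every OpenWrt release does it.

```shell
# path_matches_phy CONFIGURED_PATH [SYSFS_CLASS_DIR]
# Hypothetical helper: prints the name of the first phy whose resolved
# device path ends in CONFIGURED_PATH, or returns 1 if none matches.
# SYSFS_CLASS_DIR defaults to /sys/class/ieee80211 and is only a parameter
# so the function can be tried against a fake directory tree.
path_matches_phy() {
    want="$1"
    class_dir="${2:-/sys/class/ieee80211}"
    for phy in "$class_dir"/*; do
        [ -e "$phy/device" ] || continue
        real="$(readlink -f "$phy/device")"
        case "$real" in
            *"$want")
                echo "${phy##*/}"
                return 0
                ;;
        esac
    done
    return 1
}
```

With the config shown earlier in the thread, `path_matches_phy "$(uci get wireless.radio0.path)"` would be expected to print `phy0` if the stored path still matches the chip.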

Mijzelf,

I have even more F1s, and on some of them I've seen weird wifi issues.
I haven't tested 22.03.0-rc6 yet, but I'm going to this week (own builds).

Could this also be due to changing to DSA (Distributed Switch Architecture) on the network settings within OpenWrt? Wifi should be bridged to LAN some other way than normal:

As far as I know the Lantiq-family is not yet available with DSA, but maybe some settings -under the hood- are already adjusted and can cause weird Wifi issues?

DG.

@Borromini I haven't tested it yet for this instance, but for the previous instance where I installed rc1, rolling back did not solve the problem.
@mk24 It worked for 19 days. And now I have put back my last device running rc5, which has the same settings, and which works.
@DGdodo DSA is implemented on 22.03 for Lantiq. I had to do some adjustments to get it running. Basically I deleted config/network and re-created it.
I haven't seen weird issues on any of them, except that the wifi chip is no longer detected on boot (since 19.xx?). A rescan of the PCI bus is needed, started in /etc/rc.local
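
For reference, the rescan mentioned here is the sysfs write quoted later in this thread. A minimal sketch of how it could be wrapped for /etc/rc.local; the function name `rescan_pci` and its optional argument are made up for illustration.

```shell
# rescan_pci [TRIGGER_FILE]
# Hypothetical wrapper around the PCI rescan. TRIGGER_FILE defaults to the
# kernel's sysfs rescan trigger; the argument exists only so the function
# can be dry-run against an ordinary file.
rescan_pci() {
    trigger="${1:-/sys/bus/pci/rescan}"
    # Bail out quietly if the trigger is missing or not writable.
    [ -w "$trigger" ] || return 1
    echo 1 > "$trigger"
}
```

In /etc/rc.local the function would be defined and then called as plain `rescan_pci` before the final `exit 0`.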

So far I have
box1: Installed rc1, wifi dead, rolling back didn't revive wifi.
box2: Installed rc5, new settings for network, worked for some time (how long between rc5 and rc6?), and is now back in business.
box3: Installed rc6, settings copied from box2, after 19 days wifi died. Did not yet try to revert.

In general, it is possible but unlikely that the firmware is actually killing the wifi at a hardware level. The only ways that this would likely happen are:

  • loss of the factory wifi data (often the ART partition) as a result of being overwritten/erased or corrupted
  • operation out of spec (could be power/thermal, or possibly an invalid set of register values)

Most of the time, it is nearly impossible for firmware/software to damage the hardware in any other way. Usually, hardware is built such that firmware/software cannot drive it much outside of the design margins. But it is not out of the realm of possibility, of course.

I'd be curious what happens with your wifi if you roll back to 21.02 (or even 19.07 if supported by your device) -- do not preserve settings.

If that doesn't work, what about returning to the stock firmware? Does that bring back wifi?

Mijzelf,

Maybe your Wifi issues have to do with your RT3062.eeprom file? If you use 3 x P2812HNU-F1's (in one network) you also need 3 different RT3062.eeprom files, as the Wifi mac address is coded in this file.
And hopefully you get all devices back up again. Otherwise PM me to exchange one? I've still about 7 of those F1's :wink:

DG.

@psherman I tried to flash 19.07 (had to use -F, as the board's name is different now) and it bricked. So I'll have to open it to connect a serial cable, which will take some more time. I'll come back when I've succeeded.
Reverting to stock is not really an option. The stock 2-stage signed bootloaders were replaced by another one, and I didn't bother to back them up, years ago. The stock firmware was useless (to me) anyway.

@DGdodo They all have the same .eeprom file, but I'm not using them simultaneously. Only one is active at a time, and it gets swapped for another one when a new OpenWrt version arrives.
And thanks for the offer, but meanwhile I found 2 (almost) new boxes, which are now on their way to me. But I'll keep it in mind, and recklessly flash 22.03 on them, as more replacements are available.

@psherman Got it running again. I saw that I had made a mistake: I installed LEDE 17.01 instead of 19.07. To my surprise the wifi worked again. I put back the stock 22.03.0-rc6, and wifi still worked (after an 'echo 1 >/sys/bus/pci/rescan'). So I put back the custom generated blob (mainly +asterisk and +custom DSL blob) and put back the config archive. It still worked, and I have put the box back on duty. Will see what happens next.
It now has the same starting point as 20 days ago.

Make sure you're setting the country code in advanced settings under the device configuration - this one got me good. Also, if you're using WiFi 6 (ax) - you may be experiencing DFS.

I'm glad this is the case, but I'm not surprised. As I said earlier, it is unlikely that the hardware would be damaged by OpenWrt. To me, this points to one of two possible issues: misconfiguration or bad drivers. And of course, @m0dul8r mentioned DFS, which is another thing that could cause wifi to go down (this is true for 802.11ac / wifi 5 as well).

If you think the drivers may be to blame, the best way to prove that out is to perform a fresh install of 22.03-RCx (do not keep settings) and make only the most minimal changes required (i.e. set the SSID and password, then enable wifi; also ensure that the 5G band is not using a DFS channel). Don't make any other changes. If the problem recurs, you may have a driver issue that can be further investigated.
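
The minimal setup described above could be applied from the shell roughly as follows. This is a sketch under assumptions: the section names `radio0` and `default_radio0` are what a fresh OpenWrt install normally generates, but check /etc/config/wireless first. The helper name `minimal_wifi_batch` is made up for illustration.

```shell
# minimal_wifi_batch SSID KEY COUNTRY
# Hypothetical helper: prints a script for 'uci batch' that sets only the
# country, SSID and WPA2 passphrase and enables the radio. Nothing is
# changed until the output is actually piped into 'uci batch'.
# Assumes the default section names radio0 / default_radio0.
minimal_wifi_batch() {
    cat <<EOF
set wireless.radio0.country='$3'
set wireless.radio0.disabled='0'
set wireless.default_radio0.ssid='$1'
set wireless.default_radio0.encryption='psk2'
set wireless.default_radio0.key='$2'
commit wireless
EOF
}
```

It would be used as `minimal_wifi_batch TestSSID 'some-long-passphrase' NL | uci batch`, followed by `wifi reload`. Values containing single quotes would break the quoting, so keep the test SSID and key simple.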

and the correct country setting for regulatory compliance, not just for legal adherence, but also to avoid (quite serious) issues with your clients (clients following IEEE 802.11d country IEs and getting messed up if your environment doesn't agree with itself and your clients' settings).
