Xiaomi Mi4A Giga (dumb AP) crash

I'm using this router configured as a dumb AP, connected to a switch. Stock OpenWrt install, updated regularly, currently on 22.03.5. A couple of 5GHz clients and some smart home stuff on 2.4GHz. It was rock stable for 7 months, and then it crashed/froze completely twice, one month apart. The router becomes unresponsive, drops all WLAN clients, and is not reachable over the LAN port either. It starts fine after a power cycle. After the first crash I installed collectd to monitor the router and also redirected the log to a syslog server on my home server.
The second time it crashed (a week ago), it managed to send this to the syslog server.
Any ideas about that?
I tried hammering the AP with iperf3 from a LAN computer, plus WLAN speed tests from some clients to a local server, and had no problems. The AP was rock stable :)
This looks suspicious: BUG: Bad page state in process napi/phy0-7 pfn:038d0
The MAC of the WLAN client is redacted; it was a TP-Link USB stick in a computer two rooms away.

May  9 19:01:46 192.168.88.3 hostapd: wlan1: AP-STA-DISCONNECTED AA:BB:CC:DD:EE:FF
May  9 19:01:46 192.168.88.3 hostapd: wlan1: STA AA:BB:CC:DD:EE:FF IEEE 802.11: disassociated
May  9 19:01:46 192.168.88.3 hostapd: wlan1: STA-OPMODE-N_SS-CHANGED AA:BB:CC:DD:EE:FF 1
May  9 19:01:46 192.168.88.3 hostapd: wlan1: STA AA:BB:CC:DD:EE:FF IEEE 802.11: authenticated
May  9 19:01:46 192.168.88.3 hostapd: wlan1: STA-OPMODE-N_SS-CHANGED AA:BB:CC:DD:EE:FF 2
May  9 19:01:46 192.168.88.3 hostapd: wlan1: STA AA:BB:CC:DD:EE:FF IEEE 802.11: associated (aid 4)
May  9 19:01:47 192.168.88.3 hostapd: wlan1: AP-STA-CONNECTED AA:BB:CC:DD:EE:FF
May  9 19:01:47 192.168.88.3 hostapd: wlan1: STA AA:BB:CC:DD:EE:FF WPA: pairwise key handshake completed (RSN)
May  9 19:01:47 192.168.88.3 hostapd: wlan1: EAPOL-4WAY-HS-COMPLETED AA:BB:CC:DD:EE:FF
May  9 19:02:02 192.168.88.3 hostapd: wlan1: AP-STA-DISCONNECTED AA:BB:CC:DD:EE:FF
May  9 19:02:03 192.168.88.3 hostapd: wlan1: STA AA:BB:CC:DD:EE:FF IEEE 802.11: disassociated
May  9 19:02:03 192.168.88.3 kernel: [438430.934595] BUG: Bad page state in process napi/phy0-7  pfn:038d0
May  9 19:02:03 192.168.88.3 kernel: [438430.940786] page:b6af38ab refcount:-1 mapcount:0 mapping:00000000 index:0x0 pfn:0x38d0
May  9 19:02:03 192.168.88.3 kernel: [438430.948754] flags: 0x0()
May  9 19:02:03 192.168.88.3 kernel: [438430.951373] raw: 00000000 00000100 00000122 00000000 00000000 00000000 ffffffff ffffffff
May  9 19:02:03 192.168.88.3 kernel: [438430.959512] raw: 00000000
May  9 19:02:03 192.168.88.3 kernel: [438430.962203] page dumped because: nonzero _refcount
May  9 19:02:03 192.168.88.3 kernel: [438430.967053] Modules linked in: pppoe ppp_async nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_counter nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt76x2e mt76x2_common mt76x02_lib mt7603e mt76 mac80211 cfg80211 slhc nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_ipv6 nf_log_ipv4 nf_log_common nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c crc_ccitt compat sha256_generic libsha256 seqiv jitterentropy_rng drbg hmac cmac leds_gpio gpio_button_hotplug crc32c_generic
May  9 19:02:03 192.168.88.3 kernel: [438431.030177] CPU: 2 PID: 809 Comm: napi/phy0-7 Not tainted 5.10.176 #0
May  9 19:02:03 192.168.88.3 kernel: [438431.036677] Stack : 00000000 8072f628 809b0000 00000003 00000000 00000000 00000000 00000000
May  9 19:02:03 192.168.88.3 kernel: [438431.045112]         00000000 00000000 00000000 00000000 00000000 00000001 8252bb48 6a6bb3df
May  9 19:02:03 192.168.88.3 kernel: [438431.053544]         8252bbe0 00000000 00000000 8252b9f0 00000038 8039e8a4 ffffffea 00000000
May  9 19:02:03 192.168.88.3 kernel: [438431.061979]         8252b9fc 00000140 807d4940 ffffffff 8252bb28 806fe1e4 00000000 80700000
May  9 19:02:03 192.168.88.3 kernel: [438431.070413]         807d0000 809c0000 80855ce0 00000003 00000000 8040948c 00000008 80990008
May  9 19:02:03 192.168.88.3 kernel: [438431.078846]         ...
May  9 19:02:03 192.168.88.3 kernel: [438431.081371] Call Trace:
May  9 19:02:03 192.168.88.3 kernel: [438431.081413] [<8039e8a4>] 0x8039e8a4
May  9 19:02:03 192.168.88.3 kernel: [438431.087490] [<8040948c>] 0x8040948c
May  9 19:02:03 192.168.88.3 kernel: [438431.091050] [<80007b08>] 0x80007b08
May  9 19:02:03 192.168.88.3 kernel: [438431.094610] [<80007b10>] 0x80007b10
May  9 19:02:03 192.168.88.3 kernel: [438431.098168] [<803839ac>] 0x803839ac
May  9 19:02:03 192.168.88.3 kernel: [438431.101727] [<8017c7c8>] 0x8017c7c8
May  9 19:02:03 192.168.88.3 kernel: [438431.105292] [<8018030c>] 0x8018030c
May  9 19:02:03 192.168.88.3 kernel: [438431.108862] [<80181108>] 0x80181108
May  9 19:02:03 192.168.88.3 kernel: [438431.112438] [<82731904>] 0x82731904 [mac80211@baa1f021+0x81460]
May  9 19:02:03 192.168.88.3 kernel: [438431.118429] [<82663160>] 0x82663160 [mt76@9b9d37f5+0xb040]
May  9 19:02:03 192.168.88.3 kernel: [438431.123986] [<804eddc0>] 0x804eddc0
May  9 19:02:03 192.168.88.3 kernel: [438431.127550] [<80181f3c>] 0x80181f3c
May  9 19:02:03 192.168.88.3 kernel: [438431.131124] [<8266147c>] 0x8266147c [mt76@9b9d37f5+0xb040]
May  9 19:02:03 192.168.88.3 kernel: [438431.136693] [<82653a2c>] 0x82653a2c [mt7603e@9c8f0fe6+0x96a0]
May  9 19:02:03 192.168.88.3 kernel: [438431.142517] [<82661af4>] 0x82661af4 [mt76@9b9d37f5+0xb040]
May  9 19:02:03 192.168.88.3 kernel: [438431.148067] [<80060c34>] 0x80060c34
May  9 19:02:03 192.168.88.3 kernel: [438431.151636] [<80510b80>] 0x80510b80
May  9 19:02:03 192.168.88.3 kernel: [438431.155198] [<806e1ac0>] 0x806e1ac0
May  9 19:02:03 192.168.88.3 kernel: [438431.158757] [<80510db0>] 0x80510db0
May  9 19:02:03 192.168.88.3 kernel: [438431.162317] [<80510e00>] 0x80510e00
May  9 19:02:03 192.168.88.3 kernel: [438431.165878] [<80050ed4>] 0x80050ed4
May  9 19:02:03 192.168.88.3 kernel: [438431.169433] [<806e1f4c>] 0x806e1f4c
May  9 19:02:03 192.168.88.3 kernel: [438431.172999] [<80510ca0>] 0x80510ca0
May  9 19:02:03 192.168.88.3 kernel: [438431.176556] [<8005113c>] 0x8005113c
May  9 19:02:03 192.168.88.3 kernel: [438431.180111] [<80051000>] 0x80051000
May  9 19:02:03 192.168.88.3 kernel: [438431.183677] [<80051000>] 0x80051000
May  9 19:02:03 192.168.88.3 kernel: [438431.187238] [<80051000>] 0x80051000
May  9 19:02:03 192.168.88.3 kernel: [438431.190793] [<80002f38>] 0x80002f38
May  9 19:02:03 192.168.88.3 kernel: [438431.194358]
May  9 19:02:03 192.168.88.3 kernel: [438431.195928] Disabling lock debugging due to kernel taint
May  9 19:02:04 192.168.88.3 hostapd: wlan1: STA AA:BB:CC:DD:EE:FF IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
May  9 19:02:06 192.168.88.3 hostapd: wlan1: STA AA:BB:CC:DD:EE:FF IEEE 802.11: authenticated
May  9 19:02:06 192.168.88.3 hostapd: wlan1: STA-OPMODE-N_SS-CHANGED AA:BB:CC:DD:EE:FF 2
May  9 19:02:06 192.168.88.3 hostapd: wlan1: STA AA:BB:CC:DD:EE:FF IEEE 802.11: associated (aid 4)
May  9 19:02:25 192.168.88.3 kernel: [438452.651590] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
May  9 19:02:25 192.168.88.3 kernel: [438452.657613] 	(detected by 1, t=2102 jiffies, g=12097061, q=34)
May  9 19:02:25 192.168.88.3 kernel: [438452.663515] rcu: All QSes seen, last rcu_sched kthread activity 2103 (43816591-43814488), jiffies_till_next_fqs=1, root ->qsmask 0x0
May  9 19:02:25 192.168.88.3 kernel: [438452.675470] rcu: rcu_sched kthread starved for 2104 jiffies! g12097061 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
May  9 19:02:25 192.168.88.3 kernel: [438452.685858] rcu: 	Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
May  9 19:02:25 192.168.88.3 kernel: [438452.694863] rcu: RCU grace-period kthread stack dump:
May  9 19:02:25 192.168.88.3 kernel: [438452.699982] task:rcu_sched       state:R  running task     stack:    0 pid:   11 ppid:     2 flags:0x00100000
May  9 19:02:25 192.168.88.3 kernel: [438452.709961] Stack : 807d0000 807d0000 807cc414 800a2810 80c3bf68 807d0000 807d0000 80c3bc00
May  9 19:02:25 192.168.88.3 kernel: [438452.718422]         807d0000 00000001 00000001 80838ed0 807d0000 807d0000 807cc414 806e1ac0
May  9 19:02:25 192.168.88.3 kernel: [438452.726881]         80838ed0 807d0000 807d0000 807cc414 029c8e59 806e4ae4 81007420 00000001
May  9 19:02:25 192.168.88.3 kernel: [438452.735339]         807d0000 00000000 00000000 81004540 029c8e59 800a1a44 06800000 80c3bc00
May  9 19:02:25 192.168.88.3 kernel: [438452.743798]         80838da0 80838da0 00000000 8009baf0 80c61e94 8006b428 8088bcc0 81006cc0
May  9 19:02:25 192.168.88.3 kernel: [438452.752260]         ...
May  9 19:02:25 192.168.88.3 kernel: [438452.754792] Call Trace:
May  9 19:02:25 192.168.88.3 kernel: [438452.754812] [<800a2810>] 0x800a2810
May  9 19:02:25 192.168.88.3 kernel: [438452.760893] [<806e1ac0>] 0x806e1ac0
May  9 19:02:25 192.168.88.3 kernel: [438452.764458] [<806e4ae4>] 0x806e4ae4
May  9 19:02:25 192.168.88.3 kernel: [438452.768056] [<800a1a44>] 0x800a1a44
May  9 19:02:25 192.168.88.3 kernel: [438452.771621] [<8009baf0>] 0x8009baf0
May  9 19:02:25 192.168.88.3 kernel: [438452.775185] [<8006b428>] 0x8006b428
May  9 19:02:25 192.168.88.3 kernel: [438452.778754] [<80090000>] 0x80090000
May  9 19:02:25 192.168.88.3 kernel: [438452.782316] [<800a0000>] 0x800a0000
May  9 19:02:25 192.168.88.3 kernel: [438452.785886] [<8009b0a0>] 0x8009b0a0
May  9 19:02:25 192.168.88.3 kernel: [438452.789450] [<8005113c>] 0x8005113c
May  9 19:02:25 192.168.88.3 kernel: [438452.793015] [<80051000>] 0x80051000
May  9 19:02:25 192.168.88.3 kernel: [438452.796583] [<80051000>] 0x80051000
May  9 19:02:25 192.168.88.3 kernel: [438452.800149] [<80051000>] 0x80051000
May  9 19:02:25 192.168.88.3 kernel: [438452.803712] [<80002f38>] 0x80002f38
May  9 19:02:25 192.168.88.3 kernel: [438452.807291]
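
For reading the timing in the log above: the number in square brackets on each kernel line is seconds since boot. The snippet below is only a rough sketch (the helper names `kernel_seconds` and `format_uptime` are made up for illustration). It converts those stamps into an uptime and measures the gap between the "Bad page state" BUG and the first RCU stall report.

```python
import re

# Matches the kernel uptime stamp in a syslog line, e.g. "kernel: [438430.934595]"
STAMP = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")


def kernel_seconds(line):
    """Return the seconds-since-boot stamp of a kernel syslog line, or None."""
    match = STAMP.search(line)
    return float(match.group(1)) if match else None


def format_uptime(seconds):
    """Render a number of seconds as 'Nd HH:MM:SS'."""
    days, rest = divmod(int(seconds), 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours:02d}:{minutes:02d}:{secs:02d}"


# Two lines copied from the capture above
bug_line = ("May  9 19:02:03 192.168.88.3 kernel: [438430.934595] "
            "BUG: Bad page state in process napi/phy0-7  pfn:038d0")
stall_line = ("May  9 19:02:25 192.168.88.3 kernel: [438452.651590] "
              "rcu: INFO: rcu_sched detected stalls on CPUs/tasks:")

bug_at = kernel_seconds(bug_line)
stall_at = kernel_seconds(stall_line)

print("uptime at BUG:", format_uptime(bug_at))                 # 5d 01:47:10
print("BUG to RCU stall: %.1f s" % (stall_at - bug_at))        # 21.7 s
```

So the AP had been up for about five days when the BUG was logged, and the stall report followed roughly 21.7 seconds later. That would be consistent with the kernel's default RCU stall timeout of 21 seconds, assuming this build uses the default.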


No guarantees, but from past experience my guess is that you have a hardware problem.

It could be:

  1. Overheating
  2. Internal component degradation - usually electrolytic capacitors failing.
  3. Power supply slowly failing and unable to maintain correct voltage, particularly when wireless is busy.

If it is not getting hot, try swapping out the power supply.

If you can eliminate 1 and 3, then consider buying a new AP, unless you are confident with a soldering iron.


Thank you.
For a start, I'll try another power supply.
The caps are the first thing I checked and they look good, not bloated or leaking. The router is not that old and it doesn't even get very warm. Although they are probably some low-end Chinese brand, the electrolytic caps are rated for 105°C. Maybe I'll try recapping.
The AP survived one afternoon at 100% CPU load with as much wireless traffic as I could throw at it.
Also, it is connected to a UPS, and none of the other equipment (router, ONT, switch, ...) shows any problems.
The power brick outputs the correct voltage; I don't have the equipment to test its other parameters.

Since you have a call trace, I would post it on GitHub. Who knows, maybe somebody with the right knowledge will have a look at it.
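
If you do post it, it may help to paste only the kernel messages without the syslog prefix and the hostapd noise. A minimal sketch of how that could be done is below. It assumes the log lines look exactly like the ones in the first post (`Mon D HH:MM:SS host kernel: ...`), and `kernel_lines` is just an illustrative helper name.

```python
import re

# Syslog prefix as it appears in the capture:
# "May  9 19:02:03 192.168.88.3 kernel: "
PREFIX = re.compile(r"^\w{3}\s+\d+ \d\d:\d\d:\d\d \S+ kernel: ")


def kernel_lines(lines):
    """Keep only kernel messages and drop the syslog date/host prefix."""
    kept = []
    for line in lines:
        stripped, count = PREFIX.subn("", line.rstrip("\n"))
        if count:  # the prefix matched, so this is a kernel line
            kept.append(stripped)
    return kept


# A few lines copied from the capture in the first post
sample = [
    "May  9 19:02:03 192.168.88.3 hostapd: wlan1: STA AA:BB:CC:DD:EE:FF IEEE 802.11: disassociated",
    "May  9 19:02:03 192.168.88.3 kernel: [438430.934595] BUG: Bad page state in process napi/phy0-7  pfn:038d0",
    "May  9 19:02:03 192.168.88.3 kernel: [438430.962203] page dumped because: nonzero _refcount",
]

for message in kernel_lines(sample):
    print(message)
```

To run it over a whole file, pass an open file object instead of `sample`, for example `kernel_lines(open("router.log"))`, where `router.log` is a placeholder for wherever your syslog server stores the messages.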