Linksys EA3500/EA4500 suffers endless reboots: Radio1 (5GHz) crashes in mac80211

One of five EA3500s, each at a different site, keeps crashing and restarting. Both v18.06.2 and snapshot r9614-b614954 crash at this particular site. I could only keep it running after disabling radio1 (5 GHz). The other four sites have no trouble with the same system and wireless configuration.

mwlwifi version for OpenWrt v18.06.2: 2018-11-14, 81413aa9855825541a0034b3ff497c2b7d59be5b
Snapshot r9614-b614954: 2018-12-18, c2c8244d8fea5d59762cb14438ded00bf6d5965c

As soon as I run "wifi up" after setting option disabled 0 for radio1 in /etc/config/wireless, the system crashes and the SSH session is gone. How can I get a copy of the system/kernel logs? A kernel dump would be even better.

By the way, I can boot all my EA3500s natively or from a USB drive. I had hoped the latter would save something in the /sys directory, but when I mounted the drive on Ubuntu after a crash, /sys was empty.

Thanks.

The "easy" approach is to open an SSH session and run logread -f, preserving it in the terminal application's scroll-back buffer.

You can also have the in-built logger write to a file (https://openwrt.org/docs/guide-user/base-system/system_configuration) by adding

	option log_file '/var/log/syslog'

or the like to the system section.

It doesn't capture what's already in the log buffer, so my /etc/rc.local begins with

#!/bin/sh

dmesg > /var/log/dmesg.boot

(You'll want something on your persistent drive, as /var/ is typically memory-backed.)
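A router stuck in a reboot loop would overwrite a single dmesg.boot on every boot, so one option is to number the snapshots. The following is a minimal sketch rather than a tested OpenWrt recipe: save_boot_log is a hypothetical helper name, and /home/log is an assumed directory standing in for wherever your persistent drive is mounted.

```shell
#!/bin/sh
# Sketch: keep one kernel-log snapshot per boot instead of overwriting a single file.
# save_boot_log is a hypothetical helper; /home/log is an assumed persistent mount.

save_boot_log() {
    dir="${1:-/home/log}"
    mkdir -p "$dir" || return 1

    # Find the first unused index so logs from earlier boots are preserved.
    n=0
    while [ -e "$dir/dmesg.boot.$n" ]; do
        n=$((n + 1))
    done

    # dmesg may be restricted on some systems; the file is still created.
    dmesg > "$dir/dmesg.boot.$n" 2>/dev/null || true
    sync    # push the data out to the drive before anything else can go wrong
    echo "$dir/dmesg.boot.$n"
}
```

Calling save_boot_log /home/log from /etc/rc.local would then leave dmesg.boot.0, dmesg.boot.1, and so on across reboots.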

Thank you, Jeff, for multiple ideas.

I first tried running logread -f in a second SSH window. It showed nothing more after I ran wifi up in the primary SSH window.

Both option log_file '/home/log/syslog' and dmesg > /home/log/dmesg.boot, with /home being mounted on the USB drive, showed a bit more:

Thu Mar 21 17:50:17 2019 kern.info kernel: [   36.775036] device wlan0 entered promiscuous mode
Thu Mar 21 17:50:17 2019 kern.info kernel: [   36.785424] br-lan: port 3(wlan1) entered blocking state
Thu Mar 21 17:50:17 2019 kern.info kernel: [   36.790818] br-lan: port 3(wlan1) entered disabled state
Thu Mar 21 17:50:17 2019 kern.info kernel: [   36.796500] device wlan1 entered promiscuous mode
Thu Mar 21 17:50:17 2019 daemon.notice hostapd: wlan0: interface state UNINITIALIZED->COUNTRY_UPDATE
Thu Mar 21 17:50:17 2019 daemon.notice hostapd: ACS: Automatic channel selection started, this may take a bit
Thu Mar 21 17:50:17 2019 daemon.notice hostapd: wlan0: interface state COUNTRY_UPDATE->ACS

That is it. Still very stingy about the subsequent crash. Any other ideas to capture the kernel crash?

By the way, the ext2 filesystem on the USB drive was slightly corrupted and I had to run fsck:

root@ubuntu:~# fsck /dev/sdc1
fsck from util-linux 2.31.1
e2fsck 1.44.1 (24-Mar-2018)
ea3500-yellow was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  +3409933
Fix<y>? yes
Free blocks count wrong for group #104 (32230, counted=32229).
Fix<y>? yes
Free blocks count wrong (3115131, counted=3115130).
Fix<y>? yes

ea3500: ***** FILE SYSTEM WAS MODIFIED *****
ea3500: 49937/983040 files (10.0% non-contiguous), 816774/3931904 blocks

It probably tried, but couldn't write out a crash log.

The Linksys stock firmware (v1.1.40.162464) doesn't crash at this particular site or at the others. Without soldering, which is not my forte, how can I get a crash log from OpenWrt 18.06.2 and snapshot r9614-b614954 on an EA3500?

Appreciate any hints :hammer_and_wrench:

I can pretty much rule out hardware problems. I just booted snapshot r9614-b614954 on an EA4500 whose radio1 (5 GHz) worked at another site. It suffers the same endless crash/restart loop.

At least three different devices work at other locations but crash at this one. What is so magical about this place that it causes crashes? :woozy_face:

To submit a bug, I'd like to get at least a crash log... How?

/sys/kernel/debug/crashlog

Appreciate the pointer. That was the first place I looked. But the entire directory /sys is empty when examining the USB drive on a desktop. I guess /sys is only available during runtime.
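/sys is a virtual filesystem populated by the running kernel, so nothing under it ever lands on the drive itself. Whatever shows up there has to be copied out while the router is up. Below is a minimal sketch of doing that, assuming the kernel exposes the file at all after a crash. save_crashlog and the /home/log default are hypothetical names, not part of OpenWrt.

```shell
#!/bin/sh
# Sketch: copy a crash log off debugfs while the router is still running.
# save_crashlog and the /home/log default are hypothetical; the source path
# is the one named in the post above.

save_crashlog() {
    src="${1:-/sys/kernel/debug/crashlog}"
    dir="${2:-/home/log}"

    # The file only exists at runtime, so absence just means "nothing to save".
    [ -r "$src" ] || return 1

    mkdir -p "$dir" || return 1
    dest="$dir/crashlog.$(date +%Y%m%d-%H%M%S)"
    cat "$src" > "$dest" || return 1
    sync
    echo "$dest"
}
```

Run early in /etc/rc.local, it prints the path of the saved copy, or returns non-zero when no crashlog is present.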

How to enable /sys/kernel/debug/crashlog?

LGA1150's one-liner led me to search for "crashlog" and found Crashlog retrieval (MIPS) - #2 by jow. On an EA3500/EA4500, booting up natively and off a USB drive, I was lucky enough to be able

But I was not so lucky when it came to the radio1 (5 GHz) crash. I hacked the /sbin/wifi script so that radio1 was disabled after one invocation of wifi up, to avoid endless restarts. After one crash and one restart, /sys/kernel/debug/crashlog was not there! I tried both native and USB boots several times :frowning_face:
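For anyone wanting to reproduce the one-shot guard, the idea can be sketched as follows. This is not the actual change made to /sbin/wifi. The function name radio1_allowed_once and the FLAG marker file are assumptions made for illustration, and the marker must sit on storage that survives a reboot.

```shell
#!/bin/sh
# Sketch of a one-shot guard (not the poster's actual /sbin/wifi change).
# FLAG is a hypothetical marker file; it must live on storage that survives a reboot.
FLAG="${FLAG:-/etc/radio1-tried}"

# Returns 0 the first time it is called (radio1 may be brought up once),
# and 1 on every later call, including after a crash and reboot.
radio1_allowed_once() {
    if [ -e "$FLAG" ]; then
        return 1
    fi
    : > "$FLAG"   # record the attempt before the risky step
    sync          # flush the marker in case the kernel dies right afterwards
    return 0
}

# Intended use on the router (commented out so the sketch has no side effects):
# if ! radio1_allowed_once; then
#     uci set wireless.radio1.disabled='1'
#     uci commit wireless
# fi
```

Removing the marker file re-arms the guard for another single attempt.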

Forte or not, I just did it. I pried open an EA4500 and soldered on a couple of alloy peaks, aka terrible-looking TX and RX pins. From the serial console:

[   22.960919] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   23.010664] br-lan: port 1(eth0.1) entered blocking state
[   23.016172] br-lan: port 1(eth0.1) entered disabled state
[   23.021848] device eth0.1 entered promiscuous mode
[   23.026687] device eth0 entered promiscuous mode
[   23.110502] br-lan: port 1(eth0.1) entered blocking state
[   23.115976] br-lan: port 1(eth0.1) entered forwarding state
[   23.121730] IPv6: ADDRCONF(NETDEV_UP): br-lan: link is not ready
[   23.191580] mv643xx_eth_port mv643xx_eth_port.1 eth1: link up, 1000 Mb/s, full duplex, flow control disabled
[   23.202650] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[   24.009012] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   24.015903] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[   24.022546] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[   25.052979] 0000:01:00.0: unable to load firmware helper image
[   25.058872] ieee80211 phy0: Cannot start firmware
[   25.063675] ieee80211 phy0: Trying to reload the firmware again
[   25.987785] ieee80211 phy0: 88w8366 v7, 20aa4b891bf2, AP firmware 5.2.8.17
[   26.047574] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[   26.113947] 0000:02:00.0: unable to load firmware helper image
[   26.119821] ieee80211 phy1: Cannot start firmware
[   26.124636] ieee80211 phy1: Trying to reload the firmware again
[   26.511780] ieee80211 phy1: 88w8366 v7, 20aa4b891bf4, AP firmware 5.2.8.17
[   26.565994] IPv6: ADDRCONF(NETDEV_UP): wlan1: link is not ready
[   26.577344] br-lan: port 2(wlan0) entered blocking state
[   26.582745] br-lan: port 2(wlan0) entered disabled state
[   26.588309] device wlan0 entered promiscuous mode
[   26.614795] br-lan: port 3(wlan1) entered blocking state
[   26.620186] br-lan: port 3(wlan1) entered disabled state
[   26.625800] device wlan1 entered promiscuous mode
[   27.137261] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[   27.143826] br-lan: port 2(wlan0) entered blocking state
[   27.149170] br-lan: port 2(wlan0) entered forwarding state
[   27.179154] ------------[ cut here ]------------
[   27.183961] WARNING: CPU: 0 PID: 0 at backports-4.19.23-1/net/mac80211/rx.c:4516 ieee80211_rx_napi+0x1fc/0xa54 [mac80211]
[   27.194991] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 mwl8k mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 tun gpio_button_hotplug
[   27.254721] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.105 #0
[   27.260669] Hardware name: Marvell Kirkwood (Flattened Device Tree)
[   27.266969] Backtrace: 
[   27.269438] [<c010530c>] (dump_backtrace) from [<c01055f4>] (show_stack+0x18/0x1c)
[   27.277057]  r7:bf1f012c r6:00000000 r5:bf227600 r4:00000000
[   27.282772] [<c01055dc>] (show_stack) from [<c058b880>] (dump_stack+0x20/0x28)
[   27.290036] [<c058b860>] (dump_stack) from [<c01196b0>] (__warn+0xdc/0x108)
[   27.297048] [<c01195d4>] (__warn) from [<c0119794>] (warn_slowpath_null+0x28/0x30)
[   27.304665]  r9:c7104ff0 r8:bf226390 r7:bf226318 r6:c71058a0 r5:c7104bc0 r4:c72be9c0
[   27.312535] [<c011976c>] (warn_slowpath_null) from [<bf1f012c>] (ieee80211_rx_napi+0x1fc/0xa54 [mac80211])
[   27.322376] [<bf1eff30>] (ieee80211_rx_napi [mac80211]) from [<bf1cb68c>] (ieee80211_tasklet_handler+0x64/0xc0 [mac80211])
[   27.333486]  r10:00000000 r9:c7104ff0 r8:bf226390 r7:bf226318 r6:c7104bc0 r5:c7104fe4
[   27.341359]  r4:c72be9c0
[   27.343976] [<bf1cb628>] (ieee80211_tasklet_handler [mac80211]) from [<c011c5dc>] (tasklet_action+0x88/0xd8)
[   27.353865]  r9:00000100 r8:ffffe000 r7:00000000 r6:c082a000 r5:c08085fc r4:00000000
[   27.361664] [<c011c554>] (tasklet_action) from [<c0101464>] (__do_softirq+0xac/0x25c)
[   27.369534]  r7:c082a020 r6:40000006 r5:c082a038 r4:00000006
[   27.375236] [<c01013b8>] (__do_softirq) from [<c011c940>] (irq_exit+0xc8/0x110)
[   27.382594]  r10:00000000 r9:c0801f00 r8:c7805200 r7:00000001 r6:00000000 r5:c0829178
[   27.390458]  r4:00000000
[   27.393022] [<c011c878>] (irq_exit) from [<c01465e4>] (__handle_domain_irq+0x8c/0xa8)
[   27.400893] [<c0146558>] (__handle_domain_irq) from [<c010138c>] (orion_handle_irq+0x74/0xa0)
[   27.409469]  r9:c0801f00 r8:00000001 r7:c085564c r6:c780801c r5:00000400 r4:0000000a
[   27.417263] [<c0101318>] (orion_handle_irq) from [<c01060c8>] (__irq_svc+0x68/0x84)
[   27.424968] Exception stack(0xc0801f00 to 0xc0801f48)
[   27.430044] 1f00: 00000000 00000000 00000000 60000013 00000000 ffffe000 c0803094 c08206e0
[   27.438271] 1f20: c080a852 c065e3e0 00000000 c0801f5c c0801f50 c0801f50 c0102ec0 c05a6814
[   27.446494] 1f40: 60000013 ffffffff
[   27.450004]  r10:00000000 r9:c0800000 r8:c080a852 r7:c0801f34 r6:ffffffff r5:60000013
[   27.457875]  r4:c05a6814
[   27.460441] [<c05a67dc>] (default_idle_call) from [<c013fdb8>] (do_idle+0x84/0x144)
[   27.468154] [<c013fd34>] (do_idle) from [<c01400d8>] (cpu_startup_entry+0x14/0x18)
[   27.475774]  r10:00721c14 r9:c0723a20 r8:c7ffcc80 r7:00000000 r6:c0803020 r5:ffffffff
[   27.483654]  r4:c0809c4c r3:40000013
[   27.487258] [<c01400c4>] (cpu_startup_entry) from [<c05a1660>] (rest_init+0x74/0x94)
[   27.495063] [<c05a15ec>] (rest_init) from [<c0700d3c>] (start_kernel+0x3a0/0x424)
[   27.502595]  r5:ffffffff r4:c08299a0
[   27.506188] [<c070099c>] (start_kernel) from [<00008048>] (0x8048)
[   27.512404] ---[ end trace 351d5fe97981d9fa ]---

The kernel logs a nasty WARNING at backports-4.19.23-1/net/mac80211/rx.c:4516, in ieee80211_rx_napi+0x1fc/0xa54 [mac80211]. A bug in the driver?
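For a bug report, it helps to trim a long serial capture down to the trace itself. Here is a small sketch, assuming the console output was saved to a text file. extract_trace and trace_summary are hypothetical helper names, and the markers they look for are the "cut here" and "end trace" lines seen in the output above.

```shell
#!/bin/sh
# Sketch: pull kernel warning blocks out of a saved console capture.
# extract_trace and trace_summary are hypothetical helpers; the markers
# are the ones that appear in the trace above.

# Print every line from a "cut here" marker through the next "end trace" marker.
extract_trace() {
    sed -n '/\[ cut here \]/,/\[ end trace /p' "$1"
}

# Print just the "WARNING: ... at file:line function" summary lines,
# with the timestamp prefix stripped.
trace_summary() {
    sed -n 's/.*\(WARNING: .*\)/\1/p' "$1"
}
```

For example, extract_trace console.log > trace.txt gives a file that can be attached to the report as-is, where console.log is whatever name the capture was saved under.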
