Linksys WRT3200ACM CPU stalls on 19.07.4/.5

After around 3 hours of uptime, my Linksys WRT3200ACM seems to encounter CPU stalls. I've managed to capture this from the syslog:

Wed Dec  9 16:19:50 2020 kern.err kernel: [15372.663201] INFO: rcu_sched self-detected stall on CPU
Wed Dec  9 16:19:50 2020 kern.err kernel: [15372.668371] 	1-...: (1 GPs behind) idle=09e/2/0 softirq=2024800/2024801 fqs=3000
Wed Dec  9 16:19:50 2020 kern.err kernel: [15372.673203] INFO: rcu_sched detected stalls on CPUs/tasks:
Wed Dec  9 16:19:50 2020 kern.err kernel: [15372.675884]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.681395]  (t=6002 jiffies g=612372 c=612371 q=1985)
Wed Dec  9 16:19:50 2020 kern.err kernel: [15372.682978] 	1-...: (1 GPs behind) idle=09e/2/0 softirq=2024800/2024801 fqs=3000
Wed Dec  9 16:19:50 2020 kern.err kernel: [15372.695644]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.695645] NMI backtrace for cpu 1
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.695648] (detected by 0, t=6002 jiffies, g=612372, c=612371, q=1985)
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.697229] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.209 #0
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.713483] Hardware name: Marvell Armada 380/385 (Device Tree)
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.719431] Function entered at [<c010ebf8>] from [<c010a8b8>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.725288] Function entered at [<c010a8b8>] from [<c0640834>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.731145] Function entered at [<c0640834>] from [<c0645fa8>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.737002] Function entered at [<c0645fa8>] from [<c0646034>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.742859] Function entered at [<c0646034>] from [<c0176620>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.748716] Function entered at [<c0176620>] from [<c01759b4>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.754572] Function entered at [<c01759b4>] from [<c0178c44>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.760429] Function entered at [<c0178c44>] from [<c0187fac>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.766285] Function entered at [<c0187fac>] from [<c0179294>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.772142] Function entered at [<c0179294>] from [<c0179b08>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.777998] Function entered at [<c0179b08>] from [<c010e2e8>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.783855] Function entered at [<c010e2e8>] from [<c016ab08>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.789712] Function entered at [<c016ab08>] from [<c0165d38>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.795568] Function entered at [<c0165d38>] from [<c0166298>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.801425] Function entered at [<c0166298>] from [<c0101464>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.807282] Function entered at [<c0101464>] from [<c010b54c>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.813138] Exception stack(0xdf4657d8 to 0xdf465820)
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.818211] 57c0:                                                       decd8800 d0da7d80
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.826425] 57e0: 00000000 00000000 d0da7d80 00000000 d1d89140 00000000 00000022 decd8800
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.834639] 5800: 00000022 c0902d00 fffffff4 df465828 c053b54c c053b118 20000113 ffffffff
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.842851] Function entered at [<c010b54c>] from [<c053b118>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.848708] Function entered at [<c053b118>] from [<00000000>]
Wed Dec  9 16:19:50 2020 kern.info kernel: [15372.854568] Sending NMI from CPU 0 to CPUs 1:
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859295] NMI backtrace for cpu 1
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859296] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.209 #0
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859297] Hardware name: Marvell Armada 380/385 (Device Tree)
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859298] task: df43f480 task.stack: df464000
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859299] pc : [<c053b118>]    lr : [<c053b54c>]    psr: 20000113
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859300] sp : df465828  ip : fffffff4  fp : c0902d00
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859301] r10: 00000022  r9 : decd8800  r8 : 00000022
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859302] r7 : 00000000  r6 : d1d89140  r5 : 00000000  r4 : d0da7d80
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859303] r3 : 00000000  r2 : 00000000  r1 : d0da7d80  r0 : decd8800
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859305] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859306] Control: 10c5387d  Table: 1457804a  DAC: 00000051
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859307] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.209 #0
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859308] Hardware name: Marvell Armada 380/385 (Device Tree)
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859309] Function entered at [<c010ebf8>] from [<c010a8b8>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859310] Function entered at [<c010a8b8>] from [<c0640834>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859311] Function entered at [<c0640834>] from [<c0645f90>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859312] Function entered at [<c0645f90>] from [<c010dab4>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859313] Function entered at [<c010dab4>] from [<c0101494>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859314] Function entered at [<c0101494>] from [<c010b54c>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859315] Exception stack(0xdf4657d8 to 0xdf465820)
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859317] 57c0:                                                       decd8800 d0da7d80
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859318] 57e0: 00000000 00000000 d0da7d80 00000000 d1d89140 00000000 00000022 decd8800
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859319] 5800: 00000022 c0902d00 fffffff4 df465828 c053b54c c053b118 20000113 ffffffff
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859320] Function entered at [<c010b54c>] from [<c053b118>]
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15382.859321] Function entered at [<c053b118>] from [<00000000>]

The router locks up and doesn't respond to anything, then recovers and starts responding again. The cycle repeats for a while. I've also caught the system load going really high (12.0).

I don't seem to get these on earlier 19.07 builds, but I do on 19.07.4 and 19.07.5. How can I debug a CPU stall to determine what's causing this?

I can't see anything major related to the mvebu target between the .4 and .5 releases that stands out.
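
For comparing several captures like the one above, a small script can pull out the uptime at which each stall was reported, how long the CPU had been stuck, and which CPU the NMI backtrace was for. This is only a sketch: the function name and the regular expressions are my own, and the conversion from jiffies to seconds assumes a 100 Hz tick, which should be checked against CONFIG_HZ for the build in question.

```python
import re

# Assumption: 100 Hz kernel tick. Check CONFIG_HZ for your build.
ASSUMED_HZ = 100

STALL_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\] INFO: rcu_sched self-detected stall on CPU"
)
JIFFIES_RE = re.compile(r"\(t=(?P<t>\d+) jiffies")
NMI_RE = re.compile(r"NMI backtrace for cpu (?P<cpu>\d+)")

# Three lines copied from the capture in this thread.
SAMPLE = """\
Wed Dec  9 16:19:50 2020 kern.err kernel: [15372.663201] INFO: rcu_sched self-detected stall on CPU
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.681395]  (t=6002 jiffies g=612372 c=612371 q=1985)
Wed Dec  9 16:19:50 2020 kern.warn kernel: [15372.695645] NMI backtrace for cpu 1
"""


def summarize_stalls(lines, hz=ASSUMED_HZ):
    """Return one dict per self-detected RCU stall found in syslog lines."""
    stalls = []
    current = None
    for line in lines:
        m = STALL_RE.search(line)
        if m:
            # A new stall report starts here; later lines fill in details.
            current = {
                "uptime_s": float(m.group("uptime")),
                "stalled_s": None,
                "cpu": None,
            }
            stalls.append(current)
            continue
        if current is None:
            continue
        m = JIFFIES_RE.search(line)
        if m and current["stalled_s"] is None:
            current["stalled_s"] = int(m.group("t")) / hz
        m = NMI_RE.search(line)
        if m and current["cpu"] is None:
            current["cpu"] = int(m.group("cpu"))
    return stalls


for stall in summarize_stalls(SAMPLE.splitlines()):
    print(stall)
```

Running it over a saved copy of the full syslog (for example the output of logread redirected to a file) would show whether the stalls always hit the same CPU and whether they cluster at particular uptimes.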

Run 'top', wait for the issue to happen again, and see if there are processes eating all the CPU cycles.
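
Since the router stops responding during a stall, it may be easier to let something record the load for you than to watch 'top' by hand. Below is a rough sketch, assuming Python 3 is installed on the router (it is not part of a default image). The threshold, interval, log path and function names are all placeholders of mine, and it relies on the BusyBox 'top' accepting -b and -n for a one-shot batch snapshot.

```python
import subprocess
import time


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into (1 min, 5 min, 15 min)."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def is_spike(text, threshold=4.0):
    """True when the 1-minute load average reaches the threshold."""
    return parse_loadavg(text)[0] >= threshold


def watch(path="/proc/loadavg", interval=5, threshold=4.0,
          log_path="/tmp/load_spikes.log"):
    """Poll the load average forever and log a 'top' snapshot on spikes."""
    while True:
        with open(path) as f:
            text = f.read()
        if is_spike(text, threshold):
            # One batch-mode iteration of top, captured as text.
            snapshot = subprocess.run(
                ["top", "-b", "-n", "1"],
                capture_output=True, text=True,
            ).stdout
            with open(log_path, "a") as out:
                out.write(time.strftime("%Y-%m-%d %H:%M:%S") + " " + text)
                out.write(snapshot + "\n")
        time.sleep(interval)
```

Calling watch() from a small script started in the background would leave a record in /tmp/load_spikes.log that can be read once the router recovers. Note that /tmp is in RAM on OpenWrt, so the log does not survive a reboot.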

Thank you. Hasn't happened since, but I did catch another backtrace with further clues:

Sat Dec 12 07:51:26 2020 kern.warn kernel: [244155.844973] ------------[ cut here ]------------
Sat Dec 12 07:51:26 2020 kern.warn kernel: [244155.849753] WARNING: CPU: 0 PID: 0 at backports-4.19.137-1/net/mac80211/rx.c:4559 0xbf205778 [mac80211@bf1e2000+0x62000]
Sat Dec 12 07:51:26 2020 kern.warn kernel: [244155.860765] Rate marked as a VHT rate but data is invalid: MCS: 12, NSS: 2
Sat Dec 12 07:51:26 2020 kern.warn kernel: [244155.867766] Modules linked in: pppoe ppp_async l2tp_ppp pppox ppp_generic iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_quota xt_pkttype xt_owner xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_addrtype xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY wireguard slhc rfcomm nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack_netlink mwifiex_sdio mwifiex macvlan iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables hidp hci_uart crc_ccitt btusb btmrvl_sdio btmrvl btintel bnep bluetooth sch_cake
Sat Dec 12 07:51:26 2020 kern.warn kernel: [244155.939136]  sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred hid evdev input_core mwlwifi mac80211 cfg80211 compat xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6t_NPT ip6t_MASQUERADE nf_nat_masquerade_ipv6 nf_nat nf_conntrack nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ifb nat46 l2tp_ip6 l2tp_ip l2tp_eth sit l2tp_netlink l2tp_core udp_tunnel
Sat Dec 12 07:51:26 2020 kern.warn kernel: [244156.010761]  ip6_udp_tunnel tunnel4 ip_tunnel tun ecdh_generic kpp ecb cmac uhci_hcd ohci_platform ohci_hcd gpio_button_hotplug
Sat Dec 12 07:51:26 2020 kern.warn kernel: [244156.022397] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.209 #0
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.028602] Hardware name: Marvell Armada 380/385 (Device Tree)
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.034637] Function entered at [<c010ebf8>] from [<c010a8b8>]
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.040581] Function entered at [<c010a8b8>] from [<c0640834>]
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.046525] Function entered at [<c0640834>] from [<c01286d4>]
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.052470] Function entered at [<c01286d4>] from [<c0128728>]
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.058413] Function entered at [<c0128728>] from [<bf205778>]
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.064379] Function entered at [<bf205778>] from [<bf25df24>]
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.070329] Function entered at [<bf25df24>] from [<c012cd64>]
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.076272] Function entered at [<c012cd64>] from [<c0101628>]
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.082217] Function entered at [<c0101628>] from [<c012d36c>]
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.088160] Function entered at [<c012d36c>] from [<c016629c>]
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.094104] Function entered at [<c016629c>] from [<c0101464>]
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.100047] Function entered at [<c0101464>] from [<c010b54c>]
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.105991] Exception stack(0xc0901f38 to 0xc0901f80)
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.111150] 1f20:                                                       00000000 a92012b2
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.119451] 1f40: 1f38e000 00000000 c08452c0 60000013 00000000 00000000 00000000 00000000
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.127753] 1f60: 00000000 00000000 1f38e000 c0901f88 c0175428 c017543c 60000013 ffffffff
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.136053] Function entered at [<c010b54c>] from [<c017543c>]
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.141997] Function entered at [<c017543c>] from [<c015d054>]
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.147942] Function entered at [<c015d054>] from [<c015d2e0>]
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.153886] Function entered at [<c015d2e0>] from [<c0800cbc>]
Sat Dec 12 07:51:27 2020 kern.warn kernel: [244156.159847] ---[ end trace aa8fc1172243dab4 ]---

This could be a clue: "Rate marked as a VHT rate but data is invalid: MCS: 12, NSS: 2".
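
The mac80211 warning in the trace above ("Rate marked as a VHT rate but data is invalid") appears to come from a range check on the receive status that the wireless driver hands to mac80211. As far as I can tell from the mac80211 source, a VHT rate is accepted only when the MCS index is at most 9 and the number of spatial streams is between 1 and 8; the exact condition in backports-4.19.137-1 may differ, so treat the limits below as an assumption. The sketch restates that check and applies it to the values from the log line; the function names are mine.

```python
import re

# Assumed limits, based on my reading of the mac80211 VHT rate check.
MAX_VHT_MCS = 9
MAX_VHT_NSS = 8

WARN_RE = re.compile(
    r"Rate marked as a VHT rate but data is invalid: "
    r"MCS: (?P<mcs>\d+), NSS: (?P<nss>\d+)"
)


def vht_rate_is_valid(mcs, nss):
    """Return True when MCS and NSS fall inside the assumed VHT limits."""
    return 0 <= mcs <= MAX_VHT_MCS and 1 <= nss <= MAX_VHT_NSS


def explain_warning(line):
    """Extract MCS/NSS from a warning line and say which value is off."""
    m = WARN_RE.search(line)
    if m is None:
        return None
    mcs, nss = int(m.group("mcs")), int(m.group("nss"))
    problems = []
    if not 0 <= mcs <= MAX_VHT_MCS:
        problems.append("MCS %d is outside 0-%d" % (mcs, MAX_VHT_MCS))
    if not 1 <= nss <= MAX_VHT_NSS:
        problems.append("NSS %d is outside 1-%d" % (nss, MAX_VHT_NSS))
    return problems


LOG_LINE = (
    "Sat Dec 12 07:51:26 2020 kern.warn kernel: [244155.860765] "
    "Rate marked as a VHT rate but data is invalid: MCS: 12, NSS: 2"
)
print(explain_warning(LOG_LINE))
```

If that reading is right, the NSS value of 2 is fine and it is the MCS index of 12 that trips the check, which would suggest the driver reported a rate that mac80211 does not recognise as VHT. That does not by itself show the warning is related to the CPU stalls.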

What's in your "/etc/config/wireless"?

config wifi-device 'radio0'
        option type 'mac80211'
        option hwmode '11a'
        option path 'soc/soc:pcie/pci0000:00/0000:00:01.0/0000:01:00.0'
        option htmode 'VHT80'
        option country 'GB'
        option channel '36'

config wifi-iface 'default_radio0'
        option device 'radio0'
        option network 'lan'
        option mode 'ap'
        option encryption 'psk2+ccmp'
        option key '***REDACTED***'
        option ssid '***REDACTED***'

config wifi-device 'radio1'
        option type 'mac80211'
        option hwmode '11g'
        option path 'soc/soc:pcie/pci0000:00/0000:00:02.0/0000:02:00.0'
        option htmode 'HT20'
        option channel 'auto'
        option country 'GB'

config wifi-iface 'default_radio1'
        option device 'radio1'
        option network 'lan'
        option mode 'ap'
        option encryption 'psk2+ccmp'
        option key '***REDACTED***'
        option ssid '***REDACTED***'

config wifi-device 'radio2'
        option type 'mac80211'
        option channel '36'
        option hwmode '11a'
        option path 'platform/soc/soc:internal-regs/f10d8000.sdhci/mmc_host/mmc0/mmc0:0001/mmc0:0001:1'
        option htmode 'VHT80'
        option disabled '1'

config wifi-iface 'wifinet0'
        option device 'radio1'
        option mode 'ap'
        option network 'guest'
        option encryption 'psk2+ccmp'
        option ssid '***REDACTED***'
        option key '***REDACTED***'
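
To narrow down which radio could be receiving VHT frames, the config above can be read programmatically. The sketch below is a deliberately small parser of my own that only understands "config" and "option" lines, not the full UCI syntax, and the sample is a trimmed excerpt of the sections posted above. It lists the wifi-device sections that use a VHT htmode and are not disabled.

```python
import shlex

# Trimmed excerpt of the wifi-device sections posted in this thread.
SAMPLE = """
config wifi-device 'radio0'
        option type 'mac80211'
        option hwmode '11a'
        option htmode 'VHT80'
        option channel '36'

config wifi-device 'radio1'
        option type 'mac80211'
        option hwmode '11g'
        option htmode 'HT20'
        option channel 'auto'

config wifi-device 'radio2'
        option type 'mac80211'
        option hwmode '11a'
        option htmode 'VHT80'
        option disabled '1'
"""


def parse_uci(text):
    """Parse 'config' and 'option' lines into a list of section dicts."""
    sections = []
    current = None
    for raw in text.splitlines():
        parts = shlex.split(raw, comments=True)  # handles the quoting
        if not parts:
            continue
        if parts[0] == "config" and len(parts) >= 2:
            current = {
                "type": parts[1],
                "name": parts[2] if len(parts) > 2 else None,
                "options": {},
            }
            sections.append(current)
        elif parts[0] == "option" and current is not None and len(parts) >= 3:
            current["options"][parts[1]] = parts[2]
    return sections


def enabled_vht_radios(sections):
    """Names of wifi-device sections with a VHT htmode that are enabled."""
    return [
        s["name"]
        for s in sections
        if s["type"] == "wifi-device"
        and s["options"].get("htmode", "").startswith("VHT")
        and s["options"].get("disabled", "0") != "1"
    ]


print(enabled_vht_radios(parse_uci(SAMPLE)))
```

With this config, radio2 is configured for VHT80 but disabled, so radio0 is the only active VHT radio. That only narrows the search; it does not show that radio0 is the cause.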