After around 3 hours of uptime, my Linksys WRT3200ACM starts hitting CPU stalls. I managed to capture this from the syslog:
Wed Dec 9 16:19:50 2020 kern.err kernel: [15372.663201] INFO: rcu_sched self-detected stall on CPU
Wed Dec 9 16:19:50 2020 kern.err kernel: [15372.668371] 1-...: (1 GPs behind) idle=09e/2/0 softirq=2024800/2024801 fqs=3000
Wed Dec 9 16:19:50 2020 kern.err kernel: [15372.673203] INFO: rcu_sched detected stalls on CPUs/tasks:
Wed Dec 9 16:19:50 2020 kern.err kernel: [15372.675884]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.681395] (t=6002 jiffies g=612372 c=612371 q=1985)
Wed Dec 9 16:19:50 2020 kern.err kernel: [15372.682978] 1-...: (1 GPs behind) idle=09e/2/0 softirq=2024800/2024801 fqs=3000
Wed Dec 9 16:19:50 2020 kern.err kernel: [15372.695644]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.695645] NMI backtrace for cpu 1
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.695648] (detected by 0, t=6002 jiffies, g=612372, c=612371, q=1985)
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.697229] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.209 #0
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.713483] Hardware name: Marvell Armada 380/385 (Device Tree)
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.719431] Function entered at [<c010ebf8>] from [<c010a8b8>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.725288] Function entered at [<c010a8b8>] from [<c0640834>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.731145] Function entered at [<c0640834>] from [<c0645fa8>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.737002] Function entered at [<c0645fa8>] from [<c0646034>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.742859] Function entered at [<c0646034>] from [<c0176620>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.748716] Function entered at [<c0176620>] from [<c01759b4>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.754572] Function entered at [<c01759b4>] from [<c0178c44>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.760429] Function entered at [<c0178c44>] from [<c0187fac>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.766285] Function entered at [<c0187fac>] from [<c0179294>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.772142] Function entered at [<c0179294>] from [<c0179b08>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.777998] Function entered at [<c0179b08>] from [<c010e2e8>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.783855] Function entered at [<c010e2e8>] from [<c016ab08>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.789712] Function entered at [<c016ab08>] from [<c0165d38>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.795568] Function entered at [<c0165d38>] from [<c0166298>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.801425] Function entered at [<c0166298>] from [<c0101464>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.807282] Function entered at [<c0101464>] from [<c010b54c>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.813138] Exception stack(0xdf4657d8 to 0xdf465820)
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.818211] 57c0: decd8800 d0da7d80
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.826425] 57e0: 00000000 00000000 d0da7d80 00000000 d1d89140 00000000 00000022 decd8800
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.834639] 5800: 00000022 c0902d00 fffffff4 df465828 c053b54c c053b118 20000113 ffffffff
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.842851] Function entered at [<c010b54c>] from [<c053b118>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15372.848708] Function entered at [<c053b118>] from [<00000000>]
Wed Dec 9 16:19:50 2020 kern.info kernel: [15372.854568] Sending NMI from CPU 0 to CPUs 1:
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859295] NMI backtrace for cpu 1
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859296] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.209 #0
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859297] Hardware name: Marvell Armada 380/385 (Device Tree)
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859298] task: df43f480 task.stack: df464000
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859299] pc : [<c053b118>] lr : [<c053b54c>] psr: 20000113
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859300] sp : df465828 ip : fffffff4 fp : c0902d00
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859301] r10: 00000022 r9 : decd8800 r8 : 00000022
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859302] r7 : 00000000 r6 : d1d89140 r5 : 00000000 r4 : d0da7d80
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859303] r3 : 00000000 r2 : 00000000 r1 : d0da7d80 r0 : decd8800
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859305] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859306] Control: 10c5387d Table: 1457804a DAC: 00000051
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859307] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.209 #0
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859308] Hardware name: Marvell Armada 380/385 (Device Tree)
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859309] Function entered at [<c010ebf8>] from [<c010a8b8>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859310] Function entered at [<c010a8b8>] from [<c0640834>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859311] Function entered at [<c0640834>] from [<c0645f90>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859312] Function entered at [<c0645f90>] from [<c010dab4>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859313] Function entered at [<c010dab4>] from [<c0101494>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859314] Function entered at [<c0101494>] from [<c010b54c>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859315] Exception stack(0xdf4657d8 to 0xdf465820)
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859317] 57c0: decd8800 d0da7d80
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859318] 57e0: 00000000 00000000 d0da7d80 00000000 d1d89140 00000000 00000022 decd8800
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859319] 5800: 00000022 c0902d00 fffffff4 df465828 c053b54c c053b118 20000113 ffffffff
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859320] Function entered at [<c010b54c>] from [<c053b118>]
Wed Dec 9 16:19:50 2020 kern.warn kernel: [15382.859321] Function entered at [<c053b118>] from [<00000000>]
The router locks up and stops responding to anything, then recovers and starts responding again, and the cycle repeats for a while. I've also caught the system load going really high (12.0).
I don't seem to get these stalls on earlier 19.07 builds, but I do on 19.07.4 and 19.07.5. How can I debug a CPU stall to determine what's causing it?
I can't see any major change related to the mvebu target between the .4 and .5 releases that stands out.
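
The backtrace above only contains raw addresses (for example pc = c053b118 and lr = c053b54c), so one thing that might help is mapping them back to function names. Below is a minimal sketch of how that could be done, assuming a System.map-style symbol table from the exact same kernel build is available (I don't know whether one can be obtained for the release images). The function names and the two symbol entries in the demo are placeholders I made up, not real kernel symbols.

```python
import bisect
import re

# Matches addresses printed as [<c053b118>] in the kernel log.
ADDR_RE = re.compile(r"\[<([0-9a-f]{8})>\]")


def extract_addresses(log_text):
    """Return the unique non-zero addresses from the log, in first-seen order."""
    seen = []
    for match in ADDR_RE.finditer(log_text):
        addr = int(match.group(1), 16)
        if addr and addr not in seen:
            seen.append(addr)
    return seen


def load_symbols(map_text):
    """Parse System.map-style lines ('<hex addr> <type> <name>') into a sorted list."""
    symbols = []
    for line in map_text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(addr, symbols):
    """Return 'name+0xoffset' for the nearest symbol at or below addr, else None."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    return f"{name}+{addr - start:#x}"


if __name__ == "__main__":
    # Two lines taken from the log above.
    log = (
        "pc : [<c053b118>] lr : [<c053b54c>] psr: 20000113\n"
        "Function entered at [<c053b118>] from [<00000000>]\n"
    )
    # Placeholder symbol table: these names and addresses are invented.
    fake_map = "c053b000 T placeholder_a\nc053b500 T placeholder_b\n"
    symbols = load_symbols(fake_map)
    for addr in extract_addresses(log):
        print(f"{addr:08x} -> {resolve(addr, symbols)}")
```

With a real symbol table in place of the placeholder one, the pc and lr values would be the first addresses to look up, since both backtraces report the same pc for CPU 1.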