Why did probability appear in the startup process of 11nas router "INFO: rcu_sched self-detected stall on CPU"?

Why did probability appear in the startup process of 11nas router "INFO: rcu_sched self-detected stall on CPU"?The system will then print the exception information continuously.The following is a log:

[    2.628942] 4 fixed-partitions partitions found on MTD device spi0.0
[    2.641645] Creating 4 MTD partitions on "spi0.0":
[    2.651298] 0x000000000000-0x000000030000 : "u-boot"
[    2.663981] 0x000000030000-0x000000040000 : "u-boot-env"
[    2.676652] 0x000000040000-0x000000050000 : "factory"
[    2.688763] 0x000000050000-0x000001000000 : "firmware"
[    2.704021] random: fast init done
[    2.797203] 2 uimage-fw partitions found on MTD device firmware
[    2.809074] 0x000000050000-0x000000241f42 : "kernel"
[    2.821116] 0x000000241f42-0x000001000000 : "rootfs"
[    2.833112] mtd: device 5 (rootfs) set to be root filesystem
[    2.844611] 1 squashfs-split partitions found on MTD device rootfs
[    2.856992] 0x000000450000-0x000001000000 : "rootfs_data"
[    2.871579] libphy: Fixed MDIO Bus: probed
[    4.921693] random: crng init done
[   62.882643] INFO: rcu_sched self-detected stall on CPU
[   62.892884]  0-...: (5999 ticks this GP) idle=2ba/140000000000002/0 softirq=84/84 fqs=2825 
[   62.909489]   (t=6000 jiffies g=-279 c=-280 q=74)
[   62.918861] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.54 #0
[   62.930804] Stack : 00000000 00000000 00000000 00000000 80b7be92 00000034 00000000 00000000
[   62.947476]         8fc3845c 8061df87 8059a9f0 00000000 00000001 00000001 8fc0dbc8 532616d7
[   62.964147]         00000000 00000000 80b80000 8fc0dac4 86edf5c4 0000007f 00000007 00000000
[   62.980818]         00000000 80620000 000e054d 00000000 00000000 00000000 80650000 80481974
[   62.997489]         80625f20 8061b774 000001e0 80620000 00000003 8029cc70 00000000 80690000
[   63.014162]         ...
[   63.019032] Call Trace:
[   63.023915] [<80010598>] show_stack+0x58/0x100
[   63.032765] [<8045b46c>] dump_stack+0x9c/0xe0
[   63.041427] [<8000da20>] arch_trigger_cpumask_backtrace+0x50/0x78
[   63.053573] [<800985a8>] rcu_dump_cpu_stacks+0xc4/0x13c
[   63.063966] [<80097760>] rcu_check_callbacks+0x310/0x814
[   63.074540] [<8009b244>] update_process_times+0x34/0x70
[   63.084944] [<800acecc>] tick_handle_periodic+0x34/0xd0
[   63.095358] [<8031d200>] gic_compare_interrupt+0x70/0x80
[   63.105945] [<8008ad44>] handle_percpu_devid_irq+0xbc/0x1a8
[   63.117028] [<800851e0>] generic_handle_irq+0x40/0x58
[   63.127093] [<802595fc>] gic_handle_local_int+0x9c/0xe8
[   63.137487] [<802597bc>] gic_irq_dispatch+0x10/0x20
[   63.147188] [<800851e0>] generic_handle_irq+0x40/0x58
[   63.157251] [<80479e7c>] do_IRQ+0x1c/0x2c
[   63.165225] [<8025878c>] plat_irq_dispatch+0xfc/0x138
[   63.175273] [<8000b5e8>] except_vec_vi_end+0xb8/0xc4
[   63.185154] ------------[ cut here ]------------
[   63.194350] WARNING: CPU: 0 PID: 1 at kernel/smp.c:416 smp_call_function_many+0xc8/0x34c
[   63.210439] Modules linked in:
[   63.216518] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.54 #0
[   63.228458] Stack : 00000000 00000000 00000000 00000000 80b7be92 00000034 00000000 00000000
[   63.245128]         8fc3845c 8061df87 8059a9f0 00000000 00000001 00000001 8fc0db38 532616d7
[   63.261799]         00000000 00000000 80b80000 8fc0da34 86edf5c4 0000009a 00000007 00000000
[   63.278470]         00000000 80620000 00034dc6 00000000 00000000 00000000 80650000 800b4b04
[   63.295142]         00000009 000001a0 8000d1dc 00000000 00000003 8029cc70 00000000 80690000
[   63.311813]         ...
[   63.316683] Call Trace:
[   63.321545] [<80010598>] show_stack+0x58/0x100
[   63.330384] [<8045b46c>] dump_stack+0x9c/0xe0
[   63.339053] [<80030684>] __warn+0xe0/0x114
[   63.347200] [<80030748>] warn_slowpath_null+0x1c/0x34
[   63.357249] [<800b4b04>] smp_call_function_many+0xc8/0x34c
[   63.368159] [<8000da38>] arch_trigger_cpumask_backtrace+0x68/0x78
[   63.380284] [<800985a8>] rcu_dump_cpu_stacks+0xc4/0x13c
[   63.390678] [<80097760>] rcu_check_callbacks+0x310/0x814
[   63.401245] [<8009b244>] update_process_times+0x34/0x70
[   63.411638] [<800acecc>] tick_handle_periodic+0x34/0xd0
[   63.422034] [<8031d200>] gic_compare_interrupt+0x70/0x80
[   63.432605] [<8008ad44>] handle_percpu_devid_irq+0xbc/0x1a8
[   63.443689] [<800851e0>] generic_handle_irq+0x40/0x58
[   63.453741] [<802595fc>] gic_handle_local_int+0x9c/0xe8
[   63.464134] [<802597bc>] gic_irq_dispatch+0x10/0x20
[   63.473835] [<800851e0>] generic_handle_irq+0x40/0x58
[   63.483886] [<80479e7c>] do_IRQ+0x1c/0x2c
[   63.491857] [<8025878c>] plat_irq_dispatch+0xfc/0x138
[   63.501903] [<8000b5e8>] except_vec_vi_end+0xb8/0xc4
[   63.511792] ---[ end trace 9d1d2d2a499501a4 ]---

version: 18.06 and 18.06.1
soc: 7621 soc
Hardware platform: 11nas router
The probability of: Four times out of ten
Repeat method: Constantly unplug and plug
Other exceptions: pppoe sometimes breaks flow

https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt may have some clues. Silence over the last 9 days suggests that it isn't a common occurrence.

Also, is this device running OpenWrt? If so, where did you obtain the build? I ask as I don't see it listed in https://openwrt.org/toh/start?dataflt[Model*~]=nas

1 Like

My device runs the openwrt system, and I get the build from git.
Commit: 03 b693064b6ad85ce9c57ab0226b986b09c8eb05
In combination with the log and the code, it is the problem caused by waiting for each other when CPU0 starts CPU1 during startup.
type of router: https://downloads.openwrt.org/releases/18.06.1/targets/ramips/mt7621/openwrt-18.06.1-ramips-mt7621-11acnas-squashfs-sysupgrade.bin

Thanks! Not knowing your device's architecture, I didn't know if it might be related to other reports of stalls, that at quick glance appear to be related to Mediatek-based devices.

https://forum.openwrt.org/search?q=rcu_sched

https://bugs.openwrt.org/index.php?do=details&task_id=804&order=dateopened&sort=desc
this link show that the problem has been solved in lede 17.04,but the problem reappeared in lede 18.16 and 18.06.1.

Other people have had this problem, and I'll see how they solved it. Thank you very much.

If you don't find an answer there, to help others with expertise in the MediaTek devices find this thread, you might want to simplify the title to something like

mt7621 - How to resolve "rcu_sched self-detected stall on CPU"?

(It looks like it's happening a minute after boot, at least from your log)

1 Like

https://bugs.openwrt.org/index.php?do=details&task_id=804&order=dateopened&sort=desc
this link show that the problem has been solved in lede 17.04,but the problem reappeared in lede 18.16 and 18.06.1.

thanks a lot

ok,I'll do that next time.

Same problem with Strong 1200 (https://forum.openwrt.org/t/support-for-strong-1200/22768)

Salu2