ZBT-WG3526 with 17.01.0-rc2 - frequent reboots

Not sure if this is the right place to report this ...

I'm getting frequent kernel dumps, even though I've disabled the wireless drivers (it has been reported that they don't work) ...

Feb 15 07:48:43 jpmhome-router kernel: [33932.610000] rcu_sched kthread starved for 6010 jiffies! g1182495 c1182494 f0x0 s3 ->state=0x1
Feb 15 07:49:00 jpmhome-router crond[1100]: USER root pid 7984 cmd sh /etc/cronscript/interface-monitor.sh
Feb 15 07:49:51 jpmhome-router kernel: [34001.330000] INFO: rcu_sched detected stalls on CPUs/tasks:
Feb 15 07:49:51 jpmhome-router kernel: [34001.330000] #0113-...: (0 ticks this GP) idle=1da/0/0 softirq=2359070/2359070 fqs=0
Feb 15 07:49:51 jpmhome-router kernel: [34001.340000] #011(detected by 1, t=6003 jiffies, g=1182961, c=1182960, q=37227)
Feb 15 07:49:51 jpmhome-router kernel: [34001.340000] Task dump for CPU 3:
Feb 15 07:49:51 jpmhome-router kernel: [34001.350000] swapper/3 R running 0 0 1 0x00100000
Feb 15 07:49:51 jpmhome-router kernel: [34001.350000] Stack : 00000000 02ad77d7 00001fd1 ffffffff 00001b35 778302c0 804762a4 80420000
Feb 15 07:49:51 jpmhome-router kernel: [34001.350000] #011 8042475c 00000001 00000001 80424680 80424724 80420000 00000000 80013554
Feb 15 07:49:51 jpmhome-router kernel: [34001.350000] #011 1100fc03 00000003 8fc74000 8fc75ec0 80420000 8005dc00 1100fc03 00000003
Feb 15 07:49:51 jpmhome-router kernel: [34001.350000] #011 00000000 80420000 804762a4 8005dbf8 80420000 8001acb0 1100fc03 00000000
Feb 15 07:49:51 jpmhome-router kernel: [34001.350000] #011 00000004 804244a0 000000a0 8001acb8 b6ac75d5 fc3fffbe 72037ad2 bfab239f
Feb 15 07:49:51 jpmhome-router kernel: [34001.350000] #011 ...
Feb 15 07:49:51 jpmhome-router kernel: [34001.390000] Call Trace:[<80013554>] 0x80013554
Feb 15 07:49:51 jpmhome-router kernel: [34001.390000] [<8005dc00>] 0x8005dc00
Feb 15 07:49:51 jpmhome-router kernel: [34001.400000] [<8005dbf8>] 0x8005dbf8
Feb 15 07:49:51 jpmhome-router kernel: [34001.400000] [<8001acb0>] 0x8001acb0
Feb 15 07:49:51 jpmhome-router kernel: [34001.400000] [<8001acb8>] 0x8001acb8

Feb 15 13:06:10 jpmhome-router kernel: [ 3543.120000] rcu_sched kthread starved for 6010 jiffies! g149172 c149171 f0x0 s3 ->state=0x1
Feb 15 13:07:00 jpmhome-router crond[1101]: USER root pid 3599 cmd sh /etc/cronscript/interface-monitor.sh
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.390000] INFO: rcu_sched detected stalls on CPUs/tasks:
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.390000] #0112-...: (0 ticks this GP) idle=2da/0/0 softirq=260709/260709 fqs=0
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.400000] #011(detected by 3, t=6003 jiffies, g=149351, c=149350, q=3914)
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.400000] Task dump for CPU 2:
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.410000] swapper/2 R running 0 0 1 0x00100000
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.410000] Stack : 00000000 99ff4b07 00000348 ffffffff 000002e4 00000000 804762a4 80420000
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.410000] #011 8042475c 00000001 00000000 804245e0 80424724 80420000 000010d9 80013554
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.410000] #011 1100fc03 00000002 8fc72000 8fc73ec0 80420000 8005dc00 1100fc03 00000002
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.410000] #011 00000000 80420000 804762a4 8005dbf8 80420000 8001acb0 1100fc03 00000000
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.410000] #011 00000004 804244a0 000000a0 8001acb8 d67aa5dc 5ea66f23 f7567166 f141eecc
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.410000] #011 ...
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.450000] Call Trace:[<80013554>] 0x80013554
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.450000] [<8005dc00>] 0x8005dc00
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.460000] [<8005dbf8>] 0x8005dbf8
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.460000] [<8001acb0>] 0x8001acb0
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.460000] [<8001acb8>] 0x8001acb8
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.470000] [<8001c144>] 0x8001c144
Feb 15 13:07:15 jpmhome-router kernel: [ 3608.470000]

I've upgraded to stable (LEDE Reboot 17.01.0 r3205-59508e3 / LuCI lede-17.01 branch (git-17.051.53299-a100738)), but I'm still getting similar issues:

Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] INFO: rcu_sched self-detected stall on CPU
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] INFO: rcu_sched detected stalls on CPUs/tasks:
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #0110-...: (1 ticks this GP) idle=611/2/0 softirq=1265803/1265803 fqs=1
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011(detected by 1, t=41445 jiffies, g=587882, c=587881, q=14747)
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] Task dump for CPU 0:
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] swapper/0 R running 0 0 0 0x00100000
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] Stack : 00000000 3e8a8bcf 00000000 ffffffff 00000000 00000000 804762a4 80420000
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011 8042475c 804706b4 80480000 9bff0000 9bfe8000 9bfe8000 00000000 80013554
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011 11000403 00000000 80418000 80419eb0 00000000 8005dbf8 11000403 00000000
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011 00000000 80420000 d0800400 8001356c 80480000 80009fe4 80480000 8042dc50
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011 80480000 80420000 80480000 8044ab50 00000000 8046d4ec 80417958 00000034
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011 ...
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] Call Trace:[<80013554>] 0x80013554
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8005dbf8>] 0x8005dbf8
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8001356c>] 0x8001356c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80009fe4>] 0x80009fe4
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8044ab50>] 0x8044ab50
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8044a36c>] 0x8044a36c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000]
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] rcu_sched kthread starved for 41444 jiffies! g587882 c587881 f0x0 s3 ->state=0x1
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000]
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #0110-...: (1 GPs behind) idle=611/2/0 softirq=1265803/1265803 fqs=0
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011 (t=0 jiffies g=587883 c=587882 q=24407)
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] Task dump for CPU 0:
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] swapper/0 R running 0 0 0 0x00100000
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] Stack : 00000000 80063d5c 00000000 00000e79 00000000 00000000 00000000 00000000
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Feb 26 15:41:01 jpmhome-router kernel: message repeated 3 times: [ [15916.970000] #011 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000]
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011 ...
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] Call Trace:[<80063d5c>] 0x80063d5c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80016448>] 0x80016448
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80016448>] 0x80016448
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8006db7c>] 0x8006db7c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80061c78>] 0x80061c78
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<800712ec>] 0x800712ec
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80073248>] 0x80073248
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8008248c>] 0x8008248c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<800742a8>] 0x800742a8
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80074500>] 0x80074500
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802c9718>] 0x802c9718
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8027d590>] 0x8027d590
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<800694e4>] 0x800694e4
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80064c48>] 0x80064c48
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802aab50>] 0x802aab50
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<801df198>] 0x801df198
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<801df374>] 0x801df374
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80064c48>] 0x80064c48
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802ab248>] 0x802ab248
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80013620>] 0x80013620
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8029a520>] 0x8029a520
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<801de6c0>] 0x801de6c0
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80005980>] 0x80005980
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8027d648>] 0x8027d648
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80073d60>] 0x80073d60
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8007416c>] 0x8007416c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8000f118>] 0x8000f118
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8d35dfe4>] 0x8d35dfe4 [sch_htb@8d35c000+0x33f0]
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802c9964>] 0x802c9964
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8d35ce38>] 0x8d35ce38 [sch_htb@8d35c000+0x33f0]
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802ab028>] 0x802ab028
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802da3a4>] 0x802da3a4
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802e42e4>] 0x802e42e4
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802e64a4>] 0x802e64a4
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802da448>] 0x802da448
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802e5854>] 0x802e5854
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802e2a90>] 0x802e2a90
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802e263c>] 0x802e263c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802e14c4>] 0x802e14c4
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802e0cec>] 0x802e0cec
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802a7430>] 0x802a7430
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802a903c>] 0x802a903c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802a8de0>] 0x802a8de0
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8002eb64>] 0x8002eb64
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8002ee4c>] 0x8002ee4c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<801de6c0>] 0x801de6c0
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80005980>] 0x80005980
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000]

Any hint?

Are you using SQM by any chance? I enabled SQM on a DIR-860L (also Mediatek) and I've been getting similar messages and random reboots since. Once I disabled SQM, things were stable again, like before.

Yeah, you may be right, I use SQM.

I've disabled collectd for today/tonight, in case the problem comes from there ...

If this still happens tonight, I will re-enable collectd and disable SQM, and report back here, or try to find out how to let the SQM developers know that something is broken there.

Thanks!

I can report that since disabling offloading, my DIR-860L is stable again. No more kernel oopses or sudden reboots.

You can disable offloading as follows:

# ethtool -K $interface tso off gso off gro off

I think my problem is related to one of the collectd packages. For the last few days I've kept SQM enabled with collectd disabled, and it hasn't failed, so I'm re-enabling the collectd options one per day to check which one causes the problem.

I got the same problem again this morning, so I'm going to try disabling offloading ... not sure if that means some performance drop ...

Do you keep SQM enabled with this?

By the way, did you apply the ethtool settings to all interfaces, only to WAN, or to which ones?

How did you determine that all three offload parameters need to be turned off, or did you just decide to disable them all?

You keep SQM indeed. Just those three should be fine for testing.

Ethtool only needs to be applied to the device providing WAN, afaik. If WAN is part of the switch, you have to apply it to the physical device. For me, that was eth0, with eth0.2 being WAN.
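
For anyone following along, here is a minimal sketch of getting the physical device name from a VLAN interface name. It assumes the usual "device.vlan" naming seen above (eth0.2 on top of eth0). The function name phys_dev is made up for illustration; it only strips the suffix from the string and does not query the system.

```shell
#!/bin/sh
# Hypothetical helper: strip a ".<vlan>" suffix, so eth0.2 becomes eth0.
# Assumes "device.vlan" naming; names without a dot pass through unchanged.
phys_dev() {
    printf '%s\n' "${1%%.*}"
}
```

With that, `ethtool -k "$(phys_dev eth0.2)"` (lowercase -k) would list the current offload settings of the physical device, while uppercase -K changes them.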

Understood. I guess we need to put it in rc.local, unless there is some way to make it the default, so that it isn't lost when the router reboots (power outage, etc.).
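
A sketch of what the rc.local approach could look like, assuming ethtool is installed and that eth0 is the physical device behind WAN, as in the post above. The IFACE variable and the guard around the call are illustrative choices, not something taken from this thread. If /etc/rc.local ends with "exit 0", the lines have to go before it.

```shell
# /etc/rc.local fragment (place before any final "exit 0")
# IFACE is an assumption taken from this thread; change it to your physical WAN device.
IFACE=eth0

# Only try if ethtool is installed and the interface exists.
if command -v ethtool >/dev/null 2>&1 && [ -d "/sys/class/net/$IFACE" ]; then
    # Turn off TCP segmentation, generic segmentation and generic receive offload.
    # A failure is reported on stderr but does not abort the rest of rc.local.
    ethtool -K "$IFACE" tso off gso off gro off \
        || echo "could not change offload settings on $IFACE" >&2
fi
```

Running `ethtool -k eth0` after a reboot would show whether the settings took effect.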

Yeah, at the moment.

I just had another sudden reboot this morning, so disabling offloading might not solve it completely (unfortunately), but the uptime is way better already than with offloading still enabled (sometimes it rebooted after a few minutes).

Do you have the device connected to the internet, or are you testing it offline?

It's serving as my router. Post release 17.01 branch build.

Same for me, but stable 17.01.0 build

I got one reboot (I typically get 1-2 per day) within a few hours of running ethtool -K $interface tso off gso off gro off

So I believe the problem is not there. In collectd I've disabled entropy, interrupts, processes and TCP connections, and curiously uptime is working again. So I suspect that one of those is the cause of my problem.

After one day without reboots, I turned offloading back on:
ethtool -K $interface tso on gso on gro on

I'm almost sure it is something in collectd, so I will wait one day and then turn the collectd items back on one at a time, every other day, to see which of them is the culprit.

Hello, Sir

Have you solved the reboot problem?

If not, maybe I can help you; the ZBT-WG3526 is our product. I have a customer who ran into the same problem as you.

You can contact me via Skype if you have it.

My skype is zbt-sales02

Best regards
Irene

See also Solutions for ZBT-WG3526 reboot broken

I think this problem is related to the 32M flash; however, I have the 16M version.

After some additional testing over many days, I can say definitively that the reboots on this device are caused by:

  1. Enabling SQM
    or
  2. Enabling the 2.4GHz WiFi

Either one alone triggers the reboots, as does enabling both at the same time.

Disabling offloading doesn't help.