I've upgraded it to stable (LEDE Reboot 17.01.0 r3205-59508e3 / LuCI lede-17.01 branch (git-17.051.53299-a100738)), but I'm still getting similar issues:
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] INFO: rcu_sched self-detected stall on CPU
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] INFO: rcu_sched detected stalls on CPUs/tasks:
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #0110-...: (1 ticks this GP) idle=611/2/0 softirq=1265803/1265803 fqs=1
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011(detected by 1, t=41445 jiffies, g=587882, c=587881, q=14747)
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] Task dump for CPU 0:
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] swapper/0 R running 0 0 0 0x00100000
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] Stack : 00000000 3e8a8bcf 00000000 ffffffff 00000000 00000000 804762a4 80420000
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011 8042475c 804706b4 80480000 9bff0000 9bfe8000 9bfe8000 00000000 80013554
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011 11000403 00000000 80418000 80419eb0 00000000 8005dbf8 11000403 00000000
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011 00000000 80420000 d0800400 8001356c 80480000 80009fe4 80480000 8042dc50
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011 80480000 80420000 80480000 8044ab50 00000000 8046d4ec 80417958 00000034
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011 ...
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] Call Trace:[<80013554>] 0x80013554
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8005dbf8>] 0x8005dbf8
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8001356c>] 0x8001356c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80009fe4>] 0x80009fe4
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8044ab50>] 0x8044ab50
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8044a36c>] 0x8044a36c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000]
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] rcu_sched kthread starved for 41444 jiffies! g587882 c587881 f0x0 s3 ->state=0x1
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000]
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #0110-...: (1 GPs behind) idle=611/2/0 softirq=1265803/1265803 fqs=0
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011 (t=0 jiffies g=587883 c=587882 q=24407)
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] Task dump for CPU 0:
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] swapper/0 R running 0 0 0 0x00100000
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] Stack : 00000000 80063d5c 00000000 00000e79 00000000 00000000 00000000 00000000
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Feb 26 15:41:01 jpmhome-router kernel: message repeated 3 times: [ [15916.970000] #011 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000]
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] #011 ...
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] Call Trace:[<80063d5c>] 0x80063d5c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80016448>] 0x80016448
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80016448>] 0x80016448
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8006db7c>] 0x8006db7c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80061c78>] 0x80061c78
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<800712ec>] 0x800712ec
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80073248>] 0x80073248
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8008248c>] 0x8008248c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<800742a8>] 0x800742a8
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80074500>] 0x80074500
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802c9718>] 0x802c9718
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8027d590>] 0x8027d590
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<800694e4>] 0x800694e4
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80064c48>] 0x80064c48
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802aab50>] 0x802aab50
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<801df198>] 0x801df198
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<801df374>] 0x801df374
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80064c48>] 0x80064c48
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802ab248>] 0x802ab248
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80013620>] 0x80013620
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8029a520>] 0x8029a520
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<801de6c0>] 0x801de6c0
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80005980>] 0x80005980
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8027d648>] 0x8027d648
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80073d60>] 0x80073d60
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8007416c>] 0x8007416c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8000f118>] 0x8000f118
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8d35dfe4>] 0x8d35dfe4 [sch_htb@8d35c000+0x33f0]
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802c9964>] 0x802c9964
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8d35ce38>] 0x8d35ce38 [sch_htb@8d35c000+0x33f0]
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802ab028>] 0x802ab028
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802da3a4>] 0x802da3a4
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802e42e4>] 0x802e42e4
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802e64a4>] 0x802e64a4
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802da448>] 0x802da448
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802e5854>] 0x802e5854
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802e2a90>] 0x802e2a90
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802e263c>] 0x802e263c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802e14c4>] 0x802e14c4
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802e0cec>] 0x802e0cec
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802a7430>] 0x802a7430
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802a903c>] 0x802a903c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<802a8de0>] 0x802a8de0
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8002eb64>] 0x8002eb64
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<8002ee4c>] 0x8002ee4c
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<801de6c0>] 0x801de6c0
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000] [<80005980>] 0x80005980
Feb 26 15:41:01 jpmhome-router kernel: [15916.970000]
Are you using SQM by any chance? I enabled SQM on a DIR-860L (also Mediatek) and I've been getting similar messages and random reboots since. Once I disabled SQM, things were stable again, like before.
I've disabled collectd for today and tonight, in case the problem comes from there.
If this still happens tonight, I will re-enable collectd and disable SQM, then report back here or try to find out how to inform the SQM developers that something is broken there.
I think my problem is related to one of the collectd packages. I have kept SQM enabled for a few days now with collectd disabled, and it hasn't failed, so I'm re-enabling the collectd options one by one, one per day, to check which one causes the problem.
Indeed, you keep SQM enabled. Just those three should be fine for testing.
As far as I know, ethtool only has to be applied to the device providing WAN; if that device is part of the switch, you have to apply it to the physical device. For me, that was eth0, with eth0.2 being WAN.
Understood. I guess we need to put it in rc.local, or is there a way to have it applied by default, so that it isn't lost whenever the router reboots (power outage, etc.)?
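For anyone who wants to try the rc.local route, here is a minimal sketch of what could go into /etc/rc.local, above the final "exit 0". It only re-applies the ethtool settings discussed in this thread at boot. The device name eth0 and the helper name offload_cmd are assumptions for illustration; use whatever physical device carries WAN on your router, and note that the ethtool package has to be installed for this to do anything.

```shell
# Assumption: eth0 is the physical device behind WAN (eth0.2 in the post above).
WAN_DEV="${WAN_DEV:-eth0}"

# Hypothetical helper: prints the ethtool command for a device and a state.
#   $1 = device name, $2 = on | off
offload_cmd() {
    echo "ethtool -K $1 tso $2 gso $2 gro $2"
}

# Apply the setting only if ethtool is installed and the device exists,
# and never let a failure here abort the rest of rc.local.
if command -v ethtool >/dev/null 2>&1 && [ -d "/sys/class/net/$WAN_DEV" ]; then
    $(offload_cmd "$WAN_DEV" off) || true
fi
```

To switch offloading back on for a comparison, running the command printed by `offload_cmd eth0 on` by hand is enough; nothing here is persistent beyond the rc.local entry itself.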
I just had another sudden reboot this morning, so disabling offloading might not solve it completely (unfortunately), but the uptime is way better already than with offloading still enabled (sometimes it rebooted after a few minutes).
I got one reboot (I typically get 1-2 per day) within a few hours of running ethtool -K $interface tso off gso off gro off
So I believe the problem is not there. I've disabled the following in collectd: entropy, interrupts, processes and TCP connections, and curiously uptime is working again. So I suspect that one of those is the cause of my problem.
After one day without reboots, I turned offloading back on:
ethtool -K $interface tso on gso on gro on
I'm almost sure it is something in collectd, so I will wait for one day and then turn the collectd items back on one at a time, every other day, to see which of them is the culprit.