Mushoz
May 13, 2017, 10:01pm
49
You are running the DIR-860L as well, right? How is the 4.4.67 kernel treating you? Are you using SQM by any chance? Both the current master branch and the 17.01 branch have issues with SQM on mt7621 devices: it can cause stack traces and crashes that result in a reboot. Kernel 4.9 seems to have fixed that for me, but it introduces other problems. I was wondering whether kernel 4.4.67 behaves well on mt7621 devices with SQM enabled.
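For anyone unfamiliar, "SQM enabled" here means an sqm-scripts queue on the WAN interface, roughly like the minimal sketch below. The interface name and rates are assumptions for a typical DIR-860L setup, not a copy of an actual working config:

```
# /etc/config/sqm -- minimal sketch, not an exact working config.
# 'eth0.2' is assumed to be the WAN device on the DIR-860L; rates are in kbit/s.
config queue 'wan'
	option enabled '1'
	option interface 'eth0.2'
	option download '300000'
	option upload '300000'
	option qdisc 'cake'
	option script 'piece_of_cake.qos'
```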
For further discussion on the aforementioned issues, please see the end of this thread:
https://forum.openwrt.org/t/optimized-build-for-the-d-link-dir-860l/
And this bug report (please vote for it if you would like the developers to focus on this bug):
(Bug report: opened 07:06 AM, 06 May 2017 UTC; closed 02:04 PM, 11 Feb 2018 UTC; flyspray)
*Mushoz:*
There have been a large number of reports of bugs with MT7621 devices … in combination with SQM. Debugging is difficult, because the bug often results in a hard crash that leaves no log files. I believe I have some interesting details that might make it easier to debug.
**Device:** DIR-860L rev B1, but according to reports all MT7621 devices are affected.
**LEDE Version:** LEDE Reboot SNAPSHOT r4094-961c0ea
**Steps to reproduce:** Run a dslreports.com speedtest with a large number of upload and download streams (32/32) with either SQM or QOS enabled on your WAN interface.
**Observations:**
* It happens with both SQM-scripts _and_ QOS, so I don't believe it is an issue with the SQM package specifically. What these two packages have in common is that they both shape traffic.
* It seems to be **load dependent**: 100/100 and 200/200 Mbit/s egress/ingress limits crash less often than limits of 300/300 or higher (the sketch after this list shows how the limits are changed between runs).
* It happens with all qdiscs: Cake + piece of cake, fq_codel + simple, fq_codel + simplest
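For illustration, the shaping limits can be changed and SQM restarted between test runs with the standard uci tooling, along these lines; the section name 'wan' is an assumption and has to match the name used in /etc/config/sqm:

```
# Sketch: raise the shaping limits (kbit/s) and restart SQM between test runs.
# 'wan' is an assumed section name; use whatever your /etc/config/sqm calls it.
uci set sqm.wan.download='300000'
uci set sqm.wan.upload='300000'
uci commit sqm
/etc/init.d/sqm restart
```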
**Crash log:**
There is usually no crash log because the router hard-locks and then reboots, but I got very lucky once and managed to capture a log of the event:
```
[ 710.140000] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 710.150000] 1-...: (257 GPs behind) idle=dfc/0/0 softirq=48167/48179 fqs=1
[ 710.160000] (detected by 2, t=6004 jiffies, g=13114, c=13113, q=1063)
[ 710.170000] Task dump for CPU 1:
[ 710.180000] swapper/1 R running 0 0 1 0x00100000
[ 710.190000] Stack : 00000000 5b6c286a 000000a3 ffffffff 00000090 773742c0 804df2a4 80490000
[ 710.190000] 8048c75c 00000001 00000001 8048c540 8048c724 80490000 00000000 800135e4
[ 710.190000] 00000000 00000001 87c70000 87c71ec0 80490000 8005ec74 1100fc03 00000001
[ 710.190000] 00000000 80490000 804df2a4 8005ec6c 80490000 8001b1a8 1100fc03 00000000
[ 710.190000] 00000004 8048c4a0 000000a0 8001b1b0 8c94e220 00008018 dc124877 a0020044
[ 710.190000] ...
[ 710.260000] Call Trace:
[ 710.270000] [<8000be98>] __schedule+0x574/0x758
[ 710.280000] [<800135e4>] r4k_wait_irqoff+0x0/0x20
[ 710.290000]
[ 710.290000] rcu_sched kthread starved for 6016 jiffies! g13114 c13113 f0x0 s3 ->state=0x1
[ 782.470000] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 782.470000] 1-...: (0 ticks this GP) idle=12c/0/0 softirq=48179/48179 fqs=0
[ 782.470000] (detected by 0, t=6002 jiffies, g=13324, c=13323, q=1260)
[ 782.470000] Task dump for CPU 1:
[ 782.470000] swapper/1 R running 0 0 1 0x00100000
[ 782.470000] Stack : 00000000 00000001 0000000a 00000000 00000000 00000001 804df2a4 80490000
[ 782.470000] 8048c75c 00000001 00000001 8048c540 8048c724 80490000 00000000 800135e4
[ 782.470000] 00000000 00000001 87c70000 87c71ec0 80490000 8005ec74 1100fc03 00000001
[ 782.470000] 00000000 80490000 804df2a4 8005ec6c 80490000 8001b1a8 1100fc03 00000000
[ 782.470000] 00000004 8048c4a0 000000a0 8001b1b0 8c94e220 00008018 dc124877 a0020044
[ 782.470000] ...
[ 782.470000] Call Trace:
[ 782.470000] [<8000be98>] __schedule+0x574/0x758
[ 782.470000] [<800135e4>] r4k_wait_irqoff+0x0/0x20
[ 782.470000]
[ 782.470000] rcu_sched kthread starved for 6002 jiffies! g13324 c13323 f0x0 s3 ->state=0x1
[ 860.040000] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 860.050000] 1-...: (0 ticks this GP) idle=5a8/0/0 softirq=48179/48179 fqs=0
[ 860.060000] (detected by 3, t=6004 jiffies, g=13501, c=13500, q=2389)
[ 860.070000] Task dump for CPU 1:
[ 860.080000] swapper/1 R running 0 0 1 0x00100000
[ 860.090000] Stack : 00000000 00002cd1 00000000 777882c0 00000000 00000000 804df2a4 80490000
[ 860.090000] 8048c75c 00000001 00000001 8048c540 8048c724 80490000 00000000 800135e4
[ 860.090000] 00000000 00000001 87c70000 87c71ec0 80490000 8005ec74 1100fc03 00000001
[ 860.090000] 00000000 80490000 804df2a4 8005ec6c 80490000 8001b1a8 1100fc03 00000000
[ 860.090000] 00000004 8048c4a0 000000a0 8001b1b0 8c94e220 00008018 dc124877 a0020044
[ 860.090000] ...
[ 860.160000] Call Trace:
[ 860.170000] [<8000be98>] __schedule+0x574/0x758
[ 860.180000] [<800135e4>] r4k_wait_irqoff+0x0/0x20
[ 860.190000]
[ 860.190000] rcu_sched kthread starved for 6017 jiffies! g13501 c13500 f0x0 s3 ->state=0x1
```
I hope it contains useful information for tracking down this bug. If there is anything else I can supply or test in order to help the debugging process, please let me know.