My router crashes with mt7621

It is running rc2, and it crashes occasionally.
Sometimes it happens after a few minutes, other times after 24 hours.
When it crashes, it no longer responds to ping.
The image was built with ImageBuilder, and so far I've added these packages:

luci openvpn-openssl tcpdump mc nano openssh-sftp-server etherwake python luci-i18n-wol-it luci-app-sqm netperf

Then there are some additional packages that I've added:

fdisk
kmod-lib-crc-itu-t
kmod-lib-crc7
kmod-mmc-spi
kmod-mmc
kmod-sdhci-mt7620
kmod-sdhci
libfdisk
libsmartcols
mmc-utils
perl-net-telnet
perl
perlbase-config
perlbase-essential
perlbase-socket
perlbase-symbol
perlbase-xsloader

I managed to capture a log from the serial port, and this is the result:

[13722.840000] INFO: rcu_sched detected stalls on CPUs/tasks:
[13722.850000]  3-...: (0 ticks this GP) idle=80a/0/0 softirq=1002212/1002212 fqs=0
[13722.860000]  (detected by 1, t=6004 jiffies, g=141352, c=141351, q=16484)
[13722.880000] Task dump for CPU 3:
[13722.880000] swapper/3       R running      0     0      1 0x00100000
[13722.890000] Stack : 00000000 00072c8b 7f84e9c0 00000000 00000010 77cd9564 804762a4 80420000
          8042475c 00000001 00000001 80424680 80424724 80420000 000010d9 80013554
          1100fc03 00000003 8fc74000 8fc75ec0 80420000 8005dc00 1100fc03 00000003
          00000000 80420000 804762a4 8005dbf8 80420000 8001acb0 1100fc03 00000000
          00000004 804244a0 000000a0 8001acb8 bff7f7ff ffffd7fd ffd3cfff fbf76fff
          ...
[13722.970000] Call Trace:[<80013554>] 0x80013554
[13722.980000] [<8005dc00>] 0x8005dc00
[13722.980000] [<8005dbf8>] 0x8005dbf8
[13722.990000] [<8001acb0>] 0x8001acb0
[13723.000000] [<8001acb8>] 0x8001acb8
[13723.000000]


[81273.420000] INFO: rcu_sched detected stalls on CPUs/tasks:
[81273.430000]  3-...: (0 ticks this GP) idle=442/0/0 softirq=6466593/6466593 fqs=0
[81273.440000]  (detected by 1, t=6004 jiffies, g=577320, c=577319, q=300)
[81273.450000] Task dump for CPU 3:
[81273.460000] swapper/3       R running      0     0      1 0x00100000
[81273.470000] Stack : 00000000 ff7641cc 000049e9 ffffffff 0000411f 00000000 804762a4 80420000
          8042475c 00000001 00000001 80424680 80424724 80420000 000010d9 80013554
          1100fc03 00000003 8fc74000 8fc75ec0 80420000 8005dc00 1100fc03 00000003
          00000000 80420000 804762a4 8005dbf8 80420000 8001acb0 1100fc03 00000000
          00000004 804244a0 000000a0 8001acb8 bff7f7ff ffffd7fd e3d3cfff fbd76fff
          ...
[81273.550000] Call Trace:[<80013554>] 0x80013554
[81273.560000] [<8005dc00>] 0x8005dc00
[81273.560000] [<8005dbf8>] 0x8005dbf8
[81273.570000] [<8001acb0>] 0x8001acb0
[81273.580000] [<8001acb8>] 0x8001acb8
[81273.580000]
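
To compare stall reports like the two above without reading hex by eye, a small parser can help. This is only a rough sketch: the function name `parse_stalls` and the dictionary keys are made up for illustration, the regular expressions are tuned to the exact line shapes in this log and may not match other kernel versions, and `SAMPLE` is an abbreviated copy of the log (stack dump omitted).

```python
import re

# Header of a stall report, e.g. "[13722.840000] INFO: rcu_sched detected stalls ..."
STALL_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+INFO: rcu_sched detected stalls")
# Per-CPU line, e.g. "[13722.850000]  3-...: (0 ticks this GP)"
CPU_RE = re.compile(r"\]\s+(\d+)-\.\.\.:")
# Detection line, e.g. "(detected by 1, t=6004 jiffies, ...)"
JIFFIES_RE = re.compile(r"t=(\d+) jiffies")
# Call trace entry, e.g. "[<80013554>] 0x80013554"
TRACE_RE = re.compile(r"\[<([0-9a-f]{8})>\]")


def parse_stalls(log_text):
    """Return one dict per RCU stall report found in log_text."""
    reports = []
    for line in log_text.splitlines():
        header = STALL_RE.search(line)
        if header:
            reports.append({"uptime_s": float(header.group(1)),
                            "cpus": [], "jiffies": None, "trace": []})
            continue
        if not reports:
            continue  # ignore anything before the first report
        current = reports[-1]
        cpu = CPU_RE.search(line)
        if cpu:
            current["cpus"].append(int(cpu.group(1)))
        jiffies = JIFFIES_RE.search(line)
        if jiffies:
            current["jiffies"] = int(jiffies.group(1))
        current["trace"].extend(TRACE_RE.findall(line))
    return reports


# Abbreviated copy of the serial log above (stack words left out).
SAMPLE = """\
[13722.840000] INFO: rcu_sched detected stalls on CPUs/tasks:
[13722.850000]  3-...: (0 ticks this GP) idle=80a/0/0 softirq=1002212/1002212 fqs=0
[13722.860000]  (detected by 1, t=6004 jiffies, g=141352, c=141351, q=16484)
[13722.970000] Call Trace:[<80013554>] 0x80013554
[13722.980000] [<8005dc00>] 0x8005dc00
[13722.980000] [<8005dbf8>] 0x8005dbf8
[13722.990000] [<8001acb0>] 0x8001acb0
[13723.000000] [<8001acb8>] 0x8001acb8
[81273.420000] INFO: rcu_sched detected stalls on CPUs/tasks:
[81273.430000]  3-...: (0 ticks this GP) idle=442/0/0 softirq=6466593/6466593 fqs=0
[81273.440000]  (detected by 1, t=6004 jiffies, g=577320, c=577319, q=300)
[81273.550000] Call Trace:[<80013554>] 0x80013554
[81273.560000] [<8005dc00>] 0x8005dc00
[81273.560000] [<8005dbf8>] 0x8005dbf8
[81273.570000] [<8001acb0>] 0x8001acb0
[81273.580000] [<8001acb8>] 0x8001acb8
"""

reports = parse_stalls(SAMPLE)
same_trace = reports[0]["trace"] == reports[1]["trace"]
for r in reports:
    print(r["uptime_s"], r["cpus"], r["jiffies"], r["trace"])
print("identical call traces:", same_trace)
```

On this log both reports name CPU 3 and list the same five return addresses, which suggests the two stalls hit the same code path rather than being unrelated crashes.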




I hope someone can point me in the right direction, thanks.

The mediatek based devices are currently experiencing issues with sqm-cake. Please disable that and see if the crashes are fixed.

I disabled it a few minutes before posting, since I suspected it too. But is the problem only with the sqm-cake discipline, so that I can still use fq_codel? Or is it with the whole SQM software?

I think fq_codel is fine. But I would leave it disabled for a few days to test whether the problem is actually solved. After that, you can start adding back complexity to your setup :slight_smile:
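
For a multi-day test like this, it can help to record exactly when the router stops answering ping instead of noticing it by chance. Below is a rough sketch meant to run on another machine on the LAN. All function names are hypothetical, the `ping -c 1 -W` flags assume the Linux iputils `ping`, and the router address in the usage note is an assumption that has to match your setup.

```python
import subprocess
import time


def ping_once(host, timeout_s=2):
    """Return True if a single ICMP echo request gets a reply."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_outages(samples):
    """Turn (timestamp, reachable) samples into (start, end) outage spans.

    An outage that is still ongoing at the last sample has end=None.
    """
    outages = []
    down_since = None
    for ts, reachable in samples:
        if not reachable and down_since is None:
            down_since = ts
        elif reachable and down_since is not None:
            outages.append((down_since, ts))
            down_since = None
    if down_since is not None:
        outages.append((down_since, None))
    return outages


def monitor(host, interval_s=60, probe=ping_once,
            clock=time.time, sleep=time.sleep, max_samples=None):
    """Probe host every interval_s seconds and return the collected samples."""
    samples = []
    while max_samples is None or len(samples) < max_samples:
        samples.append((clock(), probe(host)))
        sleep(interval_s)
    return samples
```

For example, `find_outages(monitor("192.168.1.1", max_samples=1440))` would cover roughly one day at the default one-minute interval; the address here is only a placeholder for the router's LAN IP.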

Hi, I have the same issue even with 17.01.0. I reported it here: https://bugs.lede-project.org/index.php?do=details&task_id=606

Some people suspect that WDS - relayd may be to blame for this: https://forum.openwrt.org/t/lede-v17-01-0-rc1/1285/10.

This may be fixed with current trunk. It's a kernel issue. Current trunk uses kernel 4.9 while stable runs 4.4.

I have the same issue with a ZBT-WG3526. It was happening with OpenWRT and with LEDE rc2. I upgraded to stable 17.01.0 and still have the same issue.

So I started trying all kinds of things. Right now it has not happened for over 5 days, since I disabled SQM.

OK, now after a week, here are my results:
with SQM disabled entirely, no problem;
with SQM enabled but using fq_codel, I have the same problem.

So I have disabled it entirely.
For now I don't have spare time to try trunk, maybe in the future.
Thanks to all.

Same result for me. While Cake tends to crash faster than fq_codel, both are giving me crashes. Really looking forward to a fix for this issue. Disabling offloading through ethtool reduced the number of crashes even further, but it still crashes occasionally. Leaving SQM disabled for now.
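
For anyone who wants to repeat the offloading test, here is a minimal sketch of how the `ethtool -K` call could be scripted. The post does not say which offloads were disabled, so the `tso`/`gso`/`gro` set below is an assumption, as are the helper names and the interface name; whether the mt7621 driver lets you change each feature is not confirmed here.

```python
import subprocess

# Offload features to turn off. The original post does not list them,
# so this selection is an assumption.
OFFLOADS = ("tso", "gso", "gro")


def ethtool_offload_cmd(iface, features=OFFLOADS, state="off"):
    """Build the argument list for `ethtool -K <iface> <feature> <state> ...`."""
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, state]
    return cmd


def disable_offloads(iface, dry_run=True):
    """Return the command line; run it as well unless dry_run is set."""
    cmd = ethtool_offload_cmd(iface)
    if not dry_run:
        subprocess.check_call(cmd)
    return " ".join(cmd)


print(disable_offloads("eth0"))  # dry run: only prints the command
```

The dry-run default is deliberate, so the command can be checked before anything changes on the router.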

LEDE 17.01.4 on a Newifi D1 is experiencing a similar issue.
Using tc to attach cake qdisc to any interface will instantly crash the router. fq_codel works fine.
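
A hedged sketch of the `tc` invocations this kind of test would involve, for anyone trying to reproduce it. The helper names and the interface name are assumptions, and the post does not give the exact command line that was used. Note that, per the report above, actually running the cake variant may hang the router, so the sketch only prints the commands.

```python
def qdisc_replace_cmd(iface, qdisc):
    """Argument list for attaching `qdisc` as the root qdisc of `iface`."""
    return ["tc", "qdisc", "replace", "dev", iface, "root", qdisc]


def qdisc_reset_cmd(iface):
    """Argument list for deleting the root qdisc, restoring the default."""
    return ["tc", "qdisc", "del", "dev", iface, "root"]


# Print the commands for both qdiscs discussed in the thread.
for name in ("fq_codel", "cake"):
    print(" ".join(qdisc_replace_cmd("eth0", name)))
print(" ".join(qdisc_reset_cmd("eth0")))
```

Keeping a serial console attached while trying the cake command is advisable, since a hung router will not answer over the network.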