Problem Summary
Since upgrading to 18.06.4, I have been seeing a periodic crash on my EdgeRouter X.
- The router just becomes unresponsive (to DHCP requests, SSH, etc.).
Reproducible?
- Around once or twice a week.
Description
Today, I got lucky and managed to SSH in before it hung completely.
Environment
Architecture : MediaTek MT7621 ver:1 eco:3
Firmware Version : OpenWrt 18.06.4 r7808-ef686b7292 / LuCI openwrt-18.06 branch (git-19.170.32094-4d6d8bc)
Kernel: 4.14.131
- This is the standard sysupgrade from the OpenWrt website (not my own compiled build).
Packages I have installed after upgrade:
luci-app-adblock
ddns-scripts
luci-mod-ddns
luci-mod-rpc
luci-app-openvpn openvpn-openssl
luci-proto-wireguard luci-app-wireguard wireguard kmod-wireguard wireguard-tools
banip - 0.1.4-1
(let me know if you need the full list).
Logs
Here is what I managed to get from logread before it went unresponsive:
Mon Jul 22 13:54:44 2019 daemon.notice odhcpd[930]: Got DHCPv6 request
Mon Jul 22 13:54:44 2019 daemon.warn odhcpd[930]: DHCPV6 REQUEST IA_NA from 00010001233b78ad38539cc351bc on br-lan: ok fdd0:43a3:e55c::e99/128
Mon Jul 22 13:56:10 2019 kern.err kernel: [98924.496203] INFO: rcu_sched self-detected stall on CPU
Mon Jul 22 13:56:10 2019 kern.err kernel: [98924.506572] 2-...: (1 GPs behind) idle=866/140000000000001/0 softirq=2472365/2472367 fqs=2998
Mon Jul 22 13:56:10 2019 kern.err kernel: [98924.516181] INFO: rcu_sched detected stalls on CPUs/tasks:
Mon Jul 22 13:56:10 2019 kern.err kernel: [98924.523900]
Mon Jul 22 13:56:10 2019 kern.err kernel: [98924.523925] 2-...: (1 GPs behind) idle=866/140000000000001/0 softirq=2472365/2472367 fqs=2998
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.534811] (t=6003 jiffies g=1045463 c=1045462 q=162)
Mon Jul 22 13:56:10 2019 kern.err kernel: [98924.537926]
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.555218] NMI backtrace for cpu 2
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.565597] (detected by 3, t=6006 jiffies, g=1045463, c=1045462, q=162)
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.568718] CPU: 2 PID: 1221 Comm: kworker/2:1 Not tainted 4.14.131 #0
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.601919] Workqueue: events 0x80237e28
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.609715] Stack : 00000000 00000000 804baa10 8fc0dd24 00000000 00000000 00000000 00000000
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.626357] 00000000 00000000 00000000 00000000 00000000 00000001 8fc0dce0 53261662
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.643000] 8fc0dd78 00000000 00000000 00003ae0 00000038 804835d8 00000008 00000000
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.659641] 00000000 80530000 00092f3f 00000000 8fc0dcc0 00000000 80550000 00000002
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.676283] 80534480 8052c0ac 000000e0 80530000 00000003 8029b5a8 00000008 80590008
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.692925] ...
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.697783] Call Trace:
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.697802] [<804835d8>] 0x804835d8
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.709579] [<8029b5a8>] 0x8029b5a8
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.716508] [<80010090>] 0x80010090
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.723437] [<80010098>] 0x80010098
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.730366] [<8046c57c>] 0x8046c57c
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.737296] [<80071294>] 0x80071294
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.744226] [<80473474>] 0x80473474
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.751156] [<8000cf90>] 0x8000cf90
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.758085] [<8000cf90>] 0x8000cf90
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.765013] [<80473560>] 0x80473560
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.771943] [<80084b48>] 0x80084b48
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.778875] [<80083fa0>] 0x80083fa0
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.785851] [<80087518>] 0x80087518
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.792783] [<800980fc>] 0x800980fc
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.799715] [<8031e700>] 0x8031e700
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.806645] [<8035125c>] 0x8035125c
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.813574] [<80077928>] 0x80077928
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.820506] [<80071c40>] 0x80071c40
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.827437] [<80252ed0>] 0x80252ed0
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.834367] [<80252d7c>] 0x80252d7c
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.841297] [<80252f3c>] 0x80252f3c
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.848226] [<80071c40>] 0x80071c40
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.855156] [<804899e4>] 0x804899e4
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.862085] [<80251f0c>] 0x80251f0c
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.869018] [<8000b4e8>] 0x8000b4e8
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98924.875944]
Mon Jul 22 13:56:10 2019 kern.info kernel: [98924.878913] Sending NMI from CPU 3 to CPUs 2:
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.436380] NMI backtrace for cpu 2
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.443396] CPU: 2 PID: 1221 Comm: kworker/2:1 Not tainted 4.14.131 #0
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.456444] Workqueue: events 0x80237e28
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.464347] task: 8fdc0640 task.stack: 8e7dc000
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.473426] $ 0 : 00000000 00000001 00000000 0000000a
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.484044] $ 4 : 00000004 000c0000 00966c35 00966c35
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.494705] $ 8 : 0000ffff ffff0000 d9f2f27f 00000002
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.505412] $12 : 00000000 00000000 ffffffff 00004f22
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.516008] $16 : 8a297f7c 81231340 8052c1b8 000c0000
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.526637] $20 : 8058b362 8f0d6ab0 000000bc 0000001f
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.537263] $24 : 3b9aca00 8000ce94
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.547895] $28 : 8e7dc000 8e7dddf0 8f17fc00 8006a5f0
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.558533] Hi : 0000000a
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.564328] Lo : 66666669
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.570147] epc : 8006a68c 0x8006a68c
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.577952] ra : 8006a5f0 0x8006a5f0
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.585774] Status: 11007c03 KERNEL EXL IE
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.594525] Cause : 50800400 (ExcCode 00)
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.602654] PrId : 0001992f (MIPS 1004Kc)
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.610922] CPU: 2 PID: 1221 Comm: kworker/2:1 Not tainted 4.14.131 #0
Mon Jul 22 13:56:10 2019 kern.warn kernel: [98928.624048] Workqueue: events 0x80237e28
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.632000] Stack : 00000000 00000000 804baa10 8fc0dd64 00000000 00000000 00000000 00000000
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.648828] 00000000 00000000 00000000 00000000 00000000 00000001 8fc0dd20 53261662
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.665643] 8fc0ddb8 00000000 00000000 000045c0 00000038 804835d8 00000008 00000000
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.682460] 00000000 80530000 000985b0 00000000 8fc0dd00 00000000 80550000 00000002
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.699281] 804c1fcc 8051f4e0 804bfbd8 80530000 00000003 8029b5a8 00000008 80590008
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.716104] ...
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.721026] Call Trace:
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.721097] [<804835d8>] 0x804835d8
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.732971] [<8029b5a8>] 0x8029b5a8
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.739960] [<80010090>] 0x80010090
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.746948] [<80010098>] 0x80010098
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.753937] [<8046c57c>] 0x8046c57c
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.760923] [<80010154>] 0x80010154
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.767914] [<80473454>] 0x80473454
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.774940] [<8000d0a4>] 0x8000d0a4
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.781897] [<8000d0b4>] 0x8000d0b4
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.788894] [<8009f764>] 0x8009f764
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.795883] [<80058ab8>] 0x80058ab8
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.802905] [<80015590>] 0x80015590
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.809918] [<800727e0>] 0x800727e0
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.816952] [<80072914>] 0x80072914
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.823950] [<800729b8>] 0x800729b8
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.830939] [<8031e700>] 0x8031e700
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.837923] [<8035125c>] 0x8035125c
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.844912] [<80076ca0>] 0x80076ca0
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.851898] [<80071c40>] 0x80071c40
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.858882] [<80071c40>] 0x80071c40
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.865868] [<80252ed0>] 0x80252ed0
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.872855] [<80252d7c>] 0x80252d7c
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.879849] [<80252f3c>] 0x80252f3c
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.886838] [<80071c40>] 0x80071c40
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.893829] [<804899e4>] 0x804899e4
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.900817] [<80251f0c>] 0x80251f0c
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.907821] [<8000b4e8>] 0x8000b4e8
Mon Jul 22 13:56:11 2019 kern.warn kernel: [98928.914793]
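The call traces above contain only raw addresses (for example `[<804835d8>] 0x804835d8`), presumably because the stock image does not include kernel symbol names. A minimal Python sketch for pulling the unique addresses out of a saved copy of the logread output is shown here, so they can later be looked up in a symbol map from the exact same build, if one is available. The function name `trace_addresses` is hypothetical, and the script is meant to run on a workstation, not on the router.

```python
import re

# Matches call-trace entries such as "[<804835d8>] 0x804835d8".
# Kernel timestamps like "[98924.697802]" do not match because they lack "<".
TRACE_ENTRY = re.compile(r"\[<([0-9a-fA-F]{8})>\]")


def trace_addresses(log_text):
    """Return the unique call-trace addresses in first-seen order."""
    seen = []
    for line in log_text.splitlines():
        match = TRACE_ENTRY.search(line)
        if match:
            addr = match.group(1).lower()
            if addr not in seen:
                seen.append(addr)
    return seen
```

Calling `trace_addresses(open("logread.txt").read())` on a saved log (the file name is only an example) returns a list of hex strings. Because both backtraces come from the same CPU and work item, most addresses repeat, and the deduplicated list is much shorter than the raw log.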
- When I ran htop, I could see that 2 of the 4 cores were constantly maxed out at 100%. However, no process in the list was using more than 1-3% CPU. Strange.
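One possible explanation, which is an assumption and not something the log confirms, is that the busy cores are spending their time in kernel context (interrupt or softirq handling), which is not charged to any process in the list. The per-CPU counters in `/proc/stat` can show this. The sketch below takes two snapshots of that file's text and reports, per core, the percentage of time spent in each state between them. The function names are hypothetical, and the field order follows the documented `/proc/stat` layout.

```python
# Column names for the per-CPU lines in /proc/stat (first eight fields).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text):
    """Map 'cpu0', 'cpu1', ... to a dict of tick counters."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep per-core lines only; skip the aggregate "cpu" line and others.
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            cpus[parts[0]] = dict(zip(FIELDS, values))
    return cpus


def busy_breakdown(before, after):
    """Return, per CPU, the percentage of elapsed ticks spent in each state."""
    result = {}
    for cpu, end in after.items():
        start = before.get(cpu)
        if start is None:
            continue
        delta = {k: end[k] - start.get(k, 0) for k in end}
        total = sum(delta.values())
        if total > 0:
            result[cpu] = {k: round(100.0 * v / total, 1) for k, v in delta.items()}
    return result
```

To use it, save the output of `cat /proc/stat` on the router twice, a few seconds apart, and pass both texts through `parse_proc_stat` before calling `busy_breakdown`. A core showing a high `softirq`, `irq`, or `system` share with little `user` time would be consistent with the stall originating in the kernel and not in a userspace process.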
Any ideas about what might be causing this? Thanks for any help!