Official v19.07.5 build
Perhaps my lock up is something different, these are first 4 traces that happened in the same second.
A lot more traces in logs until i power cycled the router. I was receiving the traces via UDP log, so router was not completely dead from network prospective.
1:
INFO: rcu_sched self-detected stall on CPU
0-...: (1 GPs behind) idle=0b2/140000000000001/0 softirq=2840147/2840148 fqs=3000
(t=6000 jiffies g=600482 c=600481 q=97)
NMI backtrace for cpu 0
CPU: 0 PID: 9933 Comm: kworker/0:2 Not tainted 4.14.209 #0
Workqueue: events 0x802409f4
Stack : 00000000 00000000 804d7568 8fc09d24 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000001 8fc09ce0 ac07f5c8
8fc09d78 00000000 00000000 00005af8 00000038 8049da98 00000008 00000000
00000000 80550000 000732bc 00000000 8fc09cc0 00000000 00000000 00000000
80554500 8054c0cc 000000e0 80550000 00000000 802ae190 00000000 806b0000
...
Call Trace:
[<8049da98>] 0x8049da98
[<802ae190>] 0x802ae190
[<8000c1a0>] 0x8000c1a0
[<8000c1a8>] 0x8000c1a8
[<804868d4>] 0x804868d4
[<80072c54>] 0x80072c54
[<8048d8f4>] 0x8048d8f4
[<80008f30>] 0x80008f30
[<80008f30>] 0x80008f30
[<8048d9e0>] 0x8048d9e0
[<80086558>] 0x80086558
[<800859a8>] 0x800859a8
[<80089118>] 0x80089118
[<8009a110>] 0x8009a110
[<8009a168>] 0x8009a168
[<8005f3e4>] 0x8005f3e4
[<80332644>] 0x80332644
[<80079328>] 0x80079328
[<800735e0>] 0x800735e0
[<8025bc44>] 0x8025bc44
[<8025be04>] 0x8025be04
[<800735e0>] 0x800735e0
[<804a475c>] 0x804a475c
[<8025b4c0>] 0x8025b4c0
[<80007488>] 0x80007488
2:
INFO: rcu_sched self-detected stall on CPU
0-...: (1 GPs behind) idle=0b2/140000000000001/0 softirq=2840147/2840148 fqs=11993
(t=24003 jiffies g=600482 c=600481 q=473)
NMI backtrace for cpu 0
CPU: 0 PID: 9933 Comm: kworker/0:2 Not tainted 4.14.209 #0
Workqueue: events 0x802409f4
Stack : 00000000 00000000 804d7568 8fc09d24 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000001 8fc09ce0 ac07f5c8
8fc09d78 00000000 00000000 000062a0 00000038 8049da98 00000008 00000000
00000000 80550000 00078fca 00000000 8fc09cc0 00000000 00000000 00000000
80554500 8054c0cc 000000e0 80550000 00000000 802ae190 00000000 806b0000
...
Call Trace:
[<8049da98>] 0x8049da98
[<802ae190>] 0x802ae190
[<8000c1a0>] 0x8000c1a0
[<8000c1a8>] 0x8000c1a8
[<804868d4>] 0x804868d4
[<80072c54>] 0x80072c54
[<8048d8f4>] 0x8048d8f4
[<80008f30>] 0x80008f30
[<80008f30>] 0x80008f30
[<8048d9e0>] 0x8048d9e0
[<80086558>] 0x80086558
[<800859a8>] 0x800859a8
[<80089118>] 0x80089118
[<8009a110>] 0x8009a110
[<8009a168>] 0x8009a168
[<80332644>] 0x80332644
[<80079328>] 0x80079328
[<800735e0>] 0x800735e0
[<8025bd98>] 0x8025bd98
[<8025bc44>] 0x8025bc44
[<8025be04>] 0x8025be04
[<800735e0>] 0x800735e0
[<804a475c>] 0x804a475c
[<8025b4c0>] 0x8025b4c0
[<80007488>] 0x80007488
Diff 1 and 2
@@ -1,13 +1,13 @@
INFO: rcu_sched self-detected stall on CPU
- 0-...: (1 GPs behind) idle=0b2/140000000000001/0 softirq=2840147/2840148 fqs=3000
- (t=6000 jiffies g=600482 c=600481 q=97)
+ 0-...: (1 GPs behind) idle=0b2/140000000000001/0 softirq=2840147/2840148 fqs=11993
+ (t=24003 jiffies g=600482 c=600481 q=473)
NMI backtrace for cpu 0
CPU: 0 PID: 9933 Comm: kworker/0:2 Not tainted 4.14.209 #0
Workqueue: events 0x802409f4
Stack : 00000000 00000000 804d7568 8fc09d24 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000001 8fc09ce0 ac07f5c8
- 8fc09d78 00000000 00000000 00005af8 00000038 8049da98 00000008 00000000
- 00000000 80550000 000732bc 00000000 8fc09cc0 00000000 00000000 00000000
+ 8fc09d78 00000000 00000000 000062a0 00000038 8049da98 00000008 00000000
+ 00000000 80550000 00078fca 00000000 8fc09cc0 00000000 00000000 00000000
80554500 8054c0cc 000000e0 80550000 00000000 802ae190 00000000 806b0000
...
Call Trace:
@@ -26,10 +26,10 @@
[<80089118>] 0x80089118
[<8009a110>] 0x8009a110
[<8009a168>] 0x8009a168
-[<8005f3e4>] 0x8005f3e4
[<80332644>] 0x80332644
[<80079328>] 0x80079328
[<800735e0>] 0x800735e0
+[<8025bd98>] 0x8025bd98
[<8025bc44>] 0x8025bc44
[<8025be04>] 0x8025be04
[<800735e0>] 0x800735e0
3:
INFO: rcu_sched self-detected stall on CPU
0-...: (1 GPs behind) idle=0b2/140000000000001/0 softirq=2840147/2840148 fqs=17290
(t=42006 jiffies g=600482 c=600481 q=872)
rcu_sched kthread starved for 7393 jiffies! g600482 c600481 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=2
rcu_sched R running task 0 8 2 0x00100000
Stack : 00000001 8fc3aca0 8fc57e38 8fc57e38 8fc57e38 80550000 814be380 00000000
81495da0 80052c18 01d9e8ef 814be380 8fc57e38 80550000 814be380 00000000
00000003 8054c1e8 80554610 8049fe18 8fc57e38 80550000 80550000 00000000
00000003 804a3630 80550000 00000000 8fc57ebc 80068228 00000000 00000001
00000200 00000000 01d9e8ef 800886a0 8fc3aca0 0c000003 80554500 00000000
...
Call Trace:
[<80052c18>] 0x80052c18
[<8049fe18>] 0x8049fe18
[<804a3630>] 0x804a3630
[<80068228>] 0x80068228
[<800886a0>] 0x800886a0
[<80084bc4>] 0x80084bc4
[<80080000>] 0x80080000
[<80081c0c>] 0x80081c0c
[<80084664>] 0x80084664
[<8004c0f8>] 0x8004c0f8
[<8004bfc8>] 0x8004bfc8
[<8004bfc8>] 0x8004bfc8
[<8004bfc8>] 0x8004bfc8
[<80006f78>] 0x80006f78
4:
NMI backtrace for cpu 0
CPU: 0 PID: 9933 Comm: kworker/0:2 Not tainted 4.14.209 #0
Workqueue: events 0x802409f4
Stack : 00000000 00000000 804d7568 8fc09d24 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000001 8fc09ce0 ac07f5c8
8fc09d78 00000000 00000000 00006f70 00000038 8049da98 00000008 00000000
00000000 80550000 0009b1b2 00000000 8fc09cc0 00000000 00000000 00000000
80554500 8054c0cc 000000e0 80550000 00000000 802ae190 00000000 806b0000
...
Call Trace:
[<8049da98>] 0x8049da98
[<802ae190>] 0x802ae190
[<8000c1a0>] 0x8000c1a0
[<8000c1a8>] 0x8000c1a8
[<804868d4>] 0x804868d4
[<80072c54>] 0x80072c54
[<8048d8f4>] 0x8048d8f4
[<80008f30>] 0x80008f30
[<80008f30>] 0x80008f30
[<8048d9e0>] 0x8048d9e0
[<8005384c>] 0x8005384c
[<80051840>] 0x80051840
[<80086558>] 0x80086558
[<8008646c>] 0x8008646c
[<800859a8>] 0x800859a8
[<80089118>] 0x80089118
[<8009a110>] 0x8009a110
[<8009a168>] 0x8009a168
[<80332644>] 0x80332644
[<80079328>] 0x80079328
[<800735e0>] 0x800735e0
[<8025bd98>] 0x8025bd98
[<8025bc44>] 0x8025bc44
[<8025be04>] 0x8025be04
[<800735e0>] 0x800735e0
[<804a475c>] 0x804a475c
[<8025b4c0>] 0x8025b4c0
[<80007488>] 0x80007488
PS: not familiar how to decode these for openwrt, this is:
# cat /etc/openwrt_release
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='19.07.5'
DISTRIB_REVISION='r11257-5090152ae3'
DISTRIB_TARGET='ramips/mt7621'
DISTRIB_ARCH='mipsel_24kc'
DISTRIB_DESCRIPTION='OpenWrt 19.07.5 r11257-5090152ae3'
DISTRIB_TAINTS=''
# uname -a
Linux gw 4.14.209 #0 SMP Sun Dec 6 07:31:03 2020 mips GNU/Linux
For now 7 days and 5 hours with master branch r15223 (Kernel 5.4.82) without any problem.
Spent some time searching and at a loss - what's the way to decode the code traces in openwrt ?
Someone is reporting that disabling TSO fixed the issue for them. They used a patch that can be found here: https://bugs.openwrt.org/index.php?do=details&task_id=2628
It might also be possible to disable TSO via ethtool, but I am not sure if that would also work for fixing it. Has anyone tried this potential solution?
Because of SQM I always disable as much offloads as possible with TSO among them
And do you ever experience the transmit queue has timed out error?
Yes, I did but after master switched over to DSA, I have not experienced them anymore
HWNAT is working now under DSA
Thatâs news to me.
You mean on master? Since when? I have seen no commit that would account for such a change. Running a build from a week ago, yet HW NAT is not working.
I tried master yesterdayďźhwnat is working
I am compiling a fresh build now. How did you tested it?
xiaomir3g working as iperf3 server(192.168.5.1) with itâs LAN connecting to phicomm K2P(192.168.1.1) 'wan port.
A compuer servring as iperf3 client and was connected to phicomm k2p's lan port.During the iperf3 test,no cpu loads in phicomm k2p.
Did a fresh compile, and of course HW offload does not work:
This is a DL speedtest of about 600megabits via PPPoE:
When HW offload worked, all cores were about 2-5% loaded during the same test, and I was able to reach 900+Mbits in DL.
I tested it again.
Yes,under pppoe it wont't work.But with dhcp environment it works.
Hm... Maybe @nbd can comment on this, as previously (on kernel 4.14) it did worked with both pure IP and PPPoE as well, at least with V4 addresses.
Hello. If any of you don't mind answering, what's the current status on this situation?
From my understanding in reading this, hwnat works (for some?) right now on OpenWrt built from master but not on PPPoE (server or client?) and some problem with VLANs but I might be wrong.
I would like to know if the following use case will not have any bugs or problems: Edgerouter X with WAN connection using PPPoE (The ER-X will be a client) and multiple VLANs setup on the switch0 interface (some ports are on one VLAN, some others on another VLAN etc.) while using hwnat. Also does it work on the latest stable build or only on master?
As much as I know, VLANs do work on master as well, but you need to configure it via CLI. PPPoE HW NAT for sure does not work (you can get about 600Mbits of speed via PPPoE in its current form with SW offload enabled), and IPv4 HW NAT is confirmed by a single user.
VLAN does work on master branch. And you don't need to use CLI to configure them, at least by ports (not switch0). Just configure the LAN port with the tagged VLAN you want to use using dot connotation, in LUCI, uci or network file. For example "eth0.6" to tag VLAN6 on port eth0 for ER-X.