Mtk_soc_eth watchdog timeout after r11573

A bunch of commits just landed on master that I believe are hw nat related. So it looks like hw nat is back in the saddle again. :smiley:

I did a quick test with the Ookla speed test: with no offloading the 860L can only reach a little over 200 Mbps, with soft offloading it reaches around 370 Mbps, and with hw offload a little over 400 Mbps.

It seems hw offloading works, in some way. But the test still consumes one core, which should not happen once real hw offload is in place.

Hopefully there will be a working hw offload for 5.4 in the next few weeks?

No, that is for the mediatek target; ours is the ramips target. There is no HW offload in sight so far.

On the 5.4 kernel you can use SW offload, and you need to set the RPS mask to "e":

echo e > /sys/class/net/eth0/queues/rx-0/rps_cpus

to get proper WAN speed.
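
The value written to rps_cpus is a hexadecimal CPU bitmask, so "e" (binary 1110) selects CPUs 1, 2 and 3 and leaves CPU0 out. My assumption is that this keeps packet steering off the core that usually services the Ethernet interrupt, but that reasoning is not stated in the post above. A minimal sketch for decoding such a mask; the helper name rps_mask_to_cpus is made up for illustration:

```shell
#!/bin/sh
# Decode an RPS CPU mask (hex, as written to rps_cpus) into a list of CPU numbers.
# rps_mask_to_cpus is a hypothetical helper, not an OpenWrt tool.
rps_mask_to_cpus() {
    mask=$((0x$1))          # hex string -> integer, e.g. "e" -> 14
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        # If the lowest bit is set, this CPU is part of the mask.
        if [ $((mask & 1)) -eq 1 ]; then
            out="${out:+$out }$cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$out"
}

rps_mask_to_cpus e    # prints: 1 2 3
```

To read back what is currently set, `cat /sys/class/net/eth0/queues/rx-0/rps_cpus` shows the active mask.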

The latest build has succeeded and is now on the snapshots page.

With the new 5.4 kernel, the ed2k server connection can no longer be kept alive, whether soft offloading is on or off.

It didn't happen with former 4.14.x kernels.

Edit: Sorry for the rush; the issue only occurs when soft offloading is on. After switching it back off I needed to restart the client, and it appeared okay after that.

Some good news from IRC:

00:26 < Rene__> blogic: Thanks for the hwnat! Can it also be used on mt7621?
00:28 < blogic> rmilecki: and on its way upstream
00:28 < blogic> Rene__: i will rebase it on ramips when its bumped to v5.4
00:28 < rmilecki> blogic: like patches sent?
00:29 < blogic> no but grabbed a job to send it upstream
00:29 < blogic> rmilecki: problem is that flow offload is part of the ethernet driver
00:29 < blogic> but with mt7622/9 we can also do wifi offload
00:29 < blogic> so we need an extra pdev to describe the offload engine
00:29 < blogic> and then the drivers can check if they share this
00:29 < rmilecki> oh, fun
00:30 < blogic> right now the hw need to check if the in/egress device both are on the same hwnat
00:30 < blogic> gets more complicated when we do qca80 offload where it is only offloading inside DSA
00:30 < blogic> but I am on it ;)
00:30 < blogic> i am able to shift wirespeed with 256byte frames
00:31 < blogic> at 0% cpu load
00:31 < blogic> at 128byte frames it drops to ~780mbit
00:32 < Rene__> blogic: openwrt/master is bumped to v5.4 or are you waiting that it is in v19?
00:32 < blogic> Rene__: ramips is on v5.4 ?
00:32 < blogic> what is v19 ?
00:33 < Rene__> v19 = v19.0xx openwrt branch
00:33 < gch981213> blogic: I've pushed the patches for mt7621 last Saturday.
00:33 < blogic> gch981213: ah ok, let me bump the driver in that case
00:33 < blogic> is it on DSA ?
00:33 < gch981213> blogic: Yes.
00:34 < blogic> ok, let me go looking for a mt7621 unit
00:35 < gch981213> image for devices with w25q256 flash can't reboot atm. I'm working on it.
00:35 < Rene__> mt7621 has also a trick to wifi offload. What I understand is that they just push the wifi packet to packet engine.

Is there anywhere I can find this conversation in full? :slight_smile:

Idle on IRC like me :slight_smile:

Or just go here: http://logs.nslu2-linux.org/livelogs/openwrt-devel/openwrt-devel.20200407.txt

Thank you very much! It's a very useful link!

Netgear R6220, tested on OpenWrt 19.07.1 and OpenWrt 19.07.2, has problems with:

  1. Memory leak in kernel
  2. mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
  3. interrupts errors
  4. Occasionally restarts

All problems are gone (so far) after switching to the original firmware:

# cat /proc/interrupts
           CPU0       CPU1
  3:    4761112          0        MIPS GIC  eth2
  4:        188    7484398        MIPS GIC  rai0
  7:   15364666   15363980        MIPS GIC  timer
 12:          0          0        MIPS GIC  ralink_gpio
 18:          0          0        MIPS GIC  Ralink_SPDIF
 22:   61482632          0        MIPS GIC  xhci-hcd:usb1
 23:         12          0        MIPS GIC  Ralink_ESW
 25:        798    5997703        MIPS GIC  ra0
 26:     422129          0        MIPS GIC  serial
 56:     262949          0        MIPS GIC  IPI_resched
 57:          0     308253        MIPS GIC  IPI_resched
 58:          0          0        MIPS GIC  IPI_resched
 59:          0          0        MIPS GIC  IPI_resched
 60:    2333758          0        MIPS GIC  IPI_call
 61:          0     966205        MIPS GIC  IPI_call
 62:          0          0        MIPS GIC  IPI_call
 63:          0          0        MIPS GIC  IPI_call

ERR:          0
# free
              total         used         free       shared      buffers
  Mem:       122308        50656        71652            0         2764
 Swap:            0            0            0
Total:       122308        50656        71652
#

The original firmware is partially open; I added Dropbear and a few other things to it.

So now I am sure this is 100% an OpenWrt software problem. It's a sad conclusion.

Regards,
Samuel

In my case there is no memory leak, no transmit timeouts and no interrupt errors (with the fc off, interrupt handling and mt7530_fix patches). But yes, I do get random restarts, which can occur anywhere from within 24 hours to 10 days. I am now testing a build without the interrupt handling patch and with fc off on all ports.

https://github.com/openwrt/openwrt/pull/2815#issuecomment-602978547
https://github.com/openwrt/openwrt/pull/2847 (also with EEE disabled).

Yep, but it crashes during boot on the R6220.

@neheb

Out of curiosity, does Blogic have a public repo other than the one at git.openwrt.org where his HW offload work for the 7621 can be seen? If there is one, it would be nice to take a look. :slight_smile:

He has a GitHub account which is fairly inactive. Nothing of the sort available anywhere though.

Well, with fc off on all ports + mt7530_fix it has been running for 11d 12h 51m 37s so far, but with some errors.

The first occurred at about 4 hours of uptime. I've never seen it before:

Tue Apr  7 12:54:50 2020 kern.alert kernel: [14076.077097] BUG: Bad page state in process swapper/0  pfn:0f22f
Tue Apr  7 12:54:50 2020 kern.emerg kernel: [14076.088918] page:811e75e0 count:0 mapcount:0 mapping:  (null) index:0x0
Tue Apr  7 12:54:50 2020 kern.emerg kernel: [14076.102114] flags: 0x0()
Tue Apr  7 12:54:50 2020 kern.alert kernel: [14076.107169] raw: 00000000 00000000 00000000 ffffffff 00000000 00000000 811e75f4 00000000
Tue Apr  7 12:54:50 2020 kern.alert kernel: [14076.123258] page dumped because: non-NULL mapping
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.132605] Modules linked in: pppoe ppp_async pppox ppp_generic nf_nat_pptp nf_conntrack_pptp nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY ts_fsm ts_bm slhc nf_reject_ipv4 nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_rtsp nf_nat_redirect nf_nat_proto_gre nf_nat_masquerade_ipv4 nf_nat_irc nf_conntrack_ipv4 nf_nat_ipv4 nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_tftp nf_conntrack_snmp nf_conntrack_sip nf_conntrack_rtsp nf_conntrack_rtcache
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.276369]  nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack_broadcast ts_kmp nf_conntrack_amanda nf_conntrack iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 tun nls_utf8 nls_iso8859_15 nls_cp852 nls_cp850 nls_cp437 nls_base leds_gpio gpio_button_hotplug
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.350330] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.167 #0
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.362445] Stack : 00000000 00000008 805a4380 8007265c 805a0000 80546510 00000000 00000000
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.379091]         80512100 8fc09c5c 80580d5c 805808e7 8050cef0 00000001 8fc09c00 53261646
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.395735]         00000000 00000000 806e0000 00004598 00000000 000000ee 00000008 00000000
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.412379]         00000000 80580000 0005587a 00000000 00000000 805a0000 00000000 80710000
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.429023]         805173f4 80580000 00000003 00000008 00000000 80299210 00000000 806e0000
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.445668]         ...
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.450530] Call Trace:
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.455416] [<8000c7b0>] show_stack+0x58/0x100
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.464279] [<8044f9d4>] dump_stack+0xa4/0xe0
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.472961] [<800cb6d4>] bad_page+0x110/0x148
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.481634] [<800ce464>] get_page_from_freelist+0x534/0x8e4
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.492722] [<800ceeb0>] __alloc_pages_nodemask+0x120/0xd0c
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.503810] [<800cfbe8>] page_frag_alloc+0x54/0x170
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.513535] [<803062d0>] fe_poll+0x340/0x800
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.522057] [<80349b2c>] net_rx_action+0x150/0x30c
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.531613] [<8046d870>] __do_softirq+0x128/0x2ec
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.540987] [<80033f84>] irq_exit+0xac/0xc8
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.549327] [<8024c1c0>] plat_irq_dispatch+0xfc/0x138
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.559377] [<80007588>] except_vec_vi_end+0xb8/0xc4
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.569253] [<80008f50>] r4k_wait_irqoff+0x1c/0x24
Tue Apr  7 12:54:50 2020 kern.warn kernel: [14076.578787] Disabling lock debugging due to kernel taint

The second at 4 days 12 hours:

Sat Apr 11 21:01:56 2020 kern.warn kernel: [388902.689722] ------------[ cut here ]------------
Sat Apr 11 21:01:56 2020 kern.warn kernel: [388902.699139] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x1ac/0x324
Sat Apr 11 21:01:56 2020 kern.info kernel: [388902.715789] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
Sat Apr 11 21:01:56 2020 kern.warn kernel: [388902.729875] Modules linked in: pppoe ppp_async pppox ppp_generic nf_nat_pptp nf_conntrack_pptp nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY ts_fsm ts_bm slhc nf_reject_ipv4 nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_rtsp nf_nat_redirect nf_nat_proto_gre nf_nat_masquerade_ipv4 nf_nat_irc nf_conntrack_ipv4 nf_nat_ipv4 nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_tftp nf_conntrack_snmp nf_conntrack_sip nf_conntrack_rtsp nf_conntrack_rtcache
Sat Apr 11 21:01:56 2020 kern.warn kernel: [388902.873962]  nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack_broadcast ts_kmp nf_conntrack_amanda nf_conntrack iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 tun nls_utf8 nls_iso8859_15 nls_cp852 nls_cp850 nls_cp437 nls_base leds_gpio gpio_button_hotplug
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388902.948252] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G    B           4.14.167 #0
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388902.962982] Stack : 00000000 8fea1740 80580000 8007265c 805a0000 80546510 00000000 00000000
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388902.979811]         80512100 8fc0ddc4 8fc3c99c 805808e7 8050cef0 00000001 8fc0dd68 53261646
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388902.996617]         00000000 00000000 806e0000 00005ca8 00000000 00000121 00000008 00000000
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388903.013418]         00000000 80580000 000e781c 20202020 00000000 805a0000 00000000 80540718
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388903.030221]         80370160 00000140 00000002 8fea1740 00000003 80299210 00000008 806e0008
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388903.047026]         ...
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388903.052058] Call Trace:
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388903.057113] [<8000c7b0>] show_stack+0x58/0x100
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388903.066145] [<8044f9d4>] dump_stack+0xa4/0xe0
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388903.074988] [<8002f5f8>] __warn+0xe0/0x138
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388903.083306] [<8002f680>] warn_slowpath_fmt+0x30/0x3c
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388903.093361] [<80370160>] dev_watchdog+0x1ac/0x324
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388903.102910] [<8008932c>] call_timer_fn.isra.25+0x24/0x84
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388903.113646] [<800895e8>] run_timer_softirq+0x1bc/0x248
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388903.124060] [<8046d870>] __do_softirq+0x128/0x2ec
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388903.133592] [<80033f84>] irq_exit+0xac/0xc8
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388903.142102] [<8024c1c0>] plat_irq_dispatch+0xfc/0x138
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388903.152319] [<80007588>] except_vec_vi_end+0xb8/0xc4
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388903.162365] [<80008f50>] r4k_wait_irqoff+0x1c/0x24
Sat Apr 11 21:01:57 2020 kern.warn kernel: [388903.172213] ---[ end trace 3f1f5eb5e3775b79 ]---
Sat Apr 11 21:01:57 2020 kern.err kernel: [388903.181624] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Sat Apr 11 21:01:57 2020 kern.info kernel: [388903.194180] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
Sat Apr 11 21:01:57 2020 kern.info kernel: [388903.206397] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0ec80000, max=0, ctx=2588, dtx=2588, fdx=2587, next=2588
Sat Apr 11 21:01:57 2020 kern.info kernel: [388903.228287] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0e010000, max=0, calc=3057, drx=3075
Sat Apr 11 21:01:57 2020 kern.info kernel: [388903.272721] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5a60000c, 0x10c = 0x80818
Sat Apr 11 21:01:57 2020 kern.info kernel: [388903.293259] mtk_soc_eth 1e100000.ethernet: PPE started

The third at 7 days 20 hours:

Wed Apr 15 04:28:13 2020 kern.err kernel: [674879.413092] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Wed Apr 15 04:28:13 2020 kern.info kernel: [674879.425618] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
Wed Apr 15 04:28:13 2020 kern.info kernel: [674879.437794] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0d2f0000, max=0, ctx=2632, dtx=2632, fdx=2631, next=2632
Wed Apr 15 04:28:13 2020 kern.info kernel: [674879.459710] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0c0f0000, max=0, calc=1171, drx=1172
Wed Apr 15 04:28:13 2020 kern.info kernel: [674879.481384] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5960000c, 0x10c = 0x80818
Wed Apr 15 04:28:13 2020 kern.info kernel: [674879.504536] mtk_soc_eth 1e100000.ethernet: PPE started

And the last was today:

Sat Apr 18 07:23:23 2020 kern.err kernel: [944589.146024] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Sat Apr 18 07:23:23 2020 kern.info kernel: [944589.158525] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
Sat Apr 18 07:23:23 2020 kern.info kernel: [944589.170688] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0e3b0000, max=0, ctx=2598, dtx=2598, fdx=2597, next=2598
Sat Apr 18 07:23:23 2020 kern.info kernel: [944589.192515] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0c6e0000, max=0, calc=1132, drx=1133
Sat Apr 18 07:23:23 2020 kern.info kernel: [944589.214739] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
Sat Apr 18 07:23:23 2020 kern.info kernel: [944589.234975] mtk_soc_eth 1e100000.ethernet: PPE started
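
The bracketed number in each kernel log line is seconds since boot, so the uptimes quoted for these errors can be checked with a little arithmetic. A small sketch (the function name uptime_from_ts is just for illustration):

```shell
#!/bin/sh
# Convert a kernel log timestamp (seconds since boot) into days/hours/minutes.
# uptime_from_ts is a hypothetical helper name.
uptime_from_ts() {
    secs=${1%%.*}                   # drop the fractional part
    d=$((secs / 86400))
    h=$((secs % 86400 / 3600))
    m=$((secs % 3600 / 60))
    echo "${d}d ${h}h ${m}m"
}

uptime_from_ts 14076.077097     # prints: 0d 3h 54m   (Bad page state)
uptime_from_ts 388902.689722    # prints: 4d 12h 1m   (first watchdog timeout)
uptime_from_ts 674879.413092    # prints: 7d 19h 27m
uptime_from_ts 944589.146024    # prints: 10d 22h 23m
```

By that count the three transmit timeouts are roughly three days apart, which is only an observation from these four log excerpts and not a claim about the cause.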

Has this issue been fixed completely in the 5.4 kernel?

I am now running yesterday's master branch on a DIR-860L B1 without a hitch, as long as offloading is not used. Even the occasional 5 GHz WiFi dropout is gone.

So far it's quite stable, I'd say. Now I'm holding my breath for the hw offloading part.

Edit: The other thing I spotted is that most of the interrupt ERRs come from 5 GHz WiFi, since the counter barely increases when only 2.4 GHz is in use.
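
One way to make that comparison less of a gut feeling is to sample the ERR line of /proc/interrupts before and after a fixed period of 2.4 GHz-only or 5 GHz-only use. A rough sketch, assuming the table has an "ERR:" row like the one in the stock firmware output earlier in the thread; the function names are made up:

```shell
#!/bin/sh
# Print the ERR counter from an interrupts table (defaults to /proc/interrupts).
# irq_err_count and irq_err_delta are hypothetical helper names.
irq_err_count() {
    awk '$1 == "ERR:" { print $2 }' "${1:-/proc/interrupts}"
}

# Print how much the ERR counter grew over N seconds (default 60).
irq_err_delta() {
    before=$(irq_err_count)
    sleep "${1:-60}"
    after=$(irq_err_count)
    echo $((after - before))
}

# Example: run once while only 2.4 GHz clients are active, once with 5 GHz.
# irq_err_delta 300
```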

I have been running mine on kernel 5.4.31 for 8 days without a single error, drop or reboot. Some more patches will likely arrive, but so far it is much better than the 4.14 branch.

@apocalypse

Why are you not trying out the 5.4 branch? Most of us have reasonably good results with it.

So master builds work again for the DIR-860L? If so, it's time to fire up my build machine again.

Yes, it does work. :slight_smile:
