Ethernet cuts out on Linksys EA6350v3

trr · January 15, 2020, 2:11am

I reset the router to defaults and changed only the WAN IPv6 prefix length. During the failures, the logs fill up with error lines like those in the posts above.

Enabling WiFi seems to generate fewer errors.

2.4GHz only:

Wed Jan 15 01:40:36 2020 kern.alert kernel: [ 103.230457] Unable to handle kernel NULL pointer dereference at virtual address 00000004
Wed Jan 15 01:40:36 2020 kern.alert kernel: [ 103.230497] pgd = c0204000
Wed Jan 15 01:40:36 2020 kern.alert kernel: [ 103.237612] [00000004] *pgd=00000000
Wed Jan 15 01:40:36 2020 kern.emerg kernel: [ 103.240191] Internal error: Oops: 817 [#1] SMP ARM
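A fault at virtual address 00000004 is commonly read as code dereferencing a NULL struct pointer and touching a member at byte offset 4, because the CPU adds the member's offset to a base address of zero. The sketch below only illustrates that arithmetic using a hypothetical two-field struct laid out with ctypes; the log does not say which struct or driver field was actually involved.

```python
import ctypes

# Hypothetical struct, for illustration only (not taken from any real driver):
# two 32-bit fields, the second one at byte offset 4.
class HypotheticalDesc(ctypes.Structure):
    _fields_ = [
        ("flags", ctypes.c_uint32),  # offset 0
        ("next", ctypes.c_uint32),   # offset 4
    ]

def fault_address(base: int, field: str) -> int:
    """Address the CPU would access for base->field."""
    return base + getattr(HypotheticalDesc, field).offset

NULL = 0
print(f"{fault_address(NULL, 'next'):08x}")  # 00000004, same as the Oops line
```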

5GHz only:

Wed Jan 15 01:43:58 2020 kern.warn kernel: [ 184.317800] backlog: Budget exhausted after napi rescheduled

With both 2.4GHz and 5GHz enabled, it seems to fail in a way where you can still connect to LuCI over WiFi, but with no errors in the logs (as I experienced at my parents' place).

The WiFi state could be coincidental and mean nothing, but I'm hoping the logs help.

bill888 · January 15, 2020, 3:54am

So would deleting the WAN6 interface be a temporary fix for this issue?

You may wish to raise a bug report with your findings.
https://bugs.openwrt.org/

Update: trr's bug report: [FS#2741]
https://bugs.openwrt.org/index.php?do=details&task_id=2741

trr · January 16, 2020, 12:00am

Should be OK, I'll bring the router back and try it out.

Reported the issue. Thanks!

pgwipeout · January 22, 2020, 2:45pm

Good Morning,

I've lost access to the EA6350v3 I was working with (though I am tempted to buy one).
Would you be able to flash the debug kernel and re-run the test?

Thanks!
Peter

lleachii · January 22, 2020, 4:51pm

Is this related to the OP's problem?

If this is a general issue with flashing the EA6350v3, I'd suggest starting a new thread for wider visibility.

rainer · January 22, 2020, 5:19pm

I'd like to add that I have the same issue on two Fritz!Box 4040s and a Linksys EA8300. In my case, it reproducibly takes down my whole network when IPv6 is enabled, my desktop is on the LAN interface, and I generate some traffic with YouTube or wget. Given the models affected, it might be related to the Qualcomm IPQ40xx driver.

Here is my bug report: https://bugs.openwrt.org/index.php?do=details&task_id=2591

pgwipeout · January 22, 2020, 6:58pm

Completely. If you look at the kernel panic, it is missing all of the debugging symbols needed to trace what actually happened.
Unless you know where the debugging symbols can be located for that build, the only other option is to flash a debug kernel so the stack trace has access to the symbols.

Once we can trace the functions, we can find where things are breaking.
IPv6-only issues tend to be variable-size problems.

trr · January 22, 2020, 10:48pm

I can certainly try flashing a debug kernel. I assume I'd have to build my own firmware? I've gotten as far as "make menuconfig"; which specific options should I select? Under "Global build settings" there is "Collect kernel debug information". Is that it? Thanks!

Update: I've flashed a release build of 19.07, adding only LuCI and enabling "Collect kernel debug information" in menuconfig:

root@OpenWrt:~# logread -f
Wed Jan 22 20:26:55 2020 kern.alert kernel: [  118.339257] BUG: Bad page state in process swapper/0  pfn:8d44a
Wed Jan 22 20:26:55 2020 kern.emerg kernel: [  118.339305] page:cffa1940 count:-1 mapcount:0 mapping:  (null) index:0x0
Wed Jan 22 20:26:55 2020 kern.emerg kernel: [  118.344006] flags: 0x0()
Wed Jan 22 20:26:55 2020 kern.alert kernel: [  118.350960] raw: 00000000 00000000 00000000 ffffffff ffffffff 00000000 cffa1954 00000000
Wed Jan 22 20:26:55 2020 kern.alert kernel: [  118.353453] page dumped because: nonzero _count
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.361521] Modules linked in: pppoe ppp_async ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables hwmon crc_ccitt compat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_of_simple gpio_button_hotplug
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.409867] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.162 #0
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.432094] Hardware name: Generic DT based system
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.438200] [<c030e3a8>] (unwind_backtrace) from [<c030a8a0>] (show_stack+0x10/0x14)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.442876] [<c030a8a0>] (show_stack) from [<c071a454>] (dump_stack+0x94/0xa8)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.450773] [<c071a454>] (dump_stack) from [<c03a4988>] (bad_page+0x11c/0x138)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.457802] [<c03a4988>] (bad_page) from [<c03a6e04>] (get_page_from_freelist+0x4d4/0x800)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.465008] [<c03a6e04>] (get_page_from_freelist) from [<c03a76dc>] (__alloc_pages_nodemask+0x10c/0xd0c)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.473255] [<c03a76dc>] (__alloc_pages_nodemask) from [<c03a8378>] (page_frag_alloc+0x3c/0x140)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.482893] [<c03a8378>] (page_frag_alloc) from [<c0619e64>] (__netdev_alloc_skb+0x8c/0xfc)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.491658] [<c0619e64>] (__netdev_alloc_skb) from [<c0619ee0>] (__netdev_alloc_skb_ip_align+0xc/0x30)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.499732] [<c0619ee0>] (__netdev_alloc_skb_ip_align) from [<c05c0b64>] (edma_alloc_rx_buf+0xa8/0x414)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.509109] [<c05c0b64>] (edma_alloc_rx_buf) from [<c05c3d0c>] (edma_poll+0xcbc/0xe4c)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.518397] [<c05c3d0c>] (edma_poll) from [<c062dc60>] (net_rx_action+0x138/0x2fc)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.526378] [<c062dc60>] (net_rx_action) from [<c0301520>] (__do_softirq+0xe0/0x240)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.533930] [<c0301520>] (__do_softirq) from [<c0321f3c>] (irq_exit+0xd4/0x138)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.541830] [<c0321f3c>] (irq_exit) from [<c035b278>] (__handle_domain_irq+0x9c/0xac)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.548860] [<c035b278>] (__handle_domain_irq) from [<c030140c>] (gic_handle_irq+0x5c/0x90)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.556847] [<c030140c>] (gic_handle_irq) from [<c030b40c>] (__irq_svc+0x6c/0x90)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.564999] Exception stack(0xc0a01f40 to 0xc0a01f88)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.572642] 1f40: 00000001 00000000 00000000 c0313a60 ffffe000 c0a03cb8 c0a03c6c 00000000
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.577680] 1f60: 00000000 00000001 cfffcd40 c092ca28 c0a01f88 c0a01f90 c0307e48 c0307e4c
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.585833] 1f80: 60000013 ffffffff
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.593993] [<c030b40c>] (__irq_svc) from [<c0307e4c>] (arch_cpu_idle+0x34/0x38)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.597301] [<c0307e4c>] (arch_cpu_idle) from [<c0351fc8>] (do_idle+0xdc/0x1a0)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.604938] [<c0351fc8>] (do_idle) from [<c03522e8>] (cpu_startup_entry+0x18/0x1c)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.611972] [<c03522e8>] (cpu_startup_entry) from [<c0900c80>] (start_kernel+0x3b8/0x3c4)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.619603] Disabling lock debugging due to kernel taint
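Each "from [<addr>] (func+0xoff/0xsize)" entry in the backtrace above names a caller, how far into that function the return address sits (off), and the function's total length (size). As a small aid for reading such traces, here is a parser sketch that assumes exactly the line layout seen in this log; it is not an official tool.

```python
import re

# Matches the caller half of an unwinder line as printed in this log, e.g.
# "[<c05c0b64>] (edma_alloc_rx_buf) from [<c05c3d0c>] (edma_poll+0xcbc/0xe4c)"
FRAME_RE = re.compile(
    r"from \[<(?P<addr>[0-9a-f]+)>\] "
    r"\((?P<func>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)\)"
)

def parse_frame(line: str):
    """Return (address, function, offset, size) for the caller, or None."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return (int(m["addr"], 16), m["func"], int(m["off"], 16), int(m["size"], 16))

line = "[<c05c0b64>] (edma_alloc_rx_buf) from [<c05c3d0c>] (edma_poll+0xcbc/0xe4c)"
addr, func, off, size = parse_frame(line)
print(func, hex(off), hex(size), off < size)  # edma_poll 0xcbc 0xe4c True
```

The search also works on full logread lines, since the timestamp prefix is simply skipped.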

pgwipeout · January 23, 2020, 2:45am

Beautiful!
I'd edit your bug report to add this.
I have to dig into the source code, but from a cursory glance I'd venture that we aren't requesting enough space to receive an IPv6 packet. Since that would trigger an unhandled exception, the kernel would abruptly stop processing all data for that driver.

trr · January 23, 2020, 2:56am

Awesome, I wish I had the knowledge to understand this. Let me know if I can do anything else to assist.

trr · January 24, 2020, 1:38am

I've attached two more logs to bug report FS#2741. The crashes they capture caused the router to reboot during the speed test.

LongRangeSkeet · February 7, 2020, 11:34pm

I believe I hit exactly the same bug as the OP the day after I freshly installed 19.07.1 over the stock Linksys firmware.

I promptly reverted to the stock firmware because I can't have my primary router malfunctioning like this.

I'm curious whether this will be fixed in the next update, because I'd love to give OpenWrt another try now that Linksys no longer seems to support this router model.

pgwipeout · February 11, 2020, 3:36pm

I did do a quick look at this, and it's not nearly as simple as I originally thought.
Apparently, sometime around kernel 4.5, there was a change in the way memory is assigned to the TX and RX chains, which required drivers to rework their DMA handling.
A lot of drivers started seeing bugs like this, where TX-chain misconfiguration would corrupt the RX chain.

The eventual fix appears to be moving over to the new driver, but I can't work on that without buying one of these.
If one goes on sale I think I'm going to pick one up to do just that.

Otherwise it's up to the actual maintainer.

rainer · March 15, 2020, 11:43am

OpenWrt 19.07.2 still shows the same issue and has now hung multiple times during normal operation. Has anybody made progress on this? If this is a driver issue, is there any hope with newer Linux kernels?

sammo · April 6, 2020, 9:23pm

I'm running 19.07.2 on a Linksys EA6350 with IPv6 disabled on WAN. I connected a PC running a DHCP server to WAN and another PC to LAN, then ran iperf3 in both directions and could not provoke a panic or disconnect. What I did notice is that the byte count resets at 4GB, while the LAN interface counter goes beyond 4GB. I also ran iperf3 from WAN to WiFi and again could not provoke a panic or disconnect.
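A counter that drops back to zero at about 4GB is consistent with a 32-bit byte counter, which wraps after 2^32 = 4,294,967,296 bytes (4 GiB); that reading is an assumption, not something confirmed in this thread. If it holds, the traffic between two samples can still be recovered with modular arithmetic, provided the counter wraps at most once between samples:

```python
WRAP = 2 ** 32  # a 32-bit counter rolls over after 4,294,967,296 bytes (4 GiB)

def counter_delta(prev: int, curr: int, bits: int = 32) -> int:
    """Bytes transferred between two samples of a wrapping unsigned counter.

    Only correct if the counter wrapped at most once between the samples.
    """
    return (curr - prev) % (1 << bits)

# One sample taken just before the wrap, the next just after it.
print(counter_delta(4_294_967_000, 100))  # 396
print(WRAP / 1024**3)  # 4.0
```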

Pie-jacker875 · April 8, 2020, 10:59pm

I'd be happy to send a couple of bucks your way to help purchase one.

trr · April 11, 2020, 8:54pm

I'd also gladly contribute some money to pgwipeout for a fix.

I just reran the IPv6 iperf3 test with the latest OpenWrt SNAPSHOT r12903 (the stock image from the website, so without kernel debug info) and got the same results, so the newer kernel does not appear to help.

Does the maintainer know about this problem? How would we get in touch with them?

elmystico · May 19, 2020, 8:16am

Guys,
Linksys EA8300 here.
I ran into this problem a few days ago after updating some software packages with opkg.
I'm running IPv6 through a 6in4 tunnel (he.net), but I believe my setup was already somehow broken before the update; after the update, these Ethernet hangs began to occur.
If somebody is working on this, please check whether dnsmasq plays a role, just to be sure.
For now, I'm disabling IPv6.

RachelRB · May 31, 2020, 6:06pm

On the Linksys EA8300, PPPoE disconnects the Ethernet port.
The fix: set the MTU to 666.

elmystico · June 2, 2020, 10:08am

So, if I understand you correctly, not only IPv6 but also PPPoE shuts down Ethernet.
Does a lower MTU help? Is there any reason to use MTU 666? Why 666 and not something closer to 1500?
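For reference, the usual PPPoE MTU comes from subtracting 8 bytes of PPPoE and PPP headers from the 1500-byte Ethernet payload, giving 1492. IPv6 also requires every link to support at least 1280 bytes (RFC 8200), so 666 is below that floor; on Linux, an interface whose MTU is set below 1280 stops running IPv6, which may or may not be related to why it helped here. The sketch below only checks that arithmetic:

```python
ETH_MTU = 1500        # standard Ethernet MTU
PPPOE_OVERHEAD = 8    # 6-byte PPPoE header + 2-byte PPP protocol field
IPV6_MIN_MTU = 1280   # minimum link MTU required by IPv6 (RFC 8200)

def pppoe_mtu(eth_mtu: int = ETH_MTU) -> int:
    """MTU left for IP packets inside a PPPoE session."""
    return eth_mtu - PPPOE_OVERHEAD

def supports_ipv6(mtu: int) -> bool:
    """True if the link MTU meets the IPv6 minimum."""
    return mtu >= IPV6_MIN_MTU

print(pppoe_mtu())          # 1492
print(supports_ipv6(1492))  # True
print(supports_ipv6(666))   # False
```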