Ethernet cuts out on Linksys EA6350v3

If you look at the kernel panic, it is completely missing the debugging symbols needed to trace what actually happened.
Unless you know where the debugging symbols can be located for that build, the only other option is to flash a debug kernel so the stack trace has access to the symbols.

Once we can trace the functions, we can find where things are breaking.
IPv6-only issues tend to be variable-size problems.

I can certainly try flashing a debug kernel. I'm assuming I would have to build my own firmware? I've gotten as far as "make menuconfig"; which specific options should I select? Under "Global build settings" there is "Collect kernel debug information"; is this it? Thanks!

Update: I've flashed a release build of 19.07, adding only LuCI and checking "Collect kernel debug information" in menuconfig.

root@OpenWrt:~# logread -f
Wed Jan 22 20:26:55 2020 kern.alert kernel: [  118.339257] BUG: Bad page state in process swapper/0  pfn:8d44a
Wed Jan 22 20:26:55 2020 kern.emerg kernel: [  118.339305] page:cffa1940 count:-1 mapcount:0 mapping:  (null) index:0x0
Wed Jan 22 20:26:55 2020 kern.emerg kernel: [  118.344006] flags: 0x0()
Wed Jan 22 20:26:55 2020 kern.alert kernel: [  118.350960] raw: 00000000 00000000 00000000 ffffffff ffffffff 00000000 cffa1954 00000000
Wed Jan 22 20:26:55 2020 kern.alert kernel: [  118.353453] page dumped because: nonzero _count
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.361521] Modules linked in: pppoe ppp_async ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables hwmon crc_ccitt compat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_of_simple gpio_button_hotplug
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.409867] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.162 #0
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.432094] Hardware name: Generic DT based system
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.438200] [<c030e3a8>] (unwind_backtrace) from [<c030a8a0>] (show_stack+0x10/0x14)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.442876] [<c030a8a0>] (show_stack) from [<c071a454>] (dump_stack+0x94/0xa8)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.450773] [<c071a454>] (dump_stack) from [<c03a4988>] (bad_page+0x11c/0x138)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.457802] [<c03a4988>] (bad_page) from [<c03a6e04>] (get_page_from_freelist+0x4d4/0x800)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.465008] [<c03a6e04>] (get_page_from_freelist) from [<c03a76dc>] (__alloc_pages_nodemask+0x10c/0xd0c)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.473255] [<c03a76dc>] (__alloc_pages_nodemask) from [<c03a8378>] (page_frag_alloc+0x3c/0x140)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.482893] [<c03a8378>] (page_frag_alloc) from [<c0619e64>] (__netdev_alloc_skb+0x8c/0xfc)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.491658] [<c0619e64>] (__netdev_alloc_skb) from [<c0619ee0>] (__netdev_alloc_skb_ip_align+0xc/0x30)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.499732] [<c0619ee0>] (__netdev_alloc_skb_ip_align) from [<c05c0b64>] (edma_alloc_rx_buf+0xa8/0x414)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.509109] [<c05c0b64>] (edma_alloc_rx_buf) from [<c05c3d0c>] (edma_poll+0xcbc/0xe4c)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.518397] [<c05c3d0c>] (edma_poll) from [<c062dc60>] (net_rx_action+0x138/0x2fc)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.526378] [<c062dc60>] (net_rx_action) from [<c0301520>] (__do_softirq+0xe0/0x240)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.533930] [<c0301520>] (__do_softirq) from [<c0321f3c>] (irq_exit+0xd4/0x138)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.541830] [<c0321f3c>] (irq_exit) from [<c035b278>] (__handle_domain_irq+0x9c/0xac)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.548860] [<c035b278>] (__handle_domain_irq) from [<c030140c>] (gic_handle_irq+0x5c/0x90)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.556847] [<c030140c>] (gic_handle_irq) from [<c030b40c>] (__irq_svc+0x6c/0x90)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.564999] Exception stack(0xc0a01f40 to 0xc0a01f88)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.572642] 1f40: 00000001 00000000 00000000 c0313a60 ffffe000 c0a03cb8 c0a03c6c 00000000
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.577680] 1f60: 00000000 00000001 cfffcd40 c092ca28 c0a01f88 c0a01f90 c0307e48 c0307e4c
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.585833] 1f80: 60000013 ffffffff
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.593993] [<c030b40c>] (__irq_svc) from [<c0307e4c>] (arch_cpu_idle+0x34/0x38)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.597301] [<c0307e4c>] (arch_cpu_idle) from [<c0351fc8>] (do_idle+0xdc/0x1a0)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.604938] [<c0351fc8>] (do_idle) from [<c03522e8>] (cpu_startup_entry+0x18/0x1c)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.611972] [<c03522e8>] (cpu_startup_entry) from [<c0900c80>] (start_kernel+0x3b8/0x3c4)
Wed Jan 22 20:26:55 2020 kern.warn kernel: [  118.619603] Disabling lock debugging due to kernel taint
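
For anyone who wants to read the call chain without squinting at addresses, here is a rough sketch that pulls the function names out of a trace like the one above. The sample lines are copied from that log with the timestamps trimmed; the regex and the `call_pairs` helper are just my own illustration, not an OpenWrt or kernel tool.

```python
import re

# A few frames copied from the trace above (timestamps trimmed for brevity).
TRACE = """\
[<c03a4988>] (bad_page) from [<c03a6e04>] (get_page_from_freelist+0x4d4/0x800)
[<c03a8378>] (page_frag_alloc) from [<c0619e64>] (__netdev_alloc_skb+0x8c/0xfc)
[<c0619ee0>] (__netdev_alloc_skb_ip_align) from [<c05c0b64>] (edma_alloc_rx_buf+0xa8/0x414)
[<c05c0b64>] (edma_alloc_rx_buf) from [<c05c3d0c>] (edma_poll+0xcbc/0xe4c)
"""

# "(callee) from (caller+offset/size)" is how the ARM unwinder prints each frame.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] \((\S+)\) from \[<[0-9a-f]+>\] \((\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+\)"
)

def call_pairs(text):
    """Return (callee, caller) pairs in the order they appear (innermost first)."""
    return [m.groups() for m in FRAME.finditer(text)]

pairs = call_pairs(TRACE)
for callee, caller in pairs:
    print(f"{caller} -> {callee}")
```

Read bottom-up, the frames show the bad page being hit while the Ethernet driver's poll routine (`edma_poll`) was allocating a receive buffer.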

Beautiful!
I'd edit your bug report to add this in there.
I gotta dig into the source code, but just from a cursory glance I'd venture we aren't requesting enough space to receive an IPv6 packet. Since this would trigger an unhandled exception, the kernel would stop processing all data for that driver (in an abrupt manner).

Awesome, I wish I had the knowledge to understand this :) Let me know if I can do anything else to assist.

I've attached two more logs to bug report FS#2741. These caused the router to reboot during the speedtest.

I believe I've also encountered the exact same bug as OP, the day after I freshly installed 19.07.1 over the stock Linksys firmware.

I promptly reverted the firmware back because I just can't have the primary router malfunction like this.

I'm curious whether this will be fixed in the next update, because I would love to give OpenWrt another try since Linksys seems to no longer support this particular router model.

I did take a quick look at this, and it's not nearly as simple as I originally thought.
Apparently there was a change in the way memory is assigned to the tx and rx chains, which required drivers to rework their DMA, sometime around 4.5.
A lot of drivers started seeing bugs like this, where tx chain misconfiguration would corrupt the rx chain.

The eventual fix appears to be moving over to the new driver, but I can't work on that without buying one of these.
If one goes on sale I think I'm going to pick one up to do just that.

Otherwise it's up to the actual maintainer.

OpenWrt 19.07.2 still shows the same issue and has now gotten stuck multiple times during normal operation. Has anybody made progress with this? If this is a driver issue, is there some hope with newer Linux kernels?

I'm running 19.07.2 on a Linksys EA6350 with IPv6 disabled on WAN. A PC with a DHCP server is connected to WAN and another PC is connected to LAN. I then ran iperf3 in both directions and could not provoke a panic/disconnect. What I have noticed is that the byte count gets reset at 4GB. The LAN interface goes beyond 4GB. iperf3 was also run WAN to WiFi; again, I could not provoke a panic/disconnect.
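
A side note on the byte count resetting at 4GB: that would be consistent with an unsigned 32-bit counter wrapping around, since 2^32 bytes is exactly 4 GiB. That is only a guess; nothing in this thread confirms which counter is involved or how wide it is. A minimal sketch of the arithmetic, with the helper name `counter_delta` made up for illustration:

```python
# Assumption: the byte counter is an unsigned 32-bit value (not confirmed in the thread).
COUNTER_BITS = 32
WRAP = 2 ** COUNTER_BITS  # 4294967296 bytes, i.e. 4 GiB

def counter_delta(previous, current, wrap=WRAP):
    """Bytes transferred between two readings, tolerating a single wraparound."""
    return (current - previous) % wrap

gib_at_wrap = WRAP / 2 ** 30
# A reading taken just before the wrap, and one taken just after it.
delta = counter_delta(4294967000, 704)

print(gib_at_wrap)  # 4.0
print(delta)        # 1000
```

If that is what's happening, the reset is cosmetic and unrelated to the lockups.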

I'm interested in sending a couple bucks your way to help purchase one

I'd also gladly contribute some money to pgwipeout for a fix.

I just reran the IPv6 iperf3 test with the latest OpenWrt SNAPSHOT r12903 (stock from website so without kernel debug) with the same results, so the new kernel does not appear to help.

Does the maintainer know about this problem? How would we communicate?

Guys,
Linksys EA8300 here.
I ran into this problem a few days ago after updating some software packages with opkg.
I'm running IPv6 through a 6in4 tunnel (he.net), but I believe my setup was somehow broken before the update; after the update these Ethernet hangs began to occur.
If somebody is working on this, please check dnsmasq's influence on it to be sure.
For now I'm disabling IPv6.

On the Linksys EA8300, PPPoE disconnects the Ethernet port.
The solution was to set the MTU to 666.

So not only IPv6 but also PPPoE shuts down Ethernet, if I understand you correctly.
Does a lower MTU help? Any reason to set the MTU to 666? Why 666 and not something closer to 1500?

Yeah, that's kinda obvious.

I think I can confirm this with a ZyXEL NBG6617 running OpenWrt 19.07.3. I have an IPv6 uplink (WAN6) and apparently when I push a lot of IPv6 traffic, the switch stops working, but the router is still reachable via WiFi. However, I see nothing in logread or dmesg that looks related to the problem.

By the way, I am using additional VLAN tags in my internal LAN.

So is there any other solution than disabling IPv6 traffic on the Switch for a while?

Hold on...

  • that MTU is lower than the minimum for IPv6 (1280); and
  • to be clear, on a 1500 Ethernet link, the MTU for a HE tunnel would be 1480 (i.e. subtract 20 for the header)
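
The two points above can be checked with a few lines of arithmetic. This is only a sketch: the constant and function names are mine, and the 8-byte PPPoE overhead is not from this thread (it is the usual PPPoE figure, included only to show what a 6in4 tunnel over PPPoE would work out to).

```python
IPV6_MIN_MTU = 1280   # smallest link MTU that IPv6 requires
IPV4_HEADER = 20      # bytes of IPv4 header added by 6in4 encapsulation
PPPOE_OVERHEAD = 8    # assumption for illustration: PPPoE + PPP headers

def tunnel_mtu(link_mtu, overhead=IPV4_HEADER):
    """MTU left for IPv6 inside a 6in4 tunnel running over the given link."""
    return link_mtu - overhead

def ipv6_ok(mtu):
    """True if the MTU meets the IPv6 minimum."""
    return mtu >= IPV6_MIN_MTU

he_over_ethernet = tunnel_mtu(1500)                  # plain Ethernet link
he_over_pppoe = tunnel_mtu(1500 - PPPOE_OVERHEAD)    # same tunnel over PPPoE

print(he_over_ethernet, ipv6_ok(he_over_ethernet))  # 1480 True
print(he_over_pppoe, ipv6_ok(he_over_pppoe))        # 1472 True
print(666, ipv6_ok(666))                            # 666 False
```

So an MTU of 666 may stop the hangs, but on an interface that carries IPv6 it is below what IPv6 is allowed to use.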

Good news! This issue looks to have been fixed in this commit for all IPQ40xx devices:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=678569505623e50bbbbc344c7e820fb315b79ede

I have tested with SNAPSHOT r13684-3b0f698760 and saw no lockups while using IPv6.

I left the iperf3 IPv6 test running for 30 min at about 75 Mbytes/s (one core is at 100% IRQ). After installing irqbalance I'm getting a pretty consistent 110 Mbytes/s once it juggles the IRQs around.
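
To put those numbers in perspective, here is the conversion to Mbit/s. It assumes "Mbytes" means 10^6 bytes and that the wired ports are gigabit, neither of which is spelled out above; the helper name is made up.

```python
GIGABIT_LINE_RATE = 1000  # Mbit/s, assuming gigabit Ethernet ports

def mbytes_to_mbits(mbytes_per_s):
    """Convert Mbytes/s to Mbit/s (8 bits per byte)."""
    return mbytes_per_s * 8

before = mbytes_to_mbits(75)    # single core handling all IRQs
after = mbytes_to_mbits(110)    # IRQs spread across cores

print(before, after)                     # 600 880
print(after * 100 // GIGABIT_LINE_RATE)  # 88 (percent of nominal line rate)
```

Under those assumptions the second figure is most of what a gigabit link can carry.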

How would one go about adding this fix to a router running latest firmware? Sorry, I’m not familiar with compiling firmware.

  • If you're running the latest firmware, this means it would be fixed, correct?
  • To be exact, please tell us what version you're referencing.