NBG6817: OpenWrt rebooting constantly

OK. What SW version are you using?

I build and update to current master every week or two, currently with the mac80211/backports updates reverted (due to problems with ath10k and ath10k-ct in WDS mode).

So I got time to go back and look at the problem again. My router still reboots. I have noticed that the problem occurs every time my Arch laptop has been connected to the router for a few minutes. I don't have downloads or anything like that running that could overload the router with sessions.

I activated syslog and directed it to a remote syslog server. Here is the log from a crash:
https://pastebin.com/a9E5NifT

Any clue? :slight_smile:

This is what I saw with one crash:

Oct 16 19:42:37 OpenWrt kernel: [ 322.896757] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1675 larger than size (1536)
Oct 16 19:42:39 OpenWrt kernel: [ 324.568274] Unable to handle kernel paging request at virtual address b57f31ff
Oct 16 19:42:39 OpenWrt kernel: [ 324.568353] pgd = c0204000
Oct 16 19:42:39 OpenWrt kernel: [ 324.574467] [b57f31ff] *pgd=00000000
Oct 16 19:42:39 OpenWrt kernel: [ 324.577100] Internal error: Oops: 5 [#1] SMP ARM
Oct 16 19:42:39 OpenWrt kernel: [ 324.580816] Modules linked in: pppoe ppp_async ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat ledtrig_usbport ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_of_simple ohci_platform ohci_hcd phy_qcom_dwc3 ahci ehci_platform
Oct 16 19:42:40 OpenWrt kernel: [ 324.633755] sd_mod ahci_platform libahci_platform libahci libata scsi_mod ehci_hcd gpio_button_hotplug ext4 jbd2 mbcache crc32c_generic
Oct 16 19:42:40 OpenWrt kernel: [ 324.656007] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.63 #0
Oct 16 19:42:40 OpenWrt kernel: [ 324.668314] Hardware name: Generic DT based system
Oct 16 19:42:40 OpenWrt kernel: [ 324.674305] task: dd43f200 task.stack: dd45e000
Oct 16 19:42:40 OpenWrt kernel: [ 324.678904] pc : [<c067e9f8>] lr : [<c0684bc0>] psr: 80000113
Oct 16 19:42:40 OpenWrt kernel: [ 324.683335] sp : dd45fb30 ip : 00000000 fp : 00000000

And this with the other crashes:

Oct 16 19:52:04 OpenWrt kernel: [ 115.708559] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1675 larger than size (1536)
Oct 16 19:52:10 OpenWrt kernel: [ 120.950383] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1675 larger than size (1536)
Oct 16 19:52:15 OpenWrt kernel: [ 126.202088] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1675 larger than size (1536)
Oct 16 19:52:20 OpenWrt kernel: [ 131.450527] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1675 larger than size (1536)
Oct 16 19:52:25 OpenWrt kernel: [ 136.713289] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1675 larger than size (1536)

Anyone know what to do/try?

Disable jumbo frames on the problematic laptop...

Looks like the interface is seeing larger packets than it supports, so make sure that the laptop has a suitable MTU.
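One quick way to act on this advice is to scan `ip link show` output for any interface whose MTU is above the usual 1500 bytes. The Python sketch below does that; the 1500-byte limit and the function name `oversized_mtus` are assumptions for illustration, and loopback is skipped because `lo` normally has a much larger MTU.

```python
import re

# Matches the index, interface name, flags and MTU of an `ip link show` line,
# e.g. "4: wlp61s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq ..."
LINK_RE = re.compile(r"^\d+:\s+([^:@\s]+)(?:@\S+)?:\s+<([^>]*)>.*?\bmtu\s+(\d+)")

# Assumption: 1500 bytes is the standard Ethernet MTU; a larger value on a
# non-loopback interface suggests jumbo frames are enabled.
STANDARD_MTU = 1500


def oversized_mtus(ip_link_output: str, limit: int = STANDARD_MTU) -> dict[str, int]:
    """Return {interface: mtu} for non-loopback interfaces whose MTU exceeds limit."""
    result = {}
    for line in ip_link_output.splitlines():
        m = LINK_RE.match(line.strip())
        if not m:
            continue  # e.g. the "link/ether ..." continuation lines
        name, flags, mtu = m.group(1), m.group(2).split(","), int(m.group(3))
        if "LOOPBACK" in flags:
            continue  # lo legitimately uses a 65536-byte MTU
        if mtu > limit:
            result[name] = mtu
    return result
```

For example, a line reporting `mtu 9000` on `eth0` yields `{'eth0': 9000}`, while output where every non-loopback interface reports `mtu 1500` yields an empty dict.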

Since it's an Acer, it's almost certainly a Realtek card, which may do strange things on its own :slight_smile:

@diizzy It's not an Acer laptop. It's a Lenovo T480s with:

3d:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 78)

@hnyman

I just looked at my laptop. The MTU seems to be set up correctly, but I must admit that your guess is likely true.

$ ip link show | grep mtu   
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
2: enp0s31f6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
4: wlp61s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DORMANT group default qlen 1000
7: wwp0s20f0u6: <BROADCAST,MULTICAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000

It's just weird that this suddenly started after I upgraded to 18.06.1, if the cause were the router's switch not supporting jumbo frames. I have run OpenWrt for a year on older versions without these reboots.

It could also be an Arch update on my laptop that is causing it, but I can't find anything that's related to jumbo frames. :-/

Well, the reboots also happen when a Mac OS X laptop is connected as the only Wi-Fi client, so this is probably not a laptop problem.

Anything else to try? :slight_smile:

Sorry, misread Arch as Acer :wink:
eth0 suggests that it's seeing larger packets over the cable, not over wireless.

Hi there,

I also just experienced a crash after 12 days of uptime on my NBG6817 (OpenWrt 18.06.1). Unfortunately, I couldn't read any useful logs, as they are set up to be in /tmp by default and are therefore not available after the next boot, but I'll set up a syslog server for next time.

I want to add that I experienced more frequent instability in the past, when I was using LEDE 17.x. That included Wi-Fi stability issues, but also random reboots. I reverted to the stock Zyxel firmware, which ran fine for months with absolutely no issues, so I'd rule out a hardware problem.

Beyond the stock setup, I have software flow offloading enabled and SQM, UPnP and DDNS installed. The device faces the WAN and does PPPoE. Temperatures are somewhere around 60-70 °C.

I'll report back, when I have something in the syslogs (may take a while).

My Netgear R7800 reboots too, and I don't know what the problem is.

Sadly, I didn't solve my problem with the reboots. Does the R7800 have the same chipset, or why are you posting in this thread? :slight_smile:

I think the cause in my case was my Sonos bridge sending jumbo frames, which made the router reboot for some unknown reason.

ipq806x-gmac-dwmac 37400000.ethernet eth1: len 1629 larger than size (1536)

Because I also got this "len xxxx larger than size" message, I searched for it on Google and found only this page.

The R7800 and NBG6817 are very similar apart from a few device-specific differences (NAND vs. eMMC, no eSATA on the NBG6817), so issues present in one of them are very likely to plague the other as well.

As there are no other reports of that error, it is likely something pretty rare, caused by the combination of chipset, network configuration and the surrounding network devices.

Just for reference, the error comes from Linux sources:
https://elixir.bootlin.com/linux/v4.14.83/source/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c#L3390

And I am not sure if that is the ultimate reason for your crashes, as the kernel is supposed to drop those oversized packets.
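To see which port the oversized frames arrive on and how far they overshoot, the messages can be parsed. This is a minimal Python sketch that assumes the message format quoted in this thread (`<iface>: len N larger than size (M)`) and assumes that M is the driver's receive buffer size; `parse_oversize` and `OversizedFrame` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Pattern for the stmmac message quoted in this thread, e.g.
# "ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1675 larger than size (1536)"
OVERSIZE_RE = re.compile(r"(\S+): len (\d+) larger than size \((\d+)\)")


class OversizedFrame(NamedTuple):
    interface: str
    frame_len: int
    buf_size: int  # assumed to be the RX buffer size reported by the driver

    @property
    def excess(self) -> int:
        """Bytes by which the received frame exceeded the reported size."""
        return self.frame_len - self.buf_size


def parse_oversize(line: str) -> Optional[OversizedFrame]:
    """Extract interface, frame length and size from one log line, or None."""
    m = OVERSIZE_RE.search(line)
    if m is None:
        return None
    return OversizedFrame(m.group(1), int(m.group(2)), int(m.group(3)))
```

For the first crash log above, this gives eth0 with a 1675-byte frame, 139 bytes over the 1536-byte size. A truncated line without the `(M)` part returns None rather than guessing the size.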

Did you find a solution for this?

No, I did not. After two afternoons of debugging my wife got mad, so I replaced the router with a FritzBox 7590.

What kind of equipment is connected to your Ethernet switch, hingbong? Have you located the device sending jumbo frames?

We get the same error on eth0 every 30 minutes on the R7800 (18.06.1):

kern.err kernel: [ 584.144343] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1994 larger than size

After ten or so of these messages the router crashes, and sometimes it becomes totally unresponsive.
Now a cron job runs every 10 minutes, and when grep finds the offending message, it reboots the router, which ensures that access remains possible.
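The cron-based workaround can be sketched as a small decision function. Unlike the grep approach described above, which reboots on the first match, this hypothetical variant only triggers once a threshold of messages has accumulated; the threshold of 5 is an arbitrary assumption, and Python is not part of a default OpenWrt image, so this illustrates the logic rather than something to run on the router as-is.

```python
import re

# Hypothetical threshold: crashes were reported after roughly ten of these
# messages, so trigger somewhat earlier.
THRESHOLD = 5

# Matches the stmmac message even when the "(size)" suffix is missing.
PATTERN = re.compile(r"len \d+ larger than size")


def count_oversize(log_text: str) -> int:
    """Count oversized-frame messages in a chunk of kernel log text."""
    return sum(1 for line in log_text.splitlines() if PATTERN.search(line))


def should_reboot(log_text: str, threshold: int = THRESHOLD) -> bool:
    """True once the number of oversized-frame messages reaches threshold."""
    return count_oversize(log_text) >= threshold
```

Feeding it the output of `logread` and invoking `reboot` when it returns True would mirror the cron job; since `logread` reads an in-memory buffer, the messages accumulate until the reboot clears them (or the buffer wraps).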

As the R7800 is connected on eth0 (WAN) to a cable modem (running in bridge mode, no DHCP) and the modem does not offer a way to disable jumbo frames, we do have a serious problem here.

The same thing is happening to me. I'm using a Netgear R7800.
My latest error is:

ipq806x-gmac-dwmac 37400000.ethernet eth1: len 1926 larger than size (1536)

and it's the same one; it keeps happening every 5 minutes or so.

EDIT: Restoring the router seems to stop it