NBG6817: OpenWrt rebooting constantly

dafb · October 11, 2018, 8:19pm

Hi guys

I have upgraded my ZyXEL NBG6817 to:

OpenWrt 18.06.1 r7258-5eb055306f / LuCI openwrt-18.06 branch (git-18.284.42397-55ebe88)

I have checked that all packages are upgraded:

root@OpenWrt:~# opkg list-upgradable
root@OpenWrt:~#

However, my router reboots spontaneously and often. It can run for an hour without problems, then the whole system reboots. Most of the time it reboots every 10-15 minutes, sometimes at 5-minute intervals. Here is my logread output:
https://pastebin.com/7BMsZfvr

Does anyone know what to try? I have spent hours trying to figure it out.

diizzy · October 11, 2018, 8:35pm

I'd say your best bet would probably be a snapshot of master from before the ath10k-ct switch.
@slh has that specific device, so maybe he can fill in some more.

slh · October 11, 2018, 8:35pm

I can't reproduce that behaviour, but I'm using current master snapshots and not 18.06.x (although that shouldn't make much of a difference here).

One thing to test: if you have enabled flow offloading, it might be interesting to try with it disabled (it has been implicated in similar issues in the past, and not just on ipq806x).

dafb · October 11, 2018, 9:03pm

OK. I can see that there is no check mark in "Software flow offloading", so that's already disabled.

Are there any useful log files somewhere to check?

dafb · October 11, 2018, 9:04pm

Thanks, diizzy. I will try downgrading if I don't find a solution. The router ran 17.xx.x without problems earlier.

slh · October 11, 2018, 9:10pm

You can keep an eye on cat /sys/class/thermal/thermal_zone*/temp (ipq806x, like any high-end ARM router, tends to run hot), or you could set up remote syslogging in the hope of catching why it fails (that might help; if not, you may need a serial console).
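For the remote syslogging route, a minimal receiver sketch, assuming a Linux or macOS machine with Python 3 on the LAN. On the router side, OpenWrt's logd forwards messages when log_ip (and optionally log_port, default 514) is set in the system section of /etc/config/system. The port 5514 and the file name router-syslog.log below are hypothetical choices; a port above 1024 avoids needing root to bind, but then log_port on the router has to match it.

```python
import socket
from datetime import datetime
from typing import Optional


def receive(sock: socket.socket, logfile: str, max_messages: Optional[int] = None) -> int:
    """Append every UDP datagram received on sock to logfile; return the count.

    Runs forever unless max_messages is given.
    """
    count = 0
    with open(logfile, "a", encoding="utf-8") as out:
        while max_messages is None or count < max_messages:
            data, (host, _port) = sock.recvfrom(8192)
            stamp = datetime.now().isoformat(timespec="seconds")
            text = data.decode("utf-8", errors="replace").rstrip()
            out.write(f"{stamp} {host} {text}\n")
            # Flush per message so the file always holds everything received
            # so far, up to the router's last message before it reboots.
            out.flush()
            count += 1
    return count


def main(port: int = 5514, logfile: str = "router-syslog.log") -> None:
    """Listen on all interfaces for syslog over UDP (hypothetical defaults)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    try:
        receive(sock, logfile)
    finally:
        sock.close()
```

Call main() from a Python session (or add the call at the bottom of the file) and leave it running; the last lines in the file before a gap are the interesting ones.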

Or you can try more recent master snapshots (ath10k-ct isn't working that well for QCA9984 so far though).

dafb · October 11, 2018, 9:19pm

@slh, thanks. Just while I was typing this reply, it rebooted 5-6 times.

root@OpenWrt:~# cat /sys/class/thermal/thermal_zone0/temp
74688

It doesn't feel hot from the outside, though. I can lay a hand on the chassis and don't feel any particular heat.
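The Linux thermal sysfs interface reports temperatures in millidegrees Celsius, so the 74688 above is about 74.7°C. A small sketch that converts the raw values and, when run on a Linux machine that has Python (OpenWrt does not ship it by default), reads every thermal zone; the function names are hypothetical.

```python
import glob
from typing import Dict


def millideg_to_celsius(raw: str) -> float:
    """Convert a raw thermal_zone*/temp reading (millidegrees C) to degrees C."""
    return int(raw.strip()) / 1000.0


def read_all_zones(pattern: str = "/sys/class/thermal/thermal_zone*/temp") -> Dict[str, float]:
    """Return {path: degrees C} for every thermal zone file matching pattern."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                temps[path] = millideg_to_celsius(f.read())
        except (OSError, ValueError):
            # A zone may fail to read or return non-numeric data; skip it.
            continue
    return temps
```

For a value copied from the router, millideg_to_celsius("74688") gives 74.688.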

I will try the remote syslog approach and report back after the weekend.

slh · October 11, 2018, 9:23pm

Mine is running around 85°C all the time (without any issues), so yours is cold in comparison.

dafb · October 11, 2018, 9:25pm

OK. What SW version are you using?

slh · October 11, 2018, 11:20pm

I build and update to current master every week or two, currently with the mac80211/backports updates reverted (due to problems with ath10k and ath10k-ct in WDS mode).

dafb · October 16, 2018, 5:07pm

So I got time to go back and look at the problem again. My router still reboots. I have noticed that the problem occurs every time my Arch laptop has been connected to the router for a few minutes. I don't have downloads or anything similar running that could overload the router with sessions.

I activated syslog and directed it to a remote syslog server. Here is the log from a crash:
https://pastebin.com/a9E5NifT

Any clue?

dafb · October 16, 2018, 5:56pm

This is what I saw in one crash:

Oct 16 19:42:37 OpenWrt kernel: [ 322.896757] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1675 larger than size (1536)
Oct 16 19:42:39 OpenWrt kernel: [ 324.568274] Unable to handle kernel paging request at virtual address b57f31ff
Oct 16 19:42:39 OpenWrt kernel: [ 324.568353] pgd = c0204000
Oct 16 19:42:39 OpenWrt kernel: [ 324.574467] [b57f31ff] *pgd=00000000
Oct 16 19:42:39 OpenWrt kernel: [ 324.577100] Internal error: Oops: 5 [#1] SMP ARM
Oct 16 19:42:39 OpenWrt kernel: [ 324.580816] Modules linked in: pppoe ppp_async ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat ledtrig_usbport ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_of_simple ohci_platform ohci_hcd phy_qcom_dwc3 ahci ehci_platform
Oct 16 19:42:40 OpenWrt kernel: [ 324.633755] sd_mod ahci_platform libahci_platform libahci libata scsi_mod ehci_hcd gpio_button_hotplug ext4 jbd2 mbcache crc32c_generic
Oct 16 19:42:40 OpenWrt kernel: [ 324.656007] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.63 #0
Oct 16 19:42:40 OpenWrt kernel: [ 324.668314] Hardware name: Generic DT based system
Oct 16 19:42:40 OpenWrt kernel: [ 324.674305] task: dd43f200 task.stack: dd45e000
Oct 16 19:42:40 OpenWrt kernel: [ 324.678904] pc : [<c067e9f8>] lr : [<c0684bc0>] psr: 80000113
Oct 16 19:42:40 OpenWrt kernel: [ 324.683335] sp : dd45fb30 ip : 00000000 fp : 00000000

And this in the other crashes:

Oct 16 19:52:04 OpenWrt kernel: [ 115.708559] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1675 larger than size (1536)
Oct 16 19:52:10 OpenWrt kernel: [ 120.950383] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1675 larger than size (1536)
Oct 16 19:52:15 OpenWrt kernel: [ 126.202088] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1675 larger than size (1536)
Oct 16 19:52:20 OpenWrt kernel: [ 131.450527] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1675 larger than size (1536)
Oct 16 19:52:25 OpenWrt kernel: [ 136.713289] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1675 larger than size (1536)

dafb · October 18, 2018, 12:44pm

Does anyone know what to do or try?

hnyman · October 18, 2018, 5:10pm

Disable jumbo frames on the problematic laptop...

It looks like the interface is seeing larger packets than it supports, so make sure that the laptop has a suitable MTU.
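To see how often and by how much the frames overshoot, the remote syslog can be scanned for the dwmac messages quoted above. A minimal sketch: the regular expression is written against those exact "len N larger than size (M)" lines and is an assumption for any other driver's wording; the function names are hypothetical. In the logs above, every report is a 1675-byte frame against a 1536-byte buffer, i.e. 139 bytes over.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Matches e.g. "... 37200000.ethernet eth0: len 1675 larger than size (1536)"
OVERSIZE = re.compile(r"(\S+): len (\d+) larger than size \((\d+)\)")

Hit = Tuple[str, int, int]  # (interface, frame length, buffer size)


def find_oversize(lines: Iterable[str]) -> List[Hit]:
    """Return one (interface, length, size) tuple per oversize-frame report."""
    hits = []
    for line in lines:
        m = OVERSIZE.search(line)
        if m:
            hits.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return hits


def summarize(hits: List[Hit]) -> Counter:
    """Count reports per (interface, length, size) combination."""
    return Counter(hits)


def overshoot(hits: List[Hit]) -> List[int]:
    """Bytes by which each frame exceeded the receive buffer."""
    return [length - size for _, length, size in hits]
```

Passing the open syslog file to find_oversize() works, since iterating a file yields its lines.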

diizzy · October 18, 2018, 6:58pm

Since it's an Acer, it's almost certainly a Realtek card, which may do strange things on its own.

dafb · October 21, 2018, 7:46pm

@diizzy It's not an Acer laptop. It's a Lenovo T480s with:

3d:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 78)

@hnyman

I just looked at my laptop. The MTU seems to be set up correctly, but I must admit that your guess is likely right.

$ ip link show | grep mtu   
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
2: enp0s31f6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DEFAULT group default qlen 1000
4: wlp61s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DORMANT group default qlen 1000
7: wwp0s20f0u6: <BROADCAST,MULTICAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
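The same check can be done programmatically. A sketch that parses ip link show output in the format above into interface and MTU, and lists any interface above the usual 1500-byte Ethernet MTU; the 1500 threshold and skipping the loopback interface (which legitimately uses 65536) are assumptions made for this illustration.

```python
import re
from typing import Dict, List

# Matches lines such as "4: wlp61s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq ..."
LINK = re.compile(r"^\d+:\s+([^:\s]+):\s+<[^>]*>\s+mtu\s+(\d+)")


def parse_mtus(output: str) -> Dict[str, int]:
    """Map interface name -> MTU from 'ip link show' output."""
    mtus = {}
    for line in output.splitlines():
        m = LINK.match(line.strip())
        if m:
            mtus[m.group(1)] = int(m.group(2))
    return mtus


def jumbo_interfaces(mtus: Dict[str, int], limit: int = 1500) -> List[str]:
    """Sorted names of non-loopback interfaces whose MTU exceeds limit."""
    return sorted(name for name, mtu in mtus.items() if name != "lo" and mtu > limit)
```

With the output above, every non-loopback interface reports 1500, so jumbo_interfaces() returns an empty list, which matches the observation that the laptop's MTU looks normal.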

It's just weird that this suddenly started after I upgraded to 18.06.1, if the cause were that the switch in the router doesn't support jumbo frames. I have run OpenWrt for a year on older versions without these reboots.

It could also be an Arch update on my laptop that is causing it, but I can't find anything related to jumbo frames. :-/

dafb · October 21, 2018, 8:06pm

Well, the reboots also happen when a Mac OS X laptop is connected as the only Wi-Fi client, so this is probably not a laptop problem.

Anything else to try?

diizzy · October 21, 2018, 9:03pm

Sorry, I misread Arch as Acer.
eth0 suggests that it's seeing larger packets over the cable, not over wireless.

tolga9009 · October 30, 2018, 12:22am

Hi there,

I also just experienced a crash after 12 days of uptime on my NBG6817 (OpenWrt 18.06.1). Unfortunately, I couldn't read any useful logs, as they are stored in /tmp by default and are therefore gone after the next boot, but I'll set up a syslog server for next time.

I want to add that I experienced more frequent instability in the past, when I was using LEDE 17.x: Wi-Fi stability issues, but also random reboots. I reverted to the stock ZyXEL firmware, which ran fine for months with absolutely no issues, so I'd rule out any hardware problems.

Compared to the defaults, I have Software Flow Offloading enabled and SQM, UPnP and DDNS installed. This device faces the WAN and does PPPoE. Temperatures are around 60-70°C.

I'll report back, when I have something in the syslogs (may take a while).

hingbong · November 23, 2018, 2:38pm

My Netgear R7800 reboots, too, and I don't know what the problem is.