Stability issues on Netgear Nighthawk X4S R7800

Hi,

first of all let me send a BIG THANK YOU to all the wonderful people who contribute to OpenWrt (running the project, writing documentation, coding, lots more). Thanks for giving me the choice to leverage an open source alternative for such a crucial topic (networking = core communication)!

Unfortunately with the current choice of hardware (Netgear Nighthawk X4S R7800) I am running into stability issues. Anybody else facing similar messages in the logs?

Running a vanilla 19.07.0-rc2 installation downloaded at https://downloads.openwrt.org/releases/19.07.0-rc2/targets/ipq806x/generic/ btw.

Some hopefully useful information is below. A loss of connectivity does not follow every logged kernel trace, but sometimes it does. Oddly enough, I notice it by getting a "destination host unreachable" from my master router, so it is not the wifi that is gone but the routing. A reboot usually doesn't help; a power cycle is needed, maybe to clear some state in the ath10k driver.

Reading the trace, I noticed that 19.07.0-rc2 seems to ship a 4.14 kernel with a driver backported from the 4.19 tree. Wouldn't it be easier to go with a full 4.19 kernel and eliminate the backporting issues?

Hopefully you'll be able to point me in the right direction on how to debug or solve this.
Have a great day, and thanks in advance!

root@w34master ~> dmesg|grep Machine.model
[    0.000000] OF: fdt: Machine model: Netgear Nighthawk X4S R7800
root@w34master ~> cat /etc/openwrt_release
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='19.07.0-rc2'
DISTRIB_REVISION='r10775-db8345d8e4'
DISTRIB_TARGET='ipq806x/generic'
DISTRIB_ARCH='arm_cortex-a15_neon-vfpv4'
DISTRIB_DESCRIPTION='OpenWrt 19.07.0-rc2 r10775-db8345d8e4'
DISTRIB_TAINTS=''
root@w34master ~> opkg list-installed|grep -e kernel -e ath
ath10k-firmware-qca9984-ct - 2019-10-03-d622d160-1
kernel - 4.14.156-1-0894164cab0effc42201a29fec8ce33f
kmod-ath - 4.14.156+4.19.85-1-1
kmod-ath10k-ct - 4.14.156+2019-09-09-5e8cd86f-1
root@w34master ~> logread|grep Jan..6|grep kern.warn.kernel
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.776451] ------------[ cut here ]------------
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.776513] WARNING: CPU: 0 PID: 2402 at backports-4.19.85-1/net/wireless/util.c:1147 0xbf1e5c98 [cfg80211@bf1e1000+0x37000]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.780315] invalid rate bw=0, mcs=15, nss=4
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.791434] Modules linked in: pppoe ppp_async ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat ledtrig_usbport nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_of_simple ohci_platform ohci_hcd phy_qcom_dwc3 ahci ehci_platform
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.844077]  sd_mod ahci_platform libahci_platform libahci libata scsi_mod ehci_hcd gpio_button_hotplug
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.866280] CPU: 0 PID: 2402 Comm: kworker/u4:0 Not tainted 4.14.156 #0
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.875370] Hardware name: Generic DT based system
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.881991] Workqueue: phy0 0xbf24158c [mac80211@bf22d000+0x68000]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.886839] Function entered at [<c030f1c4>] from [<c030b390>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.892995] Function entered at [<c030b390>] from [<c07bf364>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.898811] Function entered at [<c07bf364>] from [<c031fa58>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.904627] Function entered at [<c031fa58>] from [<c031fab8>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.910441] Function entered at [<c031fab8>] from [<bf1e5c98>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.916295] Function entered at [<bf1e5c98>] from [<bf27e930>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.922090] Function entered at [<bf27e930>] from [<bf27e9dc>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.927892] Function entered at [<bf27e9dc>] from [<bf27f2f8>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.933706] Function entered at [<bf27f2f8>] from [<bf27ab68>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.939522] Function entered at [<bf27ab68>] from [<bf24181c>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.945340] Function entered at [<bf24181c>] from [<c03371e0>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.951153] Function entered at [<c03371e0>] from [<c03376dc>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.956969] Function entered at [<c03376dc>] from [<c033d32c>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.962784] Function entered at [<c033d32c>] from [<c0307c48>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.968698] ---[ end trace 14e53bb3327f7551 ]---
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.974501] ------------[ cut here ]------------
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.979292] WARNING: CPU: 0 PID: 2402 at backports-4.19.85-1/net/mac80211/mesh_hwmp.c:344 0xbf27e948 [mac80211@bf22d000+0x68000]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34040.983795] Modules linked in: pppoe ppp_async ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat ledtrig_usbport nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_of_simple ohci_platform ohci_hcd phy_qcom_dwc3 ahci ehci_platform
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.043790]  sd_mod ahci_platform libahci_platform libahci libata scsi_mod ehci_hcd gpio_button_hotplug
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.066013] CPU: 0 PID: 2402 Comm: kworker/u4:0 Tainted: G        W       4.14.156 #0
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.075111] Hardware name: Generic DT based system
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.083114] Workqueue: phy0 0xbf24158c [mac80211@bf22d000+0x68000]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.087787] Function entered at [<c030f1c4>] from [<c030b390>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.093947] Function entered at [<c030b390>] from [<c07bf364>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.099763] Function entered at [<c07bf364>] from [<c031fa58>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.105579] Function entered at [<c031fa58>] from [<c031fb44>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.111395] Function entered at [<c031fb44>] from [<bf27e948>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.117213] Function entered at [<bf27e948>] from [<bf27e9dc>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.123030] Function entered at [<bf27e9dc>] from [<bf27f2f8>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.128845] Function entered at [<bf27f2f8>] from [<bf27ab68>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.134660] Function entered at [<bf27ab68>] from [<bf24181c>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.140476] Function entered at [<bf24181c>] from [<c03371e0>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.146292] Function entered at [<c03371e0>] from [<c03376dc>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.152106] Function entered at [<c03376dc>] from [<c033d32c>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.157924] Function entered at [<c033d32c>] from [<c0307c48>]
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.163845] ---[ end trace 14e53bb3327f7552 ]---
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.170944] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
Mon Jan  6 14:07:21 2020 kern.warn kernel: [34041.174354] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 1, skipped old beacon
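
For anyone comparing logs: a small sketch for seeing how many distinct warnings a log like this contains. It counts the kernel WARNING lines per source location. The function name `summarize_warnings` is made up for illustration, and the pattern assumes the `WARNING: CPU: ... PID: ... at <file>:<line>` layout seen in this trace.

```sh
# summarize_warnings: read a kernel log on stdin and print
# "<count> <source location>" for every distinct WARNING site.
# Hypothetical helper; assumes the "WARNING: CPU: N PID: N at file:line" format.
summarize_warnings() {
    grep -o 'WARNING: CPU: [0-9]* PID: [0-9]* at [^ ]*' |
        awk '{ print $NF }' |              # keep only the file:line part
        sort | uniq -c |                   # count identical locations
        awk '{ print $1, $2 }'             # normalize uniq's padding
}
```

It could be used as `logread | summarize_warnings` or `dmesg | summarize_warnings`. For the log above it should list two locations, one in cfg80211 (util.c:1147) and one in mac80211 (mesh_hwmp.c:344).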

Which stability issues do you have? You wrote a lot but didn't say exactly which symptoms you see.

Loss of connectivity from where to where? What exactly does not work? Has it worked before? What did you already try to resolve it?

Do you get more stability in your application when you switch to OpenWrt 18.06.5? You could also try switching to ath10k without -ct if your analysis shows a relation to the wifi components.
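
A rough sketch of how the switch away from the -ct packages could be planned. It assumes the non-ct packages carry the same names without the `-ct` suffix (for example `kmod-ath10k` and `ath10k-firmware-qca9984`), so please check with `opkg list` before running anything. `plan_driver_swap` is a made-up helper name; it only prints the commands and changes nothing.

```sh
# plan_driver_swap: read `opkg list-installed` output on stdin and print
# the opkg commands that would swap each ath10k "*-ct" package for the
# same name without "-ct". Dry run only: nothing is installed or removed.
# Assumption: the non-ct package name is the -ct name minus the suffix.
plan_driver_swap() {
    awk '$1 ~ /ath10k/ && $1 ~ /-ct$/ {
        ct = ct " " $1              # package to remove
        sub(/-ct$/, "", $1)         # derive the assumed non-ct name
        plain = plain " " $1        # package to install
    }
    END {
        if (ct != "") {
            print "opkg update"
            print "opkg remove" ct
            print "opkg install" plain
        }
    }'
}
```

Running `opkg list-installed | plan_driver_swap` on the router shows the proposed commands for review. The wifi will go down while the driver is swapped, so this is best done over a wired connection, followed by a reboot.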

You might want to try Hnyman's master build, which is based on the 4.19 kernel: Build for Netgear R7800
I don't have any issues with that build, so assuming there is no hardware issue and nothing unusual in your config, it should work fine for you as well.

Great, I have read a lot so far but hadn't seen that one. Thanks, I will give it a try (next weekend, as the current setup is in "production" -> family).

Sorry, you are right. I wasn't very clear about the loss of connectivity.

Well, there are a bunch of laptops and other mobile devices across the house. They all connect to one of the 3 APs (same hardware, same OpenWrt version, same SSID, same key, different channel on 2.4GHz). The APs are meshed (5GHz) and bridge the interfaces. One of them routes from/to the internet. That's it. A pretty straightforward mesh setup. Freshly power cycled => works great.

Every once in a while the "master" (the one OpenWrt device routing from/to the internet) throws a kernel trace. Most of the time you don't even notice (the router keeps doing its job), but sometimes the kernel doesn't route any more. The clients (laptops, tablets, phones) can't browse the internet any more and get a "destination host unreachable".

Power cycling the master/main Netgear device brings functionality back.
Hope the description makes more sense now.

Will give both tips a try (Hnyman's image as well as replacing the driver with the non-ct version).
Thanks

  • The Nighthawk has multiple 2.4 GHz radios?
  • If not, how are these set on different channels?

Read again:

I think it was meant that each of the three access points uses a different 2.4 GHz channel for connection to the clients to separate these wireless networks. R7800 has two radios: one 2.4 GHz 11b/g/n and one 5.2 GHz 11a/n/ac.

Oh, so the other 2 APs weren't really relevant in the story. The crashing router is the main issue...

  • Can you upgrade to 19.07.0 and see if the issue persists?
  • Have you tried the ath10k driver (no -ct) yet, as @odrt suggested?
  • Have you tried the Hnyman firmware yet as @perceival noted?

Hi, I suddenly had almost the same crash here on a ZyXEL NBG6617 running OpenWrt 19.07.2, r10947-65030d81f3:

[ 8967.346279] ------------[ cut here ]------------
[ 8967.346342] WARNING: CPU: 1 PID: 1775 at backports-4.19.98-1/net/wireless/util.c:1147 0xbf157824 [cfg80211@bf153000+0x33000]
[ 8967.349987] invalid rate bw=0, mcs=15, nss=4
[ 8967.361215] Modules linked in: pppoe ppp_async ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables hwmon crc_ccitt compat ledtrig_usbport nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_of_simple gpio_button_hotplug
[ 8967.411714] CPU: 1 PID: 1775 Comm: hostapd Not tainted 4.14.171 #0
[ 8967.433222] Hardware name: Generic DT based system
[ 8967.439481] Function entered at [<c030e2a8>] from [<c030a7a8>]
[ 8967.444249] Function entered at [<c030a7a8>] from [<c073fe14>]
[ 8967.450181] Function entered at [<c073fe14>] from [<c031dd34>]
[ 8967.456974] Function entered at [<c031dd34>] from [<c031dd88>]
[ 8967.461812] Function entered at [<c031dd88>] from [<bf157824>]
[ 8967.467744] Function entered at [<bf157824>] from [<bf164184>]
[ 8967.473456] Function entered at [<bf164184>] from [<bf17057c>]
[ 8967.479269] Function entered at [<bf17057c>] from [<bf1710e0>]
[ 8967.485092] Function entered at [<bf1710e0>] from [<bf1a2318>]
[ 8967.490965] Function entered at [<bf1a2318>] from [<bf1a2394>]
[ 8967.496718] Function entered at [<bf1a2394>] from [<bf1a2404>]
[ 8967.502524] Function entered at [<bf1a2404>] from [<bf1603ec>]
[ 8967.508499] Function entered at [<bf1603ec>] from [<c0670758>]
[ 8967.514153] Function entered at [<c0670758>] from [<c066f198>]
[ 8967.519964] Function entered at [<c066f198>] from [<c066f838>]
[ 8967.525675] Function entered at [<c066f838>] from [<c066e9fc>]
[ 8967.531491] Function entered at [<c066e9fc>] from [<c066ee08>]
[ 8967.537305] Function entered at [<c066ee08>] from [<c061f5e0>]
[ 8967.543124] Function entered at [<c061f5e0>] from [<c061fd88>]
[ 8967.548937] Function entered at [<c061fd88>] from [<c03073c0>]
[ 8967.556080] ---[ end trace 6c21c83d5dce81bc ]---

The drivers configuration is as follows:

# opkg list | grep ath
ath10k-firmware-qca4019-ct - 2019-10-03-d622d160-1
kmod-ath - 4.14.171+4.19.98-1-1
kmod-ath10k-ct - 4.14.171+2019-09-09-5e8cd86f-1
# lsmod | grep ath
ath                    20480  1 ath10k_core
ath10k_core           335872  1 ath10k_pci
ath10k_pci             36864  0 
cfg80211              208896  3 ath10k_core,ath,mac80211
hwmon                  12288  1 ath10k_core
mac80211              401408  1 ath10k_core

I use multiple SSIDs on 2.4 GHz and 5 GHz, but nothing like a wireless repeater is configured, and it's the only AP in my network.