Build for Netgear R7800

I think this was an issue in master a couple weeks ago. A newer build should fix it.

I just updated to master-r13611-3f27a6e640-20200622 and the problem persists. Just to be thorough, I did change my DHCP range to not overlap the static leases, and that did not resolve it either.

Very consistent though - on every reboot the second SSID with dynamic DHCP disabled (although this may have nothing to do with it) does not give out static leases until dnsmasq is restarted.

If there is any information I can supply to help diagnose the issue I'm more than happy to help. Other than the single message I shared showing up for every device that fails - I haven't found any other errors.

Okay, I did find something in the kernel log, but I don't know how relevant it is since I don't understand it much. It lists 15 peers, though, and that is how many devices were connecting to the SSID with the problem:

[   89.810162] ath10k_pci 0001:01:00.0: Invalid peer id 4 or peer stats buffer, peer: 00000000  sta: 00000000
[  134.603196] ath10k_pci 0000:01:00.0: htt tx: fixing invalid VHT TX rate code 0xff
[  155.036645] ath10k_pci 0000:01:00.0: Invalid VHT mcs 15 peer stats

maybe that says something useful?
For the 2.4 GHz I have the wireless set to N, channel 1, 40 MHz channel width.
Is the VHT message just about it changing the width to 20 instead of 40?
From the timestamps I expect that was around the time I restarted dnsmasq.

Thanks,
DeadEnd

Sounds like something in the wifi & DHCP config and the hotplug actions, nothing specifically related to my build. You might get answers from a wider audience if you open your own thread about it.

My guess would be that when dnsmasq originally starts, it does not yet see the wifi interface up, and fails to include it in the run-time config.
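One hedged way to test that guess is a small hotplug hook that restarts dnsmasq once the affected interface reports `ifup`. This is only a sketch: the file name, the logical interface name `guest`, and the two override variables are assumptions for illustration, not something taken from this build.

```shell
#!/bin/sh
# Hypothetical file: /etc/hotplug.d/iface/99-restart-dnsmasq
# OpenWrt's hotplug system provides ACTION and INTERFACE to scripts in this directory.

# Assumption: "guest" is the logical network name behind the second SSID.
WATCHED_IFACE="${WATCHED_IFACE:-guest}"
# Overridable so the logic can be dry-run away from the router.
DNSMASQ_RESTART="${DNSMASQ_RESTART:-/etc/init.d/dnsmasq restart}"

restart_dnsmasq_on_ifup() {
    # $1: hotplug action, $2: logical interface name
    [ "$1" = "ifup" ] || return 0
    [ "$2" = "$WATCHED_IFACE" ] || return 0
    # Leave a trace in the system log; ignore failures if no logger is available.
    logger -t hotplug "restarting dnsmasq after $2 came up" 2>/dev/null || true
    $DNSMASQ_RESTART
}

restart_dnsmasq_on_ifup "${ACTION:-}" "${INTERFACE:-}"
```

If static leases start working after a reboot with this hook in place, that would support the start-order explanation; it is a workaround rather than a fix for the underlying configuration.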


That makes perfect sense.
I'll split off into a new thread - Thanks!

Update:
It turns out the interface for the second SSID did not have force link enabled.
I must have missed this during my configuration... user error :)
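For anyone who prefers the command line over the LuCI checkbox, the same setting can be sketched with UCI. The interface name `guest` is a placeholder for whatever the second SSID's network is called in /etc/config/network, and the `UCI` override variable is only there so the commands can be previewed first.

```shell
# UCI is overridable (for example UCI=echo) to preview the commands without changing anything.
UCI="${UCI:-uci}"

enable_force_link() {
    # $1: logical interface name from /etc/config/network ("guest" is hypothetical)
    $UCI set "network.$1.force_link=1" && $UCI commit network
}
```

On the router this would be used as `enable_force_link guest`, followed by `/etc/init.d/network reload` to apply the change.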

Long time listener, first time caller:

I'm using standard snapshot build and not this one, but this thread has the largest collection of R7800 users - I'm experiencing occasional oops within the ath10k stack.

But first, I'm offering up my sysfs.d local file that might be helpful for people using any R7800 (unrelated to the ath10k crash):

/etc/sysfs.d/local:

# local sysctl settings can be stored in this directory

devices/system/cpu/cpufreq/policy0/scaling_governor = performance
devices/system/cpu/cpufreq/policy1/scaling_governor = performance
devices/system/cpu/cpufreq/policy0/scaling_max_freq = 1725000
devices/system/cpu/cpufreq/policy1/scaling_min_freq = 800000
devices/system/cpu/cpufreq/ondemand/up_threshold = 75
devices/system/cpu/cpufreq/ondemand/sampling_down_factor = 10

devices/virtual/net/br-lan/queues/rx-0/rps_cpus = 3
devices/virtual/net/eth0.2/queues/rx-0/rps_cpus = 3
devices/virtual/net/eth1.1/queues/rx-0/rps_cpus = 3
devices/virtual/net/ifb4eth0.2/queues/rx-0/rps_cpus = 3
devices/virtual/net/lo/queues/rx-0/rps_cpus = 3

devices/platform/soc/29000000.sata/ata1/host0/scsi_host/host0/link_power_management_policy = 'min_power'
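To see whether such entries actually took effect after boot, a rough check like the following can print the wanted and current value side by side. It is a sketch that assumes the `path = value` layout shown above; the function name and the `SYSFS_ROOT` override are made up for illustration.

```shell
# SYSFS_ROOT is overridable so the check can be tried against a scratch directory.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

show_sysfs_conf() {
    # $1: file containing "relative/path = value" lines
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) continue ;;   # skip blank lines and comments
            *=*) ;;                # looks like a setting, keep going
            *) continue ;;         # anything else is ignored
        esac
        # Split on the first "=", trim whitespace, and drop single quotes around the value.
        path=$(printf '%s' "${line%%=*}" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        want=$(printf '%s' "${line#*=}" | tr -d "'" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        if [ -r "$SYSFS_ROOT/$path" ]; then
            have=$(cat "$SYSFS_ROOT/$path")
        else
            have="(missing)"
        fi
        printf '%s: want=%s have=%s\n' "$path" "$want" "$have"
    done < "$1"
}
```

Running `show_sysfs_conf /etc/sysfs.d/local` on the router lists each entry; a `(missing)` result usually means the interface or device name differs on that particular setup.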

Crash:

[354147.746549] ------------[ cut here ]------------
[354147.746661] WARNING: CPU: 0 PID: 9 at backports-5.7-rc3-1/net/mac80211/sta_info.c:1929 ieee80211_sta_update_pending_airtime+0x1f8/0x1fc [mac80211]
[354147.750291] STA a4:d9:31:00:56:37 AC 2 txq pending airtime underflow: 4294966496, 800
[354147.750293] Modules linked in: xt_connlimit pppoe ppp_async nf_conncount iptable_nat ath10k_pci ath10k_core ath xt_state xt_nat xt_helper xt_conntrack xt_connmark xt_connbytes xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD xt_CT pppox ppp_generic nf_nat nf_flow_table_hw nf_flow_table nf_conntrack_rtcache nf_conntrack_netlink nf_conntrack mac80211 ipt_REJECT ebtable_nat ebtable_filter ebtable_broute cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_recent xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CLASSIFY wireguard slhc sch_cake nlmon nfnetlink_queue nfnetlink_log nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables ebtables ebt_vlan ebt_stp ebt_redirect ebt_pkttype ebt_mark_m ebt_mark ebt_limit ebt_among ebt_802_3 crc_ccitt compat sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit
[354147.763399]  act_mirred ledtrig_usbport ledtrig_heartbeat xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ifb ip6_udp_tunnel udp_tunnel netlink_diag leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_qcom ohci_platform ohci_hcd phy_qcom_dwc3 ahci fsl_mph_dr_of ehci_platform ehci_fsl sd_mod ahci_platform libahci_platform libahci libata scsi_mod ehci_hcd gpio_button_hotplug
[354147.900989] CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.4.46 #0
[354147.923141] Hardware name: Generic DT based system
[354147.929141] [<c030f954>] (unwind_backtrace) from [<c030b96c>] (show_stack+0x14/0x20)
[354147.933920] [<c030b96c>] (show_stack) from [<c08d8ac0>] (dump_stack+0x94/0xa8)
[354147.941907] [<c08d8ac0>] (dump_stack) from [<c031e7b4>] (__warn+0xb4/0xd0)
[354147.949017] [<c031e7b4>] (__warn) from [<c031e850>] (warn_slowpath_fmt+0x80/0x90)
[354147.956008] [<c031e850>] (warn_slowpath_fmt) from [<bf428af0>] (ieee80211_sta_update_pending_airtime+0x1f8/0x1fc [mac80211])
[354147.963718] [<bf428af0>] (ieee80211_sta_update_pending_airtime [mac80211]) from [<bf42313c>] (ieee80211_report_low_ack+0x254/0x50c [mac80211])
[354147.975047] [<bf42313c>] (ieee80211_report_low_ack [mac80211]) from [<bf423408>] (ieee80211_free_txskb+0x14/0x2c [mac80211])
[354147.987733] [<bf423408>] (ieee80211_free_txskb [mac80211]) from [<bf5678d8>] (ath10k_txrx_tx_unref+0x608/0x738 [ath10k_core])
[354147.999168] [<bf5678d8>] (ath10k_txrx_tx_unref [ath10k_core]) from [<bf56188c>] (ath10k_htt_t2h_msg_handler+0xe6c/0x1288 [ath10k_core])
[354148.010421] [<bf56188c>] (ath10k_htt_t2h_msg_handler [ath10k_core]) from [<bf5bd7e4>] (ath10k_pci_htt_rx_cb+0x178/0x230 [ath10k_pci])
[354148.022845] [<bf5bd7e4>] (ath10k_pci_htt_rx_cb [ath10k_pci]) from [<bf581b44>] (ath10k_ce_per_engine_service+0x9c/0x10c [ath10k_core])
[354148.034741] [<bf581b44>] (ath10k_ce_per_engine_service [ath10k_core]) from [<bf581c34>] (ath10k_ce_per_engine_service_any+0x80/0xd8 [ath10k_core])
[354148.046791] [<bf581c34>] (ath10k_ce_per_engine_service_any [ath10k_core]) from [<bf5bf95c>] (ath10k_pci_napi_poll+0x54/0x15c [ath10k_pci])
[354148.059972] [<bf5bf95c>] (ath10k_pci_napi_poll [ath10k_pci]) from [<c0779ff4>] (net_rx_action+0x118/0x374)
[354148.072461] [<c0779ff4>] (net_rx_action) from [<c0302298>] (__do_softirq+0x130/0x2d4)
[354148.082179] [<c0302298>] (__do_softirq) from [<c03228dc>] (run_ksoftirqd+0x38/0x4c)
[354148.090164] [<c03228dc>] (run_ksoftirqd) from [<c0341798>] (smpboot_thread_fn+0xfc/0x1c8)
[354148.098067] [<c0341798>] (smpboot_thread_fn) from [<c033e13c>] (kthread+0x160/0x164)
[354148.106135] [<c033e13c>] (kthread) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
[354148.114030] Exception stack(0xdd465fb0 to 0xdd465ff8)
[354148.121152] 5fa0:                                     00000000 00000000 00000000 00000000
[354148.126369] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[354148.134611] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[354148.142905] ---[ end trace 34d2bd0d6810903a ]---

[   15.370966] ath10k_pci 0000:01:00.0: assign IRQ: got 35
[   15.370992] ath10k 5.1 driver, optimized for CT firmware, probing pci device: 0x46.
[   15.371750] ath10k_pci 0000:01:00.0: enabling device (0140 -> 0142)
[   15.377527] ath10k_pci 0000:01:00.0: enabling bus mastering
[   15.378031] ath10k_pci 0000:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[   15.966252] ath10k_pci 0000:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[   15.966285] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[   15.976767] ath10k_pci 0000:01:00.0: firmware ver 10.4b-ct-9984-fH-013-4ab470999 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,htt-mgt-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 34b2045a
[   18.306064] ath10k_pci 0000:01:00.0: board_file api 2 bmi_id 0:1 crc32 85498734
[   24.146989] ath10k_pci 0000:01:00.0: unsupported HTC service id: 1536
[   24.148025] ath10k_pci 0000:01:00.0: 10.4 wmi init: vdevs: 16  peers: 48  tid: 96
[   24.152495] ath10k_pci 0000:01:00.0: msdu-desc: 2500  skid: 32
[   24.235422] ath10k_pci 0000:01:00.0: wmi print 'P 48/48 V 16 K 144 PH 176 T 186  msdu-desc: 2500  sw-crypt: 0 ct-sta: 0'
[   24.236265] ath10k_pci 0000:01:00.0: wmi print 'free: 84872 iram: 13412 sram: 11224'
[   24.524857] ath10k_pci 0000:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 32 raw 0 hwcrypto 1

Build: OpenWrt SNAPSHOT r13600-9a477b833a / LuCI Master git-20.168.54087-84a0b68


My config is 98% identical to the config file used to make this build.

Should this post go elsewhere? I did want to share my sysfs.d with people here, though, too.

Yes.
To keep this thread from getting clogged with generic R7800 help requests, I also started the thread "Netgear R7800 exploration (IPQ8065, QCA9984)" at the same time, so that generic R7800 discussion can happen there instead of here.

(And naturally, you might start your own thread as you have a specific discussion topic.)

Ah thank you, I will post it over there.

Anyone else seeing these crashes? The trace in the bug is from the latest "old" build, though I think I've seen it with "ct" also.
https://bugs.openwrt.org/index.php?do=details&task_id=3204

Yeah, I have them too occasionally. They were already present in backport 5.4 last year.

I'm surfing around trying to find an OpenWrt build that works for me, or advice as to what I am doing wrong...
What I have seen is lots of ath10k errors. I initially loaded the R7800 19.07.3 build and immediately had lots of wifi issues: disconnects, inability to route to a device even though it can get out to the internet just fine, devices dropped and not reconnected, etc.

The 19.07.3 image for this router uses the ath10k_pci_ct driver, if I am not mistaken, and I had more problems when I had this loaded. I then loaded the snapshot of the official R7800 build (new kernel), which uses the ath10k_pci driver (no _ct). It has been the most stable so far as far as maintaining connections goes, but still had issues from time to time, mostly on the 5 GHz radio, which is used most often.

Today, I loaded hnyman's "SNAPSHOT r13625-d4dea7efcd / LuCI Master git-20.171.46309-c351bee" version. It has different problems than the normal snapshot or the 19.07.3 version. I am now getting terrible reliability on the 2.4 GHz radio (previously rock stable; the problems were apparent only on the 5 GHz radio). I now get this in the log:

[  726.020999] ath10k_pci 0001:01:00.0: Invalid peer id 2 peer stats buffer
[14173.484757] ath10k_pci 0000:01:00.0: received unexpected tx_fetch_ind event: in push mode
Lots of these over and over...

I do not know what that is. I recently switched from DD-WRT and am still not completely conversant in OpenWrt. I can provide logs and such, but is this an issue others are having or is it me? I have Android phones, laptops, and RPis using the 5 GHz radio, and I need the RPis to be able to get to the internet and also be reachable via Avahi and http. This seems to be the stumbling block. It is like the routing gets hosed and does not recover until I reboot the router.
Should I try the hnyman 19.07.3 version? Maybe a Kong build? Are all builds experiencing similar issues? Do I need to go back to DD-WRT?

Still getting these. This is from last night:

Tue Jun 30 20:03:41 2020 kern.warn kernel: [44129.933585] ath10k_pci 0000:01:00.0: received unexpected tx_fetch_ind event: in push mode
Tue Jun 30 20:03:41 2020 kern.warn kernel: [44129.933633] ath10k_pci 0000:01:00.0: received unexpected tx_fetch_ind event: in push mode
Tue Jun 30 20:03:41 2020 kern.warn kernel: [44129.940875] ath10k_pci 0000:01:00.0: received unexpected tx_fetch_ind event: in push mode
Tue Jun 30 20:03:41 2020 kern.warn kernel: [44129.948996] ath10k_pci 0000:01:00.0: received unexpected tx_fetch_ind event: in push mode

None since then. Everything seems OK for now and this snapshot build seems the most reliable of all so far that I have put on my R7800. This AM when I checked, all my mDNS stuff was available on all devices and was able to reach the servers on the LAN as well. Can anyone help with the diagnosis of this error? Is it a bad driver on one of my devices? Is it anything to worry about?

I get tons of those too, I think they are harmless.

Well, it finally did it again. No connection to some wifi devices (Rpi)

Thu Jul  2 11:28:49 2020 daemon.info hostapd: wlan0: STA bc:ff:eb:xx:xx:xx IEEE 802.11: disassociated due to inactivity
Thu Jul  2 11:28:50 2020 daemon.info hostapd: wlan0: STA bc:ff:eb:xx:xx:xx IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Thu Jul  2 11:28:50 2020 kern.warn kernel: [186037.465756] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 22 tid 0
Thu Jul  2 11:28:50 2020 kern.warn kernel: [186037.465808] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 22 tid 0
Thu Jul  2 11:28:50 2020 kern.warn kernel: [186037.472006] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 22 tid 0
Thu Jul  2 11:28:50 2020 kern.warn kernel: [186037.479244] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 22 tid 0
Thu Jul  2 11:28:50 2020 kern.warn kernel: [186037.489783] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 22 tid 1
Thu Jul  2 11:28:50 2020 kern.warn kernel: [186037.493764] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 22 tid 1
Thu Jul  2 11:28:50 2020 kern.warn kernel: [186037.501164] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 22 tid 1
Thu Jul  2 11:28:50 2020 kern.warn kernel: [186037.508422] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 22 tid 1

The MAC is my phone. I left the house and this seemed to happen at the same time. Maybe a coincidence? Now only some of my mDNS devices show up, and I cannot reach one of the Pis. The DHCP page shows it as having an active unlimited lease even though it is no longer connected to wlan1 (2.4 GHz).
Rebooted the device and I can access it, but it will not show up on my avahi browser. When I reboot the router it will return, but I need to reboot it to restore functionality.

I get a pretty rare crash about once every two weeks running the latest master build. Been experiencing this for about six months now.

Are kernel panics logged anywhere before the system resets so I can hopefully report this to the openwrt devs?

Also, still absolutely in love with this build/router. I'll be sticking to this until Ubiquiti adds CAKE support officially, which might be a long time. It's that good.

I've had no crashes since I bumped the lowest frequency it can go to 800 MHz. There were previous posts here discussing it (and I had seen in the LuCI monitoring logs the frequency constantly jumping from 300ish to 800). At the moment I'm using various settings gathered from the thread and have had no restart in the last 2 weeks or so. For the record, I only use 2.4 GHz wifi on the router (for my IoT devices), and the router is only connected to the modem and a managed switch.

I have this in my Local Startup in LuCI:

echo ondemand > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpufreq/policy1/scaling_governor
echo 1725000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1725000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
echo 800000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
echo 800000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_min_freq
echo 75 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
/usr/sbin/irqbalance

exit 0
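The minimum-frequency part of the above can also be written as a loop, so it covers every cpufreq policy and silently skips files that do not exist. This is a sketch only; the function name and the `CPUFREQ_ROOT` override are illustrative assumptions.

```shell
# CPUFREQ_ROOT is overridable so the loop can be tried against a scratch directory.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-/sys/devices/system/cpu/cpufreq}"

set_cpufreq_floor() {
    # $1: minimum frequency in kHz, e.g. 800000
    for policy in "$CPUFREQ_ROOT"/policy*; do
        # Skip policies whose scaling_min_freq is absent or not writable.
        [ -w "$policy/scaling_min_freq" ] || continue
        echo "$1" > "$policy/scaling_min_freq"
    done
}
```

Placed before `exit 0` in Local Startup, a call such as `set_cpufreq_floor 800000` would replace the two `scaling_min_freq` lines.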

This morning, again, the PiZero is no longer connectable. This is in the kernel log

[286759.252859] ath10k_pci 0000:01:00.0: Invalid peer id 41 peer stats buffer
[286759.259184] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 41 tid 1

The 'Failed Lookup' error was repeated many times.
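When a message repeats this often, a count per distinct message is easier to share than the raw log. A minimal sketch, assuming the log lines look like the ones quoted in this thread (the function name is made up):

```shell
summarize_ath10k() {
    # Reads kernel log text on stdin and prints "count message" for each
    # distinct ath10k_pci message, most frequent first.
    grep 'ath10k_pci' \
        | sed 's/^.*ath10k_pci [0-9a-f:.]*: //' \
        | sort | uniq -c | sort -rn
}
```

Typical use would be `dmesg | summarize_ath10k` or `logread | summarize_ath10k`. Messages that differ only in peer_id or tid are counted separately, since the text is compared verbatim.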

There's a new firmware file for "old"
https://github.com/kvalo/ath10k-firmware/blob/master/QCA9984/hw1.0/3.9.0.2/firmware-5.bin_10.4-3.9.0.2-00099

And also the 3.10 version that has been around for a while:
https://github.com/kvalo/ath10k-firmware/blob/master/QCA9984/hw1.0/3.10/firmware-5.bin_10.4-3.10-00047

The default in the "old" builds is 10.4-3.9.0.2-00070

Not sure what the policy is to bump it to a more recent one.

And also a new "ct" version .019
https://www.candelatech.com/downloads/ath10k-9984-10-4b/?C=M;O=D

Edit: Unfortunately, for me .019 is worse than .018: very laggy, with packet loss between 2 Macs.

Still having this issue. The 5 GHz radio seems more stable. It is now the 2.4 GHz radio that seems to have disconnects without reconnecting.
@hugalafutro I have put your settings in the router startup and rebooted. Will see if there is any difference.

After installing master-r13684-3b0f698760-20200704 and importing the SSL certificate into the Trusted Root Certification Authorities (Chrome), I still cannot https into luci without the "privacy" error.

Any ideas?

edit: sorry, the problem is that I cannot get rid of the annoying 'not secure' message regarding the certificate.