Wireless instability on Netgear R7800 (19.07.2 migrated from 18.06.4)

I've been running 18.06.4 for quite a while and decided to migrate to the latest stable branch. I had read quite a lot about the wireless firmware changes that took place (CT firmware among others) and which, judging by general comments, seem to make this version less stable.

Unfortunately, it happened to me: after installing 19.07.2, all family members reported quite unstable wireless performance. I had to go back to 18.06.4 because the helpdesk (i.e. ME) was overwhelmed by reports of malfunctioning devices... This was on a mixture of desktops, phones and tablets. About a dozen different wireless devices are connected to my router, on either 2.4 GHz or 5 GHz.

I did a clean install, no settings preservation, basic reconfiguration and security settings. No applications were installed other than SQM for bufferbloat. Same wireless channels, same bandwidth and same security settings as before. No Routing/NAT/Software flow offloading, KRACK on, Disassociate On Low Acknowledgement on, but nothing really different from what was configured in 18.06.4.

I tried installing the non-CT firmware by replacing firmware-5.bin and board-2.bin with the files from kvalo (going by the threads from jerrytouille and dissent1). This did not help with stability.

I do not have a log because I rolled back to 18.06.4 quickly to preserve my sanity from incessant user requests, but I could go back and try things that are proven to work for you.

Any suggestions?

I decided to switch back to 18.06.8 myself. I saw some coincidences (or causality) in the logs, but they only scratch the surface and show where to start digging and debugging.

If 18.06.8 is working well, then I think I will at least move to that version, since it picks up security fixes along the way. Thank you for your feedback.

Maybe the real solution would be to use 19.07.2 with the same firmware files from kvalo that are in use in 18.06.8... When I switched, I used the latest ones, so maybe that's the problem.

That would indeed be interesting. Perhaps your curiosity wins and you try again :-)
Strange that there are not more complaints.

PS: KRACK mitigation in both scenarios enabled?

Yes, on for both, but now you have me doubting; maybe I didn't configure it on all wireless interfaces...

The only difference I see is probably that with 18.06.4, KRACK is on, but with 19.07.2, Wireless is on KRACK... :smirk:
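
One way to check whether the countermeasure is really set on every interface is to look at the wireless config from the CLI. A rough sketch, assuming the LuCI KRACK checkbox maps to the UCI option wpa_disable_eapol_key_retries (worth verifying on your build); krack_unset_ifaces is a hypothetical helper name, not an OpenWrt tool:

```shell
# Hypothetical helper: list wifi-iface sections that do NOT have
# wpa_disable_eapol_key_retries set to 1 (assumed KRACK option name).
# Reads `uci show wireless` style lines on stdin.
krack_unset_ifaces() {
    awk -F'[.=]' '
        # "wireless.<section>=wifi-iface" declares an interface section
        $3 == "wifi-iface" { ifaces[$2] = 1 }
        # "wireless.<section>.wpa_disable_eapol_key_retries=1" (quoted or not)
        $3 == "wpa_disable_eapol_key_retries" && $4 ~ /^.?1.?$/ { on[$2] = 1 }
        END { for (s in ifaces) if (!(s in on)) print s }
    ' | sort
}
```

On the router this would be run as: uci show wireless | krack_unset_ifaces. No output would mean every wifi-iface section has the option set.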

I will retest all that on less critical times - pager duty this week.

That sounds really strange: I am using firmware-5.bin_10.4-3.9.0.2-00086 with a matching board file and everything works fine.

Just to be sure, we are talking about the board-2.bin file, right?

Does matching mean that each version has its own board-2.bin file? I only see the board-2.bin file in https://github.com/kvalo/ath10k-firmware/raw/master/QCA9984/hw1.0. Is there something I'm not doing correctly?

Yeah, that one.

Same issue here, replace family with 'Roommates'.

I've been running 19.07.0 r10860 though, and I'm sad to hear that the two point releases didn't improve anything.

I may just downgrade to 18.06.4 as well. Just posting here and subscribing to see if I can help find a fix.

Essentially, this is what has to be done on 19.07.2 to follow up on the answer from @fantom-x:

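# Swap the CT driver and firmware packages for the non-CT ones
opkg update
opkg remove ath10k-firmware-qca9984-ct kmod-ath10k-ct
opkg install wget ath10k-firmware-qca9984 kmod-ath10k
# Back up the packaged board and firmware files
cd /lib/firmware/ath10k/QCA9984/hw1.0/
mv board-2.bin board-2.bin.bk && mv firmware-5.bin firmware-5.bin.bk
# Fetch firmware 10.4-3.9.0.2-00086 and the board file from the kvalo repository
# (note: --no-check-certificate skips TLS certificate verification)
wget -O firmware-5.bin https://github.com/kvalo/ath10k-firmware/raw/master/QCA9984/hw1.0/3.9.0.2/firmware-5.bin_10.4-3.9.0.2-00086 --no-check-certificate
wget https://github.com/kvalo/ath10k-firmware/raw/master/QCA9984/hw1.0/board-2.bin --no-check-certificate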

If you wish to apply this fix, this is what you would have to run in the CLI, then reboot the router. Just paste it line by line.

EDIT: Updated to also include the replacement of kmod-ath10k-ct that I mentioned in this post.
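
Before rebooting, it may be worth checking that both downloads actually landed, since a failed wget could leave a missing or empty file. A minimal sketch, assuming the firmware path from the commands above; fw_files_ok is a made-up helper name, not something shipped with OpenWrt:

```shell
# Hypothetical helper: succeed only if both replacement files exist
# and are non-empty in the given directory (default: the path used above).
fw_files_ok() {
    dir=${1:-/lib/firmware/ath10k/QCA9984/hw1.0}
    for f in firmware-5.bin board-2.bin; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
    done
    return 0
}
```

On the router, something like fw_files_ok && reboot would only reboot when both files are in place. If the check fails, the .bk copies can be moved back before trying the download again.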

Thanks for the update. I'll give it a try on mine later today and see if it helps with wireless stability.

Still have to throw 19.07.2 on it first.

As per @Doppel-D, 18.06.8 seems to work well too, so that would be a better option than 18.06.4, since there are several worthwhile CVE fixes between these releases, among other things.

Yeah I haven't looked at the 18.X releases yet, as I got my new router right after 19.X was finalized. Upgraded from a DDWRT Shibby fork IIRC. I'll pick up whatever is latest if I decide to go that route.

Note: This is a partial solution; view the complete solution in this post.

I've reinstalled 19.07.2 with the Non-CT firmware. We'll see how it goes tomorrow. I've also done this:

opkg update
opkg remove kmod-ath10k-ct
opkg install kmod-ath10k

That looked appropriate since I should be running Non-CT everywhere. This command helped me find it:

root@Mercure03:~# opkg list-installed | grep ath10k
ath10k-firmware-qca9984 - 20190416-1
kmod-ath10k - 4.14.171+4.19.98-1-1
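
The same listing can be turned into a quick check that no CT package was left behind. A small sketch; ct_leftovers is a hypothetical helper, and it assumes CT package names carry a "-ct" suffix like the two names removed earlier in this thread:

```shell
# Hypothetical helper: print ath10k packages that are still a CT variant.
# Reads `opkg list-installed` style lines ("name - version") on stdin.
ct_leftovers() {
    # Assumption: CT packages have "-ct" at the end of the name or
    # followed by another dash-separated suffix.
    awk '$1 ~ /ath10k/ && $1 ~ /-ct(-|$)/ { print $1 }'
}
```

On the router: opkg list-installed | ct_leftovers. Empty output would suggest that only the non-CT driver and firmware packages remain.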

So far, so good with the changes to Non-CT!

It was a day of heavy usage from 2 gaming consoles (wired/wireless), 1 wireless gaming PC, 1 streaming tablet (wireless) and 2 remote work PC stations (wired), along with VPNs, softphone calls, conference calls and WebEx meetings.

No disconnections, rock solid connections everywhere. The system log is clean as a whistle, with no errors or warnings except for:

daemon.err procd: unable to find /sbin/ujail: No such file or directory (-1)

I am getting those too and am not sure whether they affect anything.

New warning tonight (on 19.07.2):

Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.603381] ------------[ cut here ]------------
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.603461] WARNING: CPU: 0 PID: 0 at backports-4.19.98-1/drivers/net/wireless/ath/ath10k/htt_rx.c:1179 0xbf3a0d10 [ath10k_core@bf38a000+0x48000]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.607124] Modules linked in: pppoe ppp_async ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt compat sch_cake nf_conntrack sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.668997]  cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred ledtrig_usbport nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ifb leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_of_simple ohci_platform ohci_hcd phy_qcom_dwc3 ahci ehci_platform sd_mod ahci_platform libahci_platform libahci libata scsi_mod ehci_hcd gpio_button_hotplug
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.704829] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.171 #0
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.727040] Hardware name: Generic DT based system
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.733121] Function entered at [<c030f1c4>] from [<c030b390>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.737891] Function entered at [<c030b390>] from [<c07c0664>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.743794] Function entered at [<c07c0664>] from [<c031fa98>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.749697] Function entered at [<c031fa98>] from [<c031fb84>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.755597] Function entered at [<c031fb84>] from [<bf3a0d10>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.761561] Function entered at [<bf3a0d10>] from [<bf3a2224>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.767405] Function entered at [<bf3a2224>] from [<bf3a29bc>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.773307] Function entered at [<bf3a29bc>] from [<bf3d608c>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.779236] Function entered at [<bf3d608c>] from [<c06a9660>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.785112] Function entered at [<c06a9660>] from [<c03015c8>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.791015] Function entered at [<c03015c8>] from [<c0324000>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.796916] Function entered at [<c0324000>] from [<c0362b60>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.802822] Function entered at [<c0362b60>] from [<c0301488>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.808722] Function entered at [<c0301488>] from [<c030bf8c>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.814626] Exception stack(0xc0a01f48 to 0xc0a01f90)
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.820552] 1f40:                   00000001 00000000 00000000 c0315100 ffffe000 c0a03cb8
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.825771] 1f60: c0a03c6c 00000000 00000000 c092ea28 00000000 00000000 c0a01f90 c0a01f98
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.834001] 1f80: c030854c c0308550 60000013 ffffffff
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.842231] Function entered at [<c030bf8c>] from [<c0308550>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.847351] Function entered at [<c0308550>] from [<c03589c8>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.853168] Function entered at [<c03589c8>] from [<c0358d10>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.859069] Function entered at [<c0358d10>] from [<c0900c54>]
Sat Mar 21 20:59:30 2020 kern.warn kernel: [156361.865012] ---[ end trace 3bc62172a69aea2a ]---
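
To see whether a warning like this keeps coming back from the same place, the log can be boiled down to the source locations of ath10k warnings. A minimal sketch; ath10k_warn_sites is a hypothetical helper that reads logread-style lines on stdin:

```shell
# Hypothetical helper: count kernel WARNING lines per ath10k source location.
# Prints "<count> <file:line>" for each distinct location.
ath10k_warn_sites() {
    # Keep only the "file:line" token that follows " at " on WARNING lines
    sed -n 's/.*WARNING: .* at \([^ ]*ath10k[^ ]*\) .*/\1/p' |
        sort | uniq -c | awk '{ print $1, $2 }'
}
```

On the router: logread | ath10k_warn_sites. For the trace above, the only location reported would be the one in htt_rx.c.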

Just to confirm, you are still running 19.07.2 non-CT, right?

Yes. I edited my previous post to make that clear.

This log entry (on 19.07.2) obviously has the wrong severity; it should be info, not err:

Sun Mar 22 22:15:18 2020 daemon.err uhttpd[2669]: luci: accepted login on / for root from 192.168.1.200

I'm going to let it run for a few more days before marking fantom-x's answer as the solution, although it seems pretty solid right now.