Build for Netgear R7800

I've been running 18.06 r7144 of this build for months now (it seemed stable, and since this build's history is becoming as hit-and-miss as DD-WRT's, I stick to versions that work), set to auto-reboot every week on Monday morning. Today (Sunday noon) a strange thing happened: the router was pingable but had lost its WAN connection and web interface. Power cycling fixed it. I have now disabled software offload, just in case there was some rare glitch in that functionality.

Please use this topic only for questions related to this specific community build.
For any questions regarding official releases / snapshots, please open new topics.

Anyone having random reboots with r9361-5b6997dcb3-20190217?
Any logs to look at after a random reboot that may reveal the cause?

FWIW, I tried the "ct" build and it's not going well. I noticed that my Nest thermostat struggles to stay connected to the router on the 2.4 GHz band.
There's also a crash in dmesg:

[ 1688.972966] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.101 #0
[ 1688.995110] Hardware name: Generic DT based system
[ 1689.001138] [] (unwind_backtrace) from [] (show_stack+0x14/0x20)
[ 1689.005818] [] (show_stack) from [] (dump_stack+0x88/0x9c)
[ 1689.013713] [] (dump_stack) from [] (__warn+0xf0/0x11c)
[ 1689.020738] [] (__warn) from [] (warn_slowpath_null+0x20/0x28)
[ 1689.027649] [] (warn_slowpath_null) from [] (ath10k_htt_t2h_msg_handler+0x1674/0x265c [ath10k_core])
[ 1689.035436] [] (ath10k_htt_t2h_msg_handler [ath10k_core]) from [] (ath10k_htt_t2h_msg_handler+0x2468/0x265c [ath10k_core])
[ 1689.046371] [] (ath10k_htt_t2h_msg_handler [ath10k_core]) from [] (ath10k_htt_txrx_compl_task+0x6f8/0xbb8 [ath10k_core])
[ 1689.058915] [] (ath10k_htt_txrx_compl_task [ath10k_core]) from [] (ath10k_pci_napi_poll+0x7c/0x11c [ath10k_pci])
[ 1689.071661] [] (ath10k_pci_napi_poll [ath10k_pci]) from [] (net_rx_action+0x144/0x31c)
[ 1689.083499] [] (net_rx_action) from [] (__do_softirq+0xf0/0x264)
[ 1689.092964] [] (__do_softirq) from [] (irq_exit+0xdc/0x148)
[ 1689.100859] [] (irq_exit) from [] (__handle_domain_irq+0xa8/0xc8)
[ 1689.107885] [] (__handle_domain_irq) from [] (gic_handle_irq+0x6c/0xb8)
[ 1689.115869] [] (gic_handle_irq) from [] (__irq_svc+0x6c/0x90)
[ 1689.124020] Exception stack(0xc0b01f48 to 0xc0b01f90)
[ 1689.131680] 1f40: 00000001 00000000 00000000 c0315340 ffffe000 c0b03cbc
[ 1689.136727] 1f60: c0b03c70 00000000 00000000 c0a2ca28 00000000 00000000 c0b01f90 c0b01f98
[ 1689.144868] 1f80: c030884c c0308850 60000013 ffffffff
[ 1689.153026] [] (__irq_svc) from [] (arch_cpu_idle+0x38/0x44)
[ 1689.158066] [] (arch_cpu_idle) from [] (do_idle+0xe8/0x1bc)
[ 1689.165523] [] (do_idle) from [] (cpu_startup_entry+0x1c/0x20)
[ 1689.172568] [] (cpu_startup_entry) from [] (start_kernel+0x3fc/0x408)
[ 1689.180265] ---[ end trace f44d6429ae5590b2 ]---
[ 1689.192021] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[ 1689.193231] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[ 1689.200202] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon

No issues with CT here yet but seems like the bugs that are left might be hard to catch.

@wired If you can reproduce it I suggest filing an issue here

It looks like similar CT issues have already been reported, hopefully these bugs can be squashed.

Unrelated question: I disable a bunch of services because I run in plain AP mode. I was surprised to see that after a firmware upgrade the disabled state of the services has not been saved, all services are enabled once again. I learned my lesson and instead of going through the list disabling them in the GUI I now have a script.
What's a safe location for this script on the filesystem so that it survives reboots and updates? I'm thinking of calling it from rc.local to stop and then disable these services if they happen to be running after a reboot/upgrade, just not sure where to place it on the filesystem to survive. Not sure if /root is a good place for user scripts.

That is the standard behaviour.
I stumbled upon that back in 2011:
https://forum.archive.openwrt.org/viewtopic.php?id=27285
https://forum.archive.openwrt.org/viewtopic.php?id=37488

If you are building your own firmware, one nice way is to do the disabling in /etc/uci-defaults/xxx... , as those uci-defaults scripts are run only once, at the initial boot after flashing, and are then deleted (they remain visible in a live system in the /rom/etc/uci-defaults dir).

If using my firmware or other binary firmware, the /etc/rc.local is probably the easiest place. But that is run at each boot, which might be unnecessary.

Ps. Well-behaved services have a separate uci config item to enable/disable them, in addition to the /etc/rc.d symlink (which is the system-level way to enable/disable a service).
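
A minimal sketch of the disabling step described above, not a definitive implementation. It assumes the usual OpenWrt init scripts in /etc/init.d that accept "stop" and "disable". The function name disable_services, the INIT_DIR override (only there so the function can be dry-run against a test directory) and the example service names are illustrations, not something taken from this build.

```sh
#!/bin/sh
# Sketch: stop and disable a list of services through their init scripts.
# INIT_DIR is an assumed override for dry runs; on the router it stays /etc/init.d.
INIT_DIR="${INIT_DIR:-/etc/init.d}"

disable_services() {
    for svc in "$@"; do
        script="$INIT_DIR/$svc"
        # Skip services that are not installed in this firmware.
        [ -x "$script" ] || continue
        "$script" stop      # stop the running instance, if any
        "$script" disable   # remove the /etc/rc.d symlink so it stays off at boot
    done
    return 0
}

# Example call (service names are only an illustration for plain AP mode):
# disable_services dnsmasq odhcpd firewall
```

In a self-built image, the function plus one call could live in a file under /etc/uci-defaults/ (the script should finish with exit status 0 so that it is removed after the first boot). With a binary firmware, the same call could go into /etc/rc.local above its final "exit 0".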

Makes sense. I'm relatively new to OpenWRT so trying to learn best practices.

What is a safe directory to place user scripts so they survive reboots/upgrades? rc.local can get out of control quickly if placing the actual code in there, nicer for it to call out to various little scripts that can be commented out as needed.

For manual running, a nice trick is to place the script in a "/etc/config/xxx/" directory. Those are automatically included in sysupgrade. (Placing the script in /etc/config itself may cause uci errors, as it would be treated as a config file.)
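
For the rc.local question above (calling out to small scripts instead of putting the code in rc.local itself), a rough sketch could look like this. The function name run_user_scripts and the directory in the example call are hypothetical; any directory that is preserved across sysupgrade would do.

```sh
#!/bin/sh
# Sketch: run every executable file in a directory, in lexical order.
run_user_scripts() {
    dir="$1"
    # Nothing to do if the directory does not exist.
    [ -d "$dir" ] || return 0
    for f in "$dir"/*; do
        # Non-executable files are skipped, so "chmod -x" works like commenting one out.
        [ -f "$f" ] && [ -x "$f" ] || continue
        # A failing script is reported but does not stop the remaining ones.
        "$f" || echo "rc.local: $f exited with status $?" >&2
    done
    return 0
}

# Example call from /etc/rc.local (directory name is an assumption):
# run_user_scripts /etc/config/scripts
```

Numeric prefixes such as 10-foo and 20-bar give a predictable order, since the shell expands the glob in sorted order.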

Naturally you can also add any file to your personal sysupgrade backup list in /etc/sysupgrade.conf, which is also explained in the wiki.

Empty placeholder file:

root@router1:/etc# cat /etc/sysupgrade.conf

## This file contains files and directories that should
## be preserved during an upgrade.

# /etc/example.conf
# /etc/openvpn/

Visible in LuCI in the Configuration tab of the flashing page:
https://192.168.1.1/cgi-bin/luci/admin/system/flashops/backupfiles?display=edit
https://192.168.1.1/cgi-bin/luci/admin/system/flashops/backupfiles?display=list
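
Adding an entry to that list can also be scripted. The sketch below appends a path only if it is not already listed, so it is safe to run repeatedly. The function name keep_on_upgrade and the directory in the example call are hypothetical; the optional second argument exists only so the function can be tried against a copy of the file.

```sh
#!/bin/sh
# Sketch: append a path to the sysupgrade keep-list unless it is already there.
keep_on_upgrade() {
    entry="$1"
    conf="${2:-/etc/sysupgrade.conf}"
    # -q quiet, -x match the whole line, -F treat the entry as a fixed string.
    grep -qxF "$entry" "$conf" 2>/dev/null || printf '%s\n' "$entry" >> "$conf"
}

# Example call (directory name is an assumption):
# keep_on_upgrade /etc/custom-scripts/
```

Afterwards the list view in LuCI linked above should show the files under the new directory.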

Thank you, great info! I thought about putting stuff in /etc/config but it didn't feel right.
I will add a custom directory to /etc/sysupgrade.conf instead.

Unfortunately I get crashes with the r9330-880f8e6d32 "old" build too :(
No random reboots on this build though, so at least it stays up and recovers without downtime.

[  419.213421] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.98 #0
[  419.235567] Hardware name: Generic DT based system
[  419.241596] [<c030f58c>] (unwind_backtrace) from [<c030b7a4>] (show_stack+0x14/0x20)
[  419.246188] [<c030b7a4>] (show_stack) from [<c079cd18>] (dump_stack+0x88/0x9c)
[  419.254083] [<c079cd18>] (dump_stack) from [<c0322eec>] (__warn+0xf0/0x11c)
[  419.261109] [<c0322eec>] (__warn) from [<c0322fd8>] (warn_slowpath_null+0x20/0x28)
[  419.268015] [<c0322fd8>] (warn_slowpath_null) from [<bf8012d0>] (ath10k_htt_t2h_msg_handler+0x11ec/0x216c [ath10k_core])
[  419.275794] [<bf8012d0>] (ath10k_htt_t2h_msg_handler [ath10k_core]) from [<bf802074>] (ath10k_htt_t2h_msg_handler+0x1f90/0x216c [ath10k_core])
[  419.286732] [<bf802074>] (ath10k_htt_t2h_msg_handler [ath10k_core]) from [<bf80291c>] (ath10k_htt_txrx_compl_task+0x6cc/0xb10 [ath10k_core])
[  419.299277] [<bf80291c>] (ath10k_htt_txrx_compl_task [ath10k_core]) from [<bf848b1c>] (ath10k_pci_napi_poll+0x78/0x118 [ath10k_pci])
[  419.312033] [<bf848b1c>] (ath10k_pci_napi_poll [ath10k_pci]) from [<c069b5c8>] (net_rx_action+0x144/0x31c)
[  419.323868] [<c069b5c8>] (net_rx_action) from [<c03015c8>] (__do_softirq+0xf0/0x264)
[  419.333335] [<c03015c8>] (__do_softirq) from [<c03272f4>] (irq_exit+0xdc/0x148)
[  419.341230] [<c03272f4>] (irq_exit) from [<c0363fc0>] (__handle_domain_irq+0xa8/0xc8)
[  419.348256] [<c0363fc0>] (__handle_domain_irq) from [<c0301488>] (gic_handle_irq+0x6c/0xb8)
[  419.356240] [<c0301488>] (gic_handle_irq) from [<c030c38c>] (__irq_svc+0x6c/0x90)
[  419.364391] Exception stack(0xc0b01f48 to 0xc0b01f90)
[  419.372049] 1f40:                   00000001 00000000 00000000 c0315300 ffffe000 c0b03cbc
[  419.377095] 1f60: c0b03c70 00000000 00000000 c0a2ca28 00000000 00000000 c0b01f90 c0b01f98
[  419.385240] 1f80: c030884c c0308850 60000013 ffffffff
[  419.393396] [<c030c38c>] (__irq_svc) from [<c0308850>] (arch_cpu_idle+0x38/0x44)
[  419.398437] [<c0308850>] (arch_cpu_idle) from [<c0359df0>] (do_idle+0xe8/0x1bc)
[  419.405894] [<c0359df0>] (do_idle) from [<c035a138>] (cpu_startup_entry+0x1c/0x20)
[  419.412939] [<c035a138>] (cpu_startup_entry) from [<c0a00ce4>] (start_kernel+0x3fc/0x408)
[  419.420654] ---[ end trace a73674eceda233e5 ]---
[  419.430918] ath10k_pci 0001:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[  419.434917] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[  419.440551] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon

Ooops, spoke too soon. Just had a reboot. Anyone else seeing random reboots? I hope it's just software and not some hardware issue. Any known recent issues?

Unfortunately all logs are wiped by the reboot, so I can't tell what may be going on. I'm going to leave a "logread -f" running in a terminal; maybe I'll catch something right before it crashes next time.
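
A variation on the same idea, as a sketch only: instead of watching the terminal, append each log line to a file on storage that survives the reboot. The function name save_log_stream and the USB mount point in the example call are assumptions. Writing continuously to the router's internal flash is probably not a good idea because of wear, so an attached USB stick is assumed here.

```sh
#!/bin/sh
# Sketch: run a command and append its output, line by line, to a file.
save_log_stream() {
    dest="$1"
    shift
    # Each line is appended as soon as it arrives, so the last lines written
    # before a crash are the last lines in the file.
    "$@" 2>&1 | while IFS= read -r line; do
        printf '%s\n' "$line" >> "$dest"
    done
}

# Example call on the router (mount point is an assumption):
# save_log_stream /mnt/usb/logread.txt logread -f &
```

Lines still sitting in the filesystem cache at the moment of a hard reset can be lost, so the file may end slightly before the actual crash.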

@hnyman Is your openwrt-18.06 build just a stock build using the latest commits on the branch or does it include your customizations? Can I flash it and keep the configs from your latest master builds or do I need to reset it all and reconfigure? Trying to get to the bottom of these crashes, wondering if 18.06.2 builds use more stable ath drivers.

You can see full source diffs in the download directory...
There are some minor customisations, currently mainly regarding package default settings, but nothing to prevent using the same config as in master. Currently there are no major differences between 18.06 and master.
I use the same configs for master and 18.06 in my own router when testing new builds.

I'm giving up on OpenWrt with the R7800 for now. The 17 series worked without a glitch here. The 18 series has been a big letdown, in a not-complaining-because-it's-free kind of way. I even tried hnyman's builds, hoping the second point release would remedy the crashes, but I guess it's back to OFW or DD-WRT until the bugs are resolved.

Although I don't have crashes, I am not satisfied yet. Internet does not feel fast; if anything it feels slow. In my perception it was better in the past (LEDE era).

No crashes here. I only have 100 Mbit internet, so I can't say anything about internet speed.

The CT driver seems to work ok for me now (AP mode).

However, I run out of memory with dnsmasq and adblock in the recent master builds. This is new; I never had issues in the past.

I'm unsure if booting without adblock enabled uses more memory now...
Free: 365Mb of 477Mb

Guys, I need to add some positive feedback.
From my point of view, the issues you are facing are not specifically related to this build.

I am running two R7800s with the latest 18.06 HEAD behind cable and DSL modems (both dual stack), with all the packages from hnyman's non-CT config + VPN, and I get:

  • full speed (150/20 Mbit)
  • no crashes
  • no reboots
  • no clients being disconnected on the 2.4 and 5 GHz bands
  • 24*7 VPN uptime

Totally happy - stable systems, just fine.

Honestly same for me. As far as two of the best routers go from my limited experience: WRT32X is currently best with OpenWrt, and R7800 is currently best with DD-WRT.

And I am having no issues either with my personal build using latest non-ct wireless drivers.

It definitely depends on the environment and clients. For example, my Nest thermostat does not like the CT driver, but it's stable with the non-CT driver and with my other 2 APs using Atheros or Broadcom chips.
As for random reboots, they come and go, and I'm not sure what triggers them. For example, right now my R7800 has been up for almost 6 days using the non-CT Feb 13 build. Other times I get a couple of reboots in the span of minutes. I run mine in AP mode and there's no heavy wifi traffic, but there are 10-15 clients connected to it at all times.