To clarify, it's only when I untag the port, and have it "off" from the original VLAN 1. And it ONLY affects WLAN<->VLAN10. All my hardwired clients are fine VLAN1<->VLAN10. I have since switched back to default VLAN setup. Below config is where I had issues. I was originally using separate VLANs to try and mitigate the 'duplicate packet' whenever my wifi clients would roam between Orbi AP and the r7800's AP. @D43m0n
I use my VLANs to separate WiFi networks; I have a few devices running OpenWrt configured as dumb APs. All transmit the same SSIDs, which use VLANs to separate the IP traffic. I find this stuff difficult to understand and interpret, to be honest. The idea is that a few dumb APs and my router all transmit my private SSID and a guest SSID. The dumb APs are connected via a LAN cable to my router. Clients are able to roam throughout the house, whether they are connected to the guest or private SSID. Does this compare to your use case? I'm trying to establish whether our use cases are similar and whether the CPU spikes you encountered might eventually result in the rcu_sched self-detected stall on CPU that some of us are seeing.
All these chipsets are grouped under "ath10k firmware" but their firmware differs, so information gleaned from the Internet must be taken with a grain of salt unless it says exactly which chipset and firmware it applies to.
Thanks I was able to confirm as well.
Using the ath10k-ct R7800-20220816-MasterNSS-sysupgrade build I got 950MB down and 600MB upload speeds and the average load never went above 0.12.
This is with PPPOE and VLAN tagging.
What are your settings for 500Mbps on Wifi? I'm only getting ~200.
My R7800 and EA8500 (both with irqbalance disabled) passed the full hot day (about 24 hours uptime) without any random reboot. Yesterday, my R7800 crashed 3 times and EA8500 once. Even though it's too early to say about the real "culpability" of irqbalance in the instability of recent builds, I will never use irqbalance and packet steering for NSS builds. They're pretty pointless features, especially for NSS builds. My reasoning against them is in my previous posts.
My R7800 and EA8500 run pretty complicated configurations. Each of them runs as both VPN client and server, using 4 different VPN technologies to handle different network topologies behind NAT, socks5, proxy etc.
I use irqbalance and I have reached 37 days of uptime. In short: irqbalance, stubby, adblock, packet forwarding enabled, promiscuous brlan disabled, sqm disabled, vlan for pppoe working, 20 2.4ghz devices with manual airtime configuration, several 5ghz devices, openvpn 24/7 connection, wireguard with occasional use, NAS server with normal internal lan transfers.
With that configuration it lasted 37 days, and it only ended because I enabled promiscuous brlan to try it and the router restarted within hours; otherwise the uptime would certainly have been longer.
With that "37 days" uptime, it means that the image you're running does not work with NSS-offloading for PPPoE at all. 22.03/Master images with working NSS-offloading for PPPoE were only available since about 3 weeks ago.
Builds with working NSS-offloading definitely are more "fragile" than vanilla OpenWrt images. The random reboot/instability issues we're talking about lately are for very recent images only.
PPPoE does work perfectly; my mistake was that I was talking about 22.01 and not 22.03. After the reboot caused by promiscuous brlan I installed ACwifidude's 22.03 build and the uptime counter has started again, but I have kept all the settings I described above for 22.01. Sorry for the slip.
Regarding the recent 22.03 or master builds, they may be stable for some people but not for others. This is understandable since our configurations, installed packages, WIFI client numbers and types, traffic patterns/loads, ambient temperatures etc. are all different.
For those of us experiencing instabilities with recent builds, we're just trying different tricks (e.g. disabling irqbalance etc.) to see if they may alleviate or even get rid of these random reboots/crashes.
If you redirect your syslog to a remote syslog server, or compile the image with CONFIG_PSTORE_CONSOLE=y, I'm 99.9999% sure that you will see these RCU stalling messages before the reboot takes place. Random reboots caused by these RCU stalling events do not produce any core dump because the kernel does not treat them as crashes.
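For anyone who does collect the log on a remote syslog server, here is a minimal Python sketch of how the saved log could be searched for stall messages after a reboot. The match string is taken from the "rcu_sched self-detected stall on CPU" message mentioned earlier in this thread; the function name and the sample lines are made up for illustration, and the exact wording of the kernel message may differ between kernel versions.

```python
import re

# Assumption: stall reports contain the phrase "self-detected stall on CPU",
# as in the "rcu_sched self-detected stall on CPU" message quoted in this thread.
STALL_PATTERN = re.compile(r"self-detected stall on CPU", re.IGNORECASE)


def find_rcu_stalls(lines):
    """Return (line_number, text) pairs for every line that looks like a stall report."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if STALL_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Made-up sample lines; in practice, read the file written by the syslog server.
    sample = [
        "kernel: some unrelated message",
        "kernel: rcu_sched self-detected stall on CPU",
        "kernel: another unrelated message",
    ]
    for number, text in find_rcu_stalls(sample):
        print(f"line {number}: {text}")
```

If the function returns hits shortly before the log goes quiet, that would support the RCU stall explanation; an empty result only means this particular phrase was not logged.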
How many days was the uptime?
I've turned off irqbalance and packet steering on an R7800 that uses pppoe. It is set to the performance governor but still experiences spontaneous reboots every few hours. Sometimes it survives 2-3 days.
With the build that didn't have the latest NSS pppoe fix it worked for over 10 days with irqbalance and packet steering enabled, using the ondemand governor.
Well, I've got the 21.02.x build running for about 2 weeks that @vochong built for me. That too has spontaneous reboots... Not as often as the kernel 5.10 based builds (either 22.03 or master) with NSS PPPoE acceleration, but still... The 21.02 build I had running in the beginning of this year was a lot more stable.
I've loaded the previous ath10k firmware (10.4-3.9.0.2-00156) on this 21.02.x build to check if the newer version might have a problem. This previous firmware version doesn't really make a difference. I'll disable irqbalance now (packet steering was disabled a few weeks ago).
On one hand I think it would be interesting to keep my router on this 21.02.x build and try one change at a time to see whether the reboots stop occurring every few days. On the other hand I'd like to start with a fresh flash, without keeping the current configuration, and configure from scratch. But I could also try an ath10k-ct based build...
In the meantime I've made a simple docker image that allows me to build images faster on my MacBook (which is getting old, but still has a lot more oomph than my NAS).
Is this all we need to add to the diffconfig to get some output in /sys/fs/pstore? I see hnyman does some changes in this post. He seems to add these too:
+CONFIG_PSTORE=y
+# CONFIG_PSTORE_842_COMPRESS is not set
+CONFIG_PSTORE_COMPRESS=y
+CONFIG_PSTORE_COMPRESS_DEFAULT="deflate"
+# CONFIG_PSTORE_CONSOLE is not set
+CONFIG_PSTORE_DEFLATE_COMPRESS=y
+CONFIG_PSTORE_DEFLATE_COMPRESS_DEFAULT=y
+# CONFIG_PSTORE_LZ4HC_COMPRESS is not set
+# CONFIG_PSTORE_LZ4_COMPRESS is not set
+# CONFIG_PSTORE_LZO_COMPRESS is not set
+# CONFIG_PSTORE_PMSG is not set
+CONFIG_PSTORE_RAM=y
+# CONFIG_PSTORE_ZSTD_COMPRESS is not set
+CONFIG_REED_SOLOMON=y
+CONFIG_REED_SOLOMON_DEC8=y
+CONFIG_REED_SOLOMON_ENC8=y
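To double-check a config fragment like the one above before starting a long build, a small Python sketch along these lines could report which pstore symbols are not enabled. The helper names are hypothetical, and the list of symbols to look for is an assumption: CONFIG_PSTORE and CONFIG_PSTORE_RAM come from the fragment above, and CONFIG_PSTORE_CONSOLE comes from the earlier suggestion in this thread.

```python
import re

# Matches "CONFIG_FOO=y" style lines, with an optional leading "+" from a diff.
SET_RE = re.compile(r"^\+?(CONFIG_[A-Z0-9_]+)=(.*)$")
# Matches "# CONFIG_FOO is not set" lines, with an optional leading "+".
UNSET_RE = re.compile(r"^\+?# (CONFIG_[A-Z0-9_]+) is not set$")


def parse_kconfig(text):
    """Map each CONFIG_ symbol to its value, or to None when it is 'not set'."""
    symbols = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SET_RE.match(line)
        if match:
            symbols[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            symbols[match.group(1)] = None
    return symbols


def missing_symbols(text, wanted):
    """Return the symbols from `wanted` that are absent or not set to 'y'."""
    symbols = parse_kconfig(text)
    return [name for name in wanted if symbols.get(name) != "y"]


if __name__ == "__main__":
    # Three lines copied from the fragment above.
    fragment = "\n".join([
        "+CONFIG_PSTORE=y",
        "+# CONFIG_PSTORE_CONSOLE is not set",
        "+CONFIG_PSTORE_RAM=y",
    ])
    wanted = ["CONFIG_PSTORE", "CONFIG_PSTORE_RAM", "CONFIG_PSTORE_CONSOLE"]
    print(missing_symbols(fragment, wanted))  # prints ['CONFIG_PSTORE_CONSOLE']
```

Run against the fragment above, this flags CONFIG_PSTORE_CONSOLE, which is marked "is not set" there while the earlier suggestion was to build with CONFIG_PSTORE_CONSOLE=y.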
From what I see in the recent 22.03 or master repos, the required changes in qcom-ipq8065-nighthawk.dtsi are already present. I doubt whether all my previous kernel 5.10 based builds were properly set up to save ramoops in /sys/fs/pstore.
In another post, hnyman also explains how he patches this for 21.02 builds. Now I must say I'm tempted to build a 21.02 image with ramoops enabled since I now frequently experience random reboots too, but ramoops isn't enabled in 21.02 by default like it is in 22.03/master for R7800. I'd need to clone @ACwifidude's 21.02 NSS repo, rebase and apply Felix's latest AQL/ATF patches too. The first time I tried that, compiling failed. Do you have any tips on how to do that @vochong ?
Don't waste your time on building 21.02 with RAMOOPS support. The 21.02 branch does not have any built-in support for kmod-ramoops and kmod-pstore, so any dtsi change will serve no purpose if these kernel modules cannot be compiled and loaded.