Netgear R7800 exploration (IPQ8065, QCA9984)

In the DSA logic you need some way to tell the switch where it should send the packet. Almost every switch supports this, in various ways. Most of them support a special proprietary header where you can set the destination of the skb. DSA needs this because the switch doesn't support every feature in hardware; by setting the skb destination this way, the missing features can be handled in software, which does the hard work of working out the skb destination and filling in the header.
The switch will check this header and forward the packet to the correct port.
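To make the tx side concrete, here is a minimal user-space sketch of inserting a 2-byte tag right after the two MAC addresses. The bit layout (version in bits 15..14, a "from CPU" flag in bit 7, a destination-port bitmap in bits 6..0) is loosely modeled on the Atheros/QCA header as I remember it from net/dsa/tag_qca.c; treat it as an assumption. The names tag_xmit, TAG_* and ETH_ADDRS_LEN are hypothetical, and the real kernel code works on an skb, not a flat buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 2-byte tag, loosely modeled on the Atheros/QCA header:
 * bits 15..14 version, bit 7 "from CPU", bits 6..0 destination port bitmap. */
#define TAG_LEN           2
#define TAG_VERSION       0x2u
#define TAG_VERSION_SHIFT 14
#define TAG_FROM_CPU      (1u << 7)
#define TAG_PORT_MASK     0x7Fu
#define ETH_ADDRS_LEN     12   /* destination MAC + source MAC */

/* Insert the tag after the two MAC addresses of the frame in buf.
 * buf must have room for len + TAG_LEN bytes. Returns the new length. */
static size_t tag_xmit(uint8_t *buf, size_t len, unsigned int port)
{
    assert(len >= ETH_ADDRS_LEN);
    assert(port < 7);

    uint16_t hdr = (uint16_t)((TAG_VERSION << TAG_VERSION_SHIFT) |
                              TAG_FROM_CPU |
                              ((1u << port) & TAG_PORT_MASK));

    /* Shift everything after the MAC addresses (EtherType + payload) right. */
    memmove(buf + ETH_ADDRS_LEN + TAG_LEN, buf + ETH_ADDRS_LEN,
            len - ETH_ADDRS_LEN);

    /* Store the tag in network (big-endian) byte order. */
    buf[ETH_ADDRS_LEN]     = (uint8_t)(hdr >> 8);
    buf[ETH_ADDRS_LEN + 1] = (uint8_t)(hdr & 0xFF);
    return len + TAG_LEN;
}
```

For example, tagging a frame for port 3 writes 0x80 0x88 right after the source MAC and pushes the EtherType two bytes further into the frame.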

It's the same for the receive path, except that with header mode enabled the switch tells you which port the packet came from.
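The receive side is the mirror image. A minimal sketch, again assuming a hypothetical 2-byte tag after the MAC addresses with the source port in the low 3 bits (if I read tag_qca.c correctly the QCA tag also uses a 3-bit source-port field; the names here are made up): read the tag, remember the port, and close the gap so the rest of the stack sees an ordinary Ethernet frame.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TAG_LEN           2
#define ETH_ADDRS_LEN     12    /* destination MAC + source MAC */
#define TAG_SRC_PORT_MASK 0x7u  /* hypothetical: low 3 bits = source port */

/* Read the tag the switch placed after the MAC addresses, return the source
 * port, and remove the tag so the frame looks like a normal Ethernet frame.
 * Returns -1 if the frame is too short to carry a tag. */
static int tag_rcv(uint8_t *buf, size_t *len)
{
    assert(buf != NULL && len != NULL);
    if (*len < ETH_ADDRS_LEN + TAG_LEN)
        return -1;

    uint16_t hdr = (uint16_t)((buf[ETH_ADDRS_LEN] << 8) | buf[ETH_ADDRS_LEN + 1]);
    int port = (int)(hdr & TAG_SRC_PORT_MASK);

    /* Close the gap left by the tag. */
    memmove(buf + ETH_ADDRS_LEN, buf + ETH_ADDRS_LEN + TAG_LEN,
            *len - ETH_ADDRS_LEN - TAG_LEN);
    *len -= TAG_LEN;
    return port;
}
```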

The thing is, you can't hack around this by putting the information somewhere else in the frame, as you would still lose the ability to know which port the packet came from.

Actually, the source and destination MAC addresses of the frame are enough for the switch to determine which port to send the frame out of on the Linux tx path (which is the switch CPU rx path). This, as I understand it, is how the swconfig switch driver works.
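For reference, the MAC-based forwarding described above boils down to a learning table: the switch records which port each source MAC address arrived on and sends frames for that address back out of that port, flooding when the address is unknown. This is a deliberately simplified, hypothetical model (linear search, no ageing, no VLANs), not how any particular switch chip implements its FDB.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FDB_SIZE   16
#define PORT_FLOOD (-1)

/* Hypothetical, simplified forwarding database. A real switch keeps this
 * in hardware with hashing and ageing; this only shows the idea. */
struct fdb_entry {
    uint8_t mac[6];
    int port;
    int used;
};

static struct fdb_entry fdb[FDB_SIZE];

/* Learn: remember which port a source MAC address was last seen on.
 * If the table is full, the address is simply not learned (it gets flooded). */
static void fdb_learn(const uint8_t mac[6], int port)
{
    assert(mac != NULL);
    int free_slot = -1;
    for (int i = 0; i < FDB_SIZE; i++) {
        if (fdb[i].used && memcmp(fdb[i].mac, mac, 6) == 0) {
            fdb[i].port = port;   /* station moved: update its port */
            return;
        }
        if (!fdb[i].used && free_slot < 0)
            free_slot = i;
    }
    if (free_slot >= 0) {
        memcpy(fdb[free_slot].mac, mac, 6);
        fdb[free_slot].port = port;
        fdb[free_slot].used = 1;
    }
}

/* Forward: egress port for a destination MAC, or PORT_FLOOD if unknown. */
static int fdb_lookup(const uint8_t mac[6])
{
    assert(mac != NULL);
    for (int i = 0; i < FDB_SIZE; i++)
        if (fdb[i].used && memcmp(fdb[i].mac, mac, 6) == 0)
            return fdb[i].port;
    return PORT_FLOOD;
}
```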

For the switch-to-Linux path (i.e. switch CPU tx, or Linux rx), I see the tag driver code extracting the port number the frame came from, and then looking up the netdevice in a DSA port data structure. The netdevice is set back on the skb, which is then returned by the function.
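The lookup step on the rx path can be sketched as an array index from the tag's port number to the per-port interface. The struct and function names below are hypothetical stand-ins; in kernels of that era the tag drivers used a helper called dsa_master_find_slave() for this, but check your kernel tree for the exact name and signature.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PORTS 7

/* Hypothetical stand-ins for the kernel's netdevice and per-port DSA data. */
struct fake_netdev {
    const char *name;
};

struct fake_dsa_port {
    struct fake_netdev *slave;   /* user-facing interface for this port, or NULL */
};

struct fake_switch {
    struct fake_dsa_port ports[NUM_PORTS];
};

/* Map the source port taken from the tag to the interface the frame should
 * appear on. Returns NULL for out-of-range or unused ports, in which case
 * the caller would drop the frame; otherwise it would set skb->dev to it. */
static struct fake_netdev *port_to_netdev(struct fake_switch *sw, int port)
{
    assert(sw != NULL);
    if (port < 0 || port >= NUM_PORTS)
        return NULL;
    return sw->ports[port].slave;
}
```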

Now, the Atheros header does this perfectly. When I looked at the mt7530 tag driver, the port details are only returned from a 2-bit data structure, which is not enough to even describe the 7 ports of the switch. Since the netdevice is always fixed for the switch, this looks a little redundant. From the mt7530/1 spec that I have, it doesn't look like it supports special headers.
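The arithmetic behind the 2-bit complaint: n bits give 2^n distinct values, so 2 bits cover only 4 ports, while 7 ports need ceil(log2(7)) = 3 bits. A small helper (bits_needed is a hypothetical name) makes the check explicit:

```c
#include <assert.h>

/* Smallest number of bits b such that 2^b >= n, i.e. enough distinct
 * values to give each of n ports its own number. */
static unsigned int bits_needed(unsigned int n)
{
    assert(n > 0);
    unsigned int bits = 0;
    while ((1u << bits) < n)
        bits++;
    return bits;
}
```

bits_needed(7) returns 3, and bits_needed(4) returns 2, which is exactly the limit of a 2-bit field.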

Unless the DSA framework needs to keep track of in/out ports (which I don't think it does), I don't really see the need for the special header to be inserted into frames to/from the switch.

If a switch doesn't provide a way to do this via a special header, a software solution is needed, putting the header right after the MAC and skb type.

Hmm... how would the Linux DSA stack know which switch port a frame came from if the switch cannot tell it in the received frame? I don't think any software solution can solve this problem.

Linux can insert special headers into the frame before sending it to the switch, but if the switch does not understand them, the frame will be discarded, as the switch will think it is corrupted.

I think there's something I don't understand, and I need to study it further. But the mt7530.c DSA driver does not look right to me at the moment.

@quarky
Master - ath10k - OpenWrt SNAPSHOT, r18942-cbfce92367 - 17:57:15 up 7 days, 21:59
Can't say I've encountered any wifi issues yet; everything works fine.

EDIT, 3 days later...
Still running fine, no reboots either, but I've set the minimum frequency to 800 MHz.

Ramoops is now enabled by default in master for the R7800, so there is no need to include kmod-ramoops separately anymore.

Back with another interim report on my R7800, running an updated 21.02 build reverted to the old round-robin airtime scheduler instead of the new virtual time-based airtime scheduler.

Happy to report that my R7800 is still running fine (5 GHz radio only) after more than 17 days. No ping spikes or high-latency access in all that time.

Now the decision is whether to go back to the new virtual time-based airtime scheduler with AQL limit processing disabled, as suggested by @KONG, or continue with the old round-robin scheduler. As they say, if it's not broken, don't fix it.

Decisions, decisions, decisions ....

Hmm, I thought that went "if it's not broken, break it." Perhaps that's why I have so much trouble.

You know, if you switch back to ct you can turn AQL on/off and switch between virtual ATF, round robin, or even the ATF algo proposed by @castiel652, all at the flick of a fwcfg variable and rmmod ath10k_pci && modprobe ath10k_pci.

Just saying: if you're going to complicate your life, do it right.

Haha... I'm more inclined toward stability than the cutting edge. That's why my R7800 is on 21.02.

Something weird has been going on since I upgraded to the official 21.02.2 build: there appears to be at least one device (a Pixel phone) that, when it leaves the house, causes the wifi network to have very high latency and packet loss for a few minutes, making it unusable for the rest of the clients. Then it recovers on its own, or I can force it to by logging in to the R7800 (via wired, which still works fine) and typing wifi, which restarts the wifi.

The log looks like this (MAC redacted):

Wed Mar  9 09:45:50 2022 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED xx:xx:xx:xx:xx:xx
Wed Mar  9 09:45:50 2022 daemon.info hostapd: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: disassociated due to inactivity
Wed Mar  9 09:45:51 2022 daemon.info hostapd: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Wed Mar  9 09:45:51 2022 kern.warn kernel: [150007.197590] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 16 tid 0
Wed Mar  9 09:45:51 2022 kern.warn kernel: [150007.197631] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 16 tid 0
Wed Mar  9 09:45:51 2022 kern.warn kernel: [150007.203782] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 16 tid 0
Wed Mar  9 09:45:51 2022 kern.warn kernel: [150007.211061] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 16 tid 0
Wed Mar  9 09:45:51 2022 kern.warn kernel: [150007.218685] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 16 tid 7
Wed Mar  9 09:45:51 2022 kern.warn kernel: [150007.225663] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 16 tid 7
Wed Mar  9 09:45:51 2022 kern.warn kernel: [150007.232951] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 16 tid 7
Wed Mar  9 09:45:51 2022 kern.warn kernel: [150007.240228] ath10k_pci 0000:01:00.0: failed to lookup txq for peer_id 16 tid 7

Note that I always run mainline driver + firmware (not CT). This setup was solid on 21.02.1 and before, problems started with the upgrade to 21.02.2. I'll go back to 21.02.1 to make sure that version is still solid.

Does this ring any bells? Potential hostapd issues that get patched with backports?

WAG: Android devices like IPv6. Is software offloading enabled? If it is, you missed the fine print.

There is no IPv6 on my LAN and software offloading is not enabled, the R7800 just runs in AP mode.

This solution may help. No harm trying it out IMHO.

Sorry to interrupt the majestic workflow of this topic. Really, I'm a noob and hope not to disturb too much.
I'd just like to know whether a stable build with NSS enabled exists somewhere, and whether you would recommend it to an average user like me (just a simple opkg update, install and setup).
The only NSS builds I've stumbled on are master and snapshots, and I'm really not familiar enough with OpenWrt to feel comfortable with them.
Thank you so much for your immense knowledge and work.

Feel free to try it out. You can flash back and forth from stable.

It's simply using mac80211 to calculate airtime.
Not an algo proposed by me.

I suppose I could use "implementation" in place of "algo," but perhaps it's best just to reference your patch.

EDIT: castiel652's patch is only for ath10k-ct, and they originally suggested it for the r7500v2, which apparently does not support virtual time-based ATF without such a modification. I'm not sure if, or how well, it will run on other ipq806x systems like the r7800.

I'm currently running an adapted version of this patch on the r7500v2, which allows me to use the ath10k-ct fwcfg API to "turn on/off" this alternate airtime calculation and which, I hope, "turns off" any other airtime calculation when castiel652's patch is in use.

Should I work on NSS for 5.15 or continue my war of pushing patches upstream?

  • nss for 5.15
  • send patch upstream

The current state of pushing the patch upstream is:

  • working on tcsr (low hope I will manage to find a way)
  • pushed gcc fixes for rpm (merged)
  • proposed dtsi changes
  • proposed spm patch
  • working on improving qca8k
  • cpufreq driver is a mystery... no idea if it will ever be merged
  • have to refresh all of the krait scale driver
  • work on a correct nss scale driver

For NSS:

  • drop all the QSDK shittery and investigate how to enable NSS offload directly in the gmac driver
  • make all the offload code work on 5.15 (for 5.10 there is already some code and minimal support, but I'm full of exams, and I hope @ACwifidude can work on adding 5.10 support for OpenWrt 22.0, which will be based on 5.10)

Hard choice. If you figure out why some fw rules require br-lan to be in promisc mode, and fix the bug that makes the router crash after a day or two because of it, I'd say go for NSS. But do whatever you want, really.

I still have to investigate it, and I don't know if it's a placebo, but my router now lasts 7 days...
Also, I'm not sure about something (that I can't actually test)...

In the original firmware they set some clk controlled by the RPM to the normal value... this can be idle, normal or turbo... I have no idea if the clk is set to turbo by default... I wonder if I should test that too... Aside from that, the crash dump can only be caused by a hardware defect or some problem in the mux.