Netgear R7800 exploration (IPQ8065, QCA9984)

Can I add a bug report for the R7800 to this thread?
I've posted my final findings here.

The issue was reported on GitHub.

I hope the developers will be able to fix this bug soon.

If you can build your own image... did you test this with the DSA driver?

Yesterday I tried @hnyman's test DSA build, but I had no laptop with a cable connection to test it fully with iperf3. I tested with an Android TV box that has only a 100Mbps NIC, and I think the results were bad. I can build my own image, but I may need some help with the DSA part. I don't know whether the reason for the low WAN speed is the swconfig or the DSA driver. Maybe more users can test and confirm. It's really simple and takes only a few minutes.

Incidentally, there's also a similar recent issue with the mt7622 DSA drivers that causes speed to throttle down to the lowest client link speed. On my E8450, upload speed is limited to 10Mbps when the TV connected to it is in standby mode, because its LAN port then switches to a 10Mbps link. Disconnecting the TV gives me back full gigabit speed.

I sort of traced it down to the code changes where the switch port auto learning feature was disabled.

Wonder if both issues are similar.
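
For anyone wanting to check whether a slow port is in play before testing, here is a minimal Python sketch that lists interfaces whose link negotiated below gigabit. It assumes the standard Linux sysfs attribute `/sys/class/net/<iface>/speed` (reported in Mbit/s) and that Python is available on the machine you run it on; `slow_links` and the 1000 Mbit/s threshold are just illustrative choices. As far as I know this is mainly useful on DSA builds, where each switch port shows up as its own interface.

```python
import os

def slow_links(sysfs_root="/sys/class/net", threshold_mbps=1000):
    """Return {iface: speed} for links negotiated below threshold_mbps."""
    slow = {}
    for iface in sorted(os.listdir(sysfs_root)):
        speed_file = os.path.join(sysfs_root, iface, "speed")
        try:
            with open(speed_file) as f:
                speed = int(f.read().strip())
        except (OSError, ValueError):
            # No speed attribute, or the read fails (link down, virtual iface).
            continue
        # The kernel reports -1 when the speed is unknown, so skip non-positive values.
        if 0 < speed < threshold_mbps:
            slow[iface] = speed
    return slow
```

Calling `slow_links()` with the defaults on the router should list something like a TV in standby as a 10 Mbit/s link, if the port is exposed as an interface.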


@quarky @Ansuel
Quarky, is this the same issue you were talking about?

Yup, this is the issue I was referring to.


Did you happen to try 21.02 builds without NSS acceleration? I remember you encountered performance issues with the 21.02 NSS builds.

I've now replaced my E8450 (planning to 'fix' the mt7530.c driver) with my R7800 running my custom 21.02 builds with NSS acceleration enabled. I do not see any performance hit even with a 10Mbps link connected to the R7800.

I only see WAN performance degradation when one or more devices are connected at 100Mbps and there is LAN traffic between wired devices at the same time. Because both conditions have to be present at once, many users probably never notice it. At least one of the devices involved in the LAN traffic has to be connected at 100Mbps. I simulated this both with a 1Gbps laptop connected through a 100Mbps cable and with a device that has a 100Mbps NIC.
If the LAN traffic is between two devices that are both at 1Gbps, the WAN performance degradation doesn't occur and I see full WAN performance. My link is 1Gbps/700Mbps.
To reproduce this I run iperf3 with a fixed throughput of only 60Mbps between a laptop connected at 100Mbps (this is the culprit) and a PC connected at 1Gbps. The result: the PC can only download/upload from/to WAN at really low speeds, usually just 15-20Mbps and rarely 100-200Mbps, with ping above 100ms. This depends on the current LAN traffic. Larger LAN transfers (such as several 4K LAN streams totalling 100-200Mbps) leave the devices almost completely unable to receive from or send to WAN.
If several devices are connected to an additional switch (gigabit, in my case) and even one 100Mbps device among them is involved in moderate LAN traffic at the same time, then all of those devices see a huge WAN performance drop, down to as little as 5-6Mbps.
If a third device (probably a fourth one too) is connected at 1Gbps on another R7800 port, it can download/upload from/to WAN at full speed as long as it isn't taking part in LAN transfers with a 100Mbps device. This holds even while another 1Gbps/100Mbps pair is affected by the bug at the same time.

I have to stress that two conditions must be met at the same time to reproduce the WAN performance drop: a client connected at 100Mbps (let's call it the "Problem Client"), and LAN traffic between the "Problem Client" and any other device connected to the LAN by cable.
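
The reproduction above can be sketched as a set of iperf3 invocations. This is only an illustration: the addresses are placeholders, `repro_commands` is a made-up helper name, and it assumes `iperf3 -s` is already running on the 1Gbps PC and on some host on the WAN side (any other WAN speed test would do just as well). The flags used (`-c`, `-b`, `-t`, `-R`) are standard iperf3 options.

```python
def repro_commands(lan_server, wan_server, lan_rate="60M", seconds=60):
    """Build iperf3 command lines for the two-condition reproduction.

    lan_server and wan_server are placeholder addresses of hosts running `iperf3 -s`.
    """
    # Run on the 100Mbps "Problem Client": steady LAN stream capped at lan_rate.
    lan_stream = ["iperf3", "-c", lan_server, "-b", lan_rate, "-t", str(seconds)]
    # Run on the 1Gbps PC while the LAN stream is active: WAN upload test...
    wan_upload = ["iperf3", "-c", wan_server, "-t", str(seconds)]
    # ...and WAN download test (-R reverses the direction).
    wan_download = wan_upload + ["-R"]
    return lan_stream, wan_upload, wan_download
```

Each returned list can be passed to `subprocess.run` on the relevant machine, or joined with spaces and pasted into a shell.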

Weird bugs that surfaced recently...
I'm certain now that the CPU frequency scaling bug can be worked around by setting a higher min clock. I have 23 days of uptime with a min clock of 800MHz.

EDIT: Now uptime is 34d 16h 37m 0s
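
For reference, a minimal sketch of the min clock workaround. It assumes the standard Linux cpufreq sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_min_freq`, values in kHz) and that Python is installed on the router; most people do the equivalent with an `echo` in the local startup script. `set_min_freq` is just an illustrative name.

```python
import glob
import os

def set_min_freq(khz=800000, cpu_root="/sys/devices/system/cpu"):
    """Write khz to scaling_min_freq of every CPU; return the files written."""
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_min_freq")
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write(str(khz))  # cpufreq expects kHz, so 800MHz is 800000
        written.append(path)
    return written
```

The setting does not survive a reboot, so it has to be reapplied at every startup.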


Well, the WiFi interfaces just went into disabled state and I had to restart them. The router didn't reboot tho'.

[2069691.801451] device wlan0 left promiscuous mode
[2069691.801588] br-lan: port 3(wlan0) entered disabled state
[2069691.890988] ath10k_pci 0000:01:00.0: could not get mac80211 beacon
[2069692.095787] ath10k_pci 0000:01:00.0: could not get mac80211 beacon
[2069692.300577] ath10k_pci 0000:01:00.0: could not get mac80211 beacon
[2069692.505379] ath10k_pci 0000:01:00.0: could not get mac80211 beacon
[2069692.519704] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 1
[2069692.519778] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 1
[2069692.525600] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 1
[2069692.616753] device wlan1 left promiscuous mode
[2069692.616916] br-lan: port 2(wlan1) entered disabled state
[2069692.770377] ath10k_pci 0001:01:00.0: could not get mac80211 beacon
[2069692.808612] ath10k_pci 0001:01:00.0: peer-unmap-event: unknown peer id 1
[2069692.808724] ath10k_pci 0001:01:00.0: peer-unmap-event: unknown peer id 1
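
A log like this is easy to scan by script if it happens again. A rough Python sketch, assuming the usual `[seconds.micros] message` dmesg layout seen above; `disabled_ports` is a made-up helper name.

```python
import re

# Matches e.g. "[2069691.801588] br-lan: port 3(wlan0) entered disabled state"
DISABLED = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+(\S+): port \d+\((\S+)\) entered disabled state"
)

def disabled_ports(dmesg_text):
    """Return (timestamp, bridge, interface) for each bridge port that was disabled."""
    return [
        (float(m.group(1)), m.group(2), m.group(3))
        for m in DISABLED.finditer(dmesg_text)
    ]
```

Feeding it the output of `dmesg` shows at a glance which interfaces dropped and how close together in time they went down.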

EDIT:
DFS shouldn't cause WiFi to be disabled, right? It's weird that both interfaces went down.

If I'm not wrong, it will, if you set the interface to a DFS channel and radar is detected on that channel. Was the channel fixed or was it auto?

Fixed... But it's still weird that 2.4GHz went down.
I'll change 5GHz back to non-DFS.


How do I get 160MHz working for my 5GHz WiFi? I'm running the latest stable hnyman build.

For me VHT160 works quite normally. With the FI country code, both channel 36 and channel 100 work. DFS detection takes a minute at startup, but completes nicely.
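
For anyone trying to reproduce this, here is a small sketch of the settings involved, written as a Python helper that returns the matching `uci` commands. It assumes the 5GHz radio is `wireless.radio0` (check `/etc/config/wireless` on your device, it may differ), and the helper name `vht160_commands` is hypothetical. `htmode`, `channel` and `country` are the standard OpenWrt wifi-device options.

```python
def vht160_commands(radio="radio0", channel=36, country="FI"):
    """Return uci commands that request a 160MHz channel on the given radio."""
    settings = {"htmode": "VHT160", "channel": str(channel), "country": country}
    cmds = [
        f"uci set wireless.{radio}.{key}='{value}'"
        for key, value in settings.items()
    ]
    # Persist the change and reload the wifi configuration.
    cmds += ["uci commit wireless", "wifi reload"]
    return cmds
```

Both 160MHz blocks (channels 36-64 and 100-128) include DFS channels, which is why the interface only comes up after the DFS check at startup.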


New OpenWrt user here (but experienced software dev) on an XR500 (same hardware as the R7800), and I had this same __krait_mux_set_sel crash twice in the last 24 hours. I'm running the latest snapshot from 2022-03-24.

When it crashed last night, it happened shortly after a decent amount of load followed by a very low amount of load. So if the CPU core frequency ramped downward, perhaps that's what triggered the crash.

I had not set up any min frequency parameters in the local startup script, but it seems like that might be the best workaround at this point from what others have said?

Let me know if there's any way I can help track down the issue.


That should be safe and easy to try. Keep an eye on what @ansuel is saying about CPU idle, and if you're a little more brave, you could try:


Didn't know about the WiFi country code. Thank you.

US on channel 36 works great for me!


I had the same problem when connecting an R6020 (100Mbps) and an R7800 (1Gbps) over Ethernet. I ended up using the R7800 as a dumb AP and an awesome E8450 as the main router. It is time to jump into WiFi 6. It works.


I'm adding more data about the issues with 100Mbps devices.

@Ansuel @quarky
I've extensively tested the latest stock Netgear firmware and the latest Voxel firmware for the R7800.
I've found that both have almost the same issue as OpenWrt.
There are a few differences. The stock firmware is more resilient: when LAN traffic is present, the full WAN download speed is still achievable, although with higher ping times. But when the WAN speed is at full, there are drops in LAN transfer speed.
The WAN upload speed is limited, as with the OpenWrt firmware.
The original Netgear firmware has issues with WAN download speeds too, but only when two different 100Mbps devices are connected to the LAN ports and a desktop PC (1Gbps) is sending files over LAN (or running an iperf3 server) to both 100Mbps clients. In this scenario the R7800 struggles to reach full WAN down/up speeds.
If the clients are connected at 1Gbps, full WAN/LAN speeds at low pings are possible.


Interesting. My R7800 is now serving as my main home router. It has a smart TV connected to one of its LAN ports, which maxes out at 100Mbps and switches to 10Mbps in standby mode. I do not see any limit on upload/download speed tho, but maybe that's because the TV is not actively using the network. The ar8337 switch should not have this limitation as far as I can tell.

Maybe I should stream a YouTube video on the TV and try a speed test.