Netgear R7800 exploration (IPQ8065, QCA9984)

The interesting part here is that both 17.01 and master share the same compat-wireless 2017.01.31, and in 17.01 @hnyman has applied the calibration fix and newer wireless firmware. There’s not a big difference between the branches in the wireless drivers (general mac80211 and ath10k), so maybe this issue lies somewhere outside the wireless driver?
Another point is that different people with the same hardware revision get different results: I have only 2.4 GHz corrupted, while @Magnetron1.1 experiences it on both bands.

Just a wild guess, but could these commits be somehow related?
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/drivers/net/wireless/ath/ath10k/htt_rx.c?h=v4.9.47&id=9e19e13261423eeb4398177001daa874c2128aa4
that is followed by
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.9.47&id=2f38c3c01de945234d23dd163e3528ccb413066d

Is anyone testing the latest build from @hnyman that includes all 3 fixes?

Yes. I'm still having MAC errors, which can be fixed using ethtool, and I'm still experiencing poor Wi-Fi performance on the r4786 build. Are the 2 patches you mentioned yesterday incorporated in this build?

I’m talking about lede-r4773-12930fc045-20170831-rxring-buffersize-pcie-test
I don’t know if @hnyman included those patches in r4786, but anyway it’s not about corrupted frames, it’s about rx ring buffer corruption.

Meanwhile I’ve been talking to a dd-wrt dev who had a similar issue 2 years ago; he insisted on trying these changes even though we are experiencing it on wireless:
http://svn.dd-wrt.com/changeset/28281
http://svn.dd-wrt.com/changeset/28287

According to the changes it does seem like a packet alignment issue, so maybe reverting these is also worth a try as a second option:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.9.47&id=9e19e13261423eeb4398177001daa874c2128aa4
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?h=v4.9.47&id=2f38c3c01de945234d23dd163e3528ccb413066d

I've been using LEDE firmware for about a month and don't recall ever seeing the "rx ring buffer corruption" problem. On the r4773 build, I have seen the ath10k_pci driver report a firmware crash on the 5 GHz radio, produce a firmware register dump, and make several unsuccessful attempts to restart the radio. I think an iperf3 test was in progress at the time and hung. Rebooting the router solved the problem.

Hopefully, @brainslayer's two patches (or their equivalent) will find their way into LEDE.

Nope, I’ve found out that similar changes are already in our device tree, so that’s not the cause...

For people looking for Ethernet performance: backport this patch. It gives me ~20 Mbps on iperf. Not much, but I'll take it.

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c?h=next-20170905&id=6ad20165d376fa07919a70e4f43dfae564601829

Hi. I'm trying to get back to stock Netgear firmware from LEDE using TFTP and I'm getting stuck. Using the instructions provided here, I was able to get the router to switch to TFTP mode (I see the power LED flashing white), and I can ping the router at 192.168.1.1. When I try to transfer the Netgear firmware img file, my TFTP client in Windows shows that the flash has completed successfully, but I don't see the router restarting. If I manually restart the router, it goes back to LEDE. I'm not sure how to fix this issue. Can someone please help? My LEDE configuration is also screwed up and it's not able to connect to the internet. At this point, I just want to go back to stock.

If you can still access LEDE, either from the LuCI GUI or from the console, you can easily clear the faulty config and return the router to the default config it has just after flashing. Just select reset from LuCI, or issue the 'firstboot' command on the console and reboot. That should clear the config.

Regarding TFTP, it has worked like a charm for me several times. TFTP completion is just the completion of the file transfer; the actual flashing starts then and takes 1-2 minutes. The router should reboot automatically.

Possible errors:

  • You have the wrong image file and the flashing routine discards it.
  • Your file transfer fails, e.g. you do not have a fixed IP address on the PC.
  • You have not really triggered the TFTP mode correctly. A flashing white light sounds correct, but it is still a possibility.
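
For the fixed-IP point, here is a rough sanity check in Python. It is only a sketch: the router address 192.168.1.1 comes from the posts above, but the /24 netmask and the example PC addresses are my assumptions.

```python
import ipaddress

# Assumption: the router's TFTP recovery mode answers on 192.168.1.1 (as in the
# posts above) and the network is a /24. The mask is a guess, not from the thread.
ROUTER_IP = ipaddress.ip_address("192.168.1.1")
RECOVERY_NET = ipaddress.ip_network("192.168.1.0/24")


def pc_address_ok(pc_ip: str) -> bool:
    """Return True if pc_ip is a usable static address for reaching the router."""
    addr = ipaddress.ip_address(pc_ip)
    return (
        addr in RECOVERY_NET
        and addr != ROUTER_IP
        and addr != RECOVERY_NET.network_address
        and addr != RECOVERY_NET.broadcast_address
    )


if __name__ == "__main__":
    # 169.254.x.x is what a PC typically assigns itself when DHCP gets no answer.
    for candidate in ("192.168.1.10", "192.168.1.1", "169.254.17.5"):
        print(candidate, "ok" if pc_address_ok(candidate) else "not usable")
```

If the PC shows a 169.254.x.x address while the router is in TFTP mode, it never got a lease and the transfer has nowhere to go, so set the address by hand first.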

I have used the tftp2 GUI tool from the DD-WRT site.

I recently bought a TP-Link C2600 with IPQ8064.
I wanted to use it with my 250/20 Mb/s connection with SQM, yet for some reason this router can't properly utilise this connection. I've seen reports in this thread from other users saying something similar.

I also came across this post: Best Wireless AC router for LEDE?
If it's true then I'm hugely disappointed.

Users in this thread also indicated problems with SQM:
Post 386
Post 422

Can anyone advise me on what I should do? I bought it from Amazon so I have 30 days to return it.

I don't have Windows, but in case it's helpful, here is the exact command I ran in Linux using tftp-hpa to flash stock firmware:

tftp -v -m binary 192.168.1.1 -c put R7800-V1.0.2.32.img

I got the file by downloading and unpacking this zip archive.

For the R7800, it takes a couple of minutes after the tftp succeeds for the router to flash and reboot the new firmware, so make sure you wait.
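
In case it helps anyone debugging a client that "succeeds" but never flashes, here is roughly what a TFTP put looks like on the wire, sketched in Python from RFC 1350. This is not what tftp-hpa or the router's bootloader actually runs; the function names are made up for illustration, and there is no retransmission, so treat it as a way to understand the exchange rather than a flashing tool.

```python
import socket
import struct

BLOCK_SIZE = 512  # RFC 1350 data block size


def wrq_packet(filename: str) -> bytes:
    """Write request: opcode 2, file name, NUL, transfer mode, NUL.

    "octet" is the protocol name for what tftp-hpa calls binary mode.
    """
    return struct.pack("!H", 2) + filename.encode("ascii") + b"\0octet\0"


def data_packet(block: int, payload: bytes) -> bytes:
    """Data packet: opcode 3, 16-bit block number, up to 512 bytes of payload."""
    return struct.pack("!HH", 3, block) + payload


def parse_ack(packet: bytes) -> int:
    """Return the acknowledged block number, or raise on anything but an ACK."""
    opcode, block = struct.unpack("!HH", packet[:4])
    if opcode != 4:
        raise RuntimeError(f"expected ACK (4), got opcode {opcode}")
    return block


def tftp_put(host: str, path: str, timeout: float = 5.0) -> None:
    """Send one file. No retries: a lost packet simply aborts the transfer."""
    with open(path, "rb") as f, socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(wrq_packet(path.rsplit("/", 1)[-1]), (host, 69))
        reply, peer = s.recvfrom(516)  # the server answers from a new port
        if parse_ack(reply) != 0:
            raise RuntimeError("write request not acknowledged")
        block = 0
        while True:
            chunk = f.read(BLOCK_SIZE)
            block = (block + 1) % 65536  # wrap-around is a common extension
            s.sendto(data_packet(block, chunk), peer)
            reply, _ = s.recvfrom(516)
            if parse_ack(reply) != block:
                raise RuntimeError(f"unexpected ACK for block {block}")
            if len(chunk) < BLOCK_SIZE:  # a short (or empty) block ends it
                return
```

The last ACK only tells you the router received the final block. Nothing in the protocol reports whether the image was accepted or written to flash, which is why a client can say "success" while the router still has a couple of minutes of work to do.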

One weird thing I noticed is that flashing firmware by tftp doesn't completely wipe the device. In particular, when I did this there was still an old password on the router. So it seems holding down reset while booting is only for entering tftp mode, not for actually resetting the router. However, once the router has booted the new firmware, you can go back to factory settings by holding the reset button for 7-10 seconds.

I really like this aspect of the R7800, which essentially makes it very hard to brick once you've figured out the particulars of your tftp client. Note that the Netgear page suggests putting in a username and password, which confused me because my tftp client doesn't do authentication, but it turns out you don't need that at all.

Try building a version with dissent1's fastpath patch applied.

Was anyone successful in doing this? I'm looking for some performance statistics on exactly this. I rely on my router too much for day-to-day operation, so binge flashing is out of the question for me... :wink:

Yep, I've been running a build with it applied for almost a month. To be fair, the R7800 is a pretty meaty device compared to some older routers and can push several hundred Mbps without fastpath, but with fastpath enabled I get 940 Mbps LAN to LAN and ~900 Mbps WAN to LAN (tested with two MacBook Pros and iperf3).

And worth adding, CPU usage during those tests is minimal, hardly changing from idle.
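
For context on the 940 Mbps figure, a quick back-of-the-envelope calculation of the most TCP payload a Gigabit link can carry. The assumptions are mine, not from the test above: standard 1500-byte MTU, IPv4, and TCP timestamps enabled.

```python
LINK_RATE_MBPS = 1000            # Gigabit Ethernet line rate
MTU = 1500                       # assumed: standard MTU, no jumbo frames
ETH_OVERHEAD = 8 + 14 + 4 + 12   # preamble+SFD, Ethernet header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20         # IPv4 header + TCP header without options
TCP_TIMESTAMP_OPTION = 12        # assumed: timestamps on, padded to 12 bytes


def tcp_goodput_mbps(tcp_options: int = TCP_TIMESTAMP_OPTION) -> float:
    """Upper bound on TCP payload throughput over one Gigabit Ethernet link."""
    payload = MTU - IP_TCP_HEADERS - tcp_options   # bytes of user data per frame
    on_wire = MTU + ETH_OVERHEAD                   # bytes of link time per frame
    return LINK_RATE_MBPS * payload / on_wire


if __name__ == "__main__":
    print(f"with timestamps:    {tcp_goodput_mbps():.1f} Mbps")
    print(f"without timestamps: {tcp_goodput_mbps(tcp_options=0):.1f} Mbps")
```

Under those assumptions the ceiling lands a little above 940 Mbps, so the LAN to LAN result is effectively wire speed and the WAN to LAN number is not far behind.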

@hnyman:
I've been running the "r4831-9c500db896 / LuCI Master (git-17.259.19938-f36f198)" rxring test build and my 5 GHz radio still crashes after a day.
My logs show a bunch of different error messages like this:
[ 166.408402] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[ 166.408994] br-lan: port 2(wlan0) entered blocking state
[ 166.413876] br-lan: port 2(wlan0) entered forwarding state
[39372.712158] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 0
[39372.712268] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 1
[39372.718791] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 2
[40594.341774] ath10k_pci 0000:01:00.0: rx ring became corrupted: -5

root@waringid:~# dmesg |grep ath10
[ 19.645227] ath10k_pci 0000:01:00.0: enabling device (0140 -> 0142)
[ 19.645311] ath10k_pci 0000:01:00.0: enabling bus mastering
[ 19.645772] ath10k_pci 0000:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[ 20.218958] ath10k_pci 0000:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[ 20.218992] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 1
[ 20.230638] ath10k_pci 0000:01:00.0: firmware ver 10.4-3.4-00082 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast crc32 f301de65
[ 22.514471] ath10k_pci 0000:01:00.0: board_file api 2 bmi_id 0:1 crc32 751efba1
[ 28.368184] ath10k_pci 0000:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 512 raw 0 hwcrypto 1
[ 28.468977] ath10k_pci 0001:01:00.0: enabling device (0140 -> 0142)
[ 28.469116] ath10k_pci 0001:01:00.0: enabling bus mastering
[ 28.469914] ath10k_pci 0001:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[ 28.644437] ath10k_pci 0001:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[ 28.644486] ath10k_pci 0001:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 1
[ 28.659643] ath10k_pci 0001:01:00.0: firmware ver 10.4-3.4-00082 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast crc32 f301de65
[ 30.939119] ath10k_pci 0001:01:00.0: board_file api 2 bmi_id 0:2 crc32 751efba1
[ 36.815095] ath10k_pci 0001:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 512 raw 0 hwcrypto 1
[ 157.989696] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 1
[ 157.989744] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 1
[39372.712158] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 0
[39372.712268] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 1
[39372.718791] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 2
[40594.341774] ath10k_pci 0000:01:00.0: rx ring became corrupted: -5
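
If anyone wants to compare logs between builds, here is a small Python sketch that tallies the ath10k_pci messages per PCI device and notes when each first showed up. The list of "interesting" message prefixes is just taken from the excerpt above and is an assumption, not a complete list of ath10k errors.

```python
import re
from collections import Counter

# Matches lines like "[40594.341774] ath10k_pci 0000:01:00.0: message"
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+ath10k_pci\s+(\S+):\s+(.*)")

# Assumption: prefixes copied from the log excerpt above, nothing more.
ERROR_PREFIXES = (
    "received tx completion for invalid msdu_id",
    "rx ring became corrupted",
    "peer-unmap-event: unknown peer id",
)


def summarize(log_text: str):
    """Return (counts, first_seen), both keyed by (pci_device, message_prefix)."""
    counts = Counter()
    first_seen = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        uptime, device, message = float(match.group(1)), match.group(2), match.group(3)
        for prefix in ERROR_PREFIXES:
            if message.startswith(prefix):
                key = (device, prefix)
                counts[key] += 1
                first_seen.setdefault(key, uptime)  # keep the earliest timestamp
    return counts, first_seen


SAMPLE = """\
[39372.712158] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 0
[39372.712268] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 1
[39372.718791] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 2
[40594.341774] ath10k_pci 0000:01:00.0: rx ring became corrupted: -5
"""

if __name__ == "__main__":
    counts, first_seen = summarize(SAMPLE)
    for (device, prefix), n in counts.items():
        hours = first_seen[(device, prefix)] / 3600
        print(f"{device}: {prefix} x{n}, first after {hours:.1f} h uptime")
```

Fed with the full `dmesg` output instead of SAMPLE, this makes it easy to see whether the invalid msdu_id messages always show up shortly before the rx ring corruption, as they do in the excerpt (roughly 20 minutes apart, about 11 hours after boot).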

I just found out that 5 GHz wireless download speed is very low. I only achieved 10 MB/s with an Intel 8265ac in 80 MHz mode. I tried transferring files to my iPhone and my Android phone, which gave me the same result, but uploading is pretty fast and can reach 50 MB/s. I also tried replacing the Qualcomm firmware with the CT firmware. It gave a worse result (only 5 MB/s downloading and 10 MB/s uploading).
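
Since file-copy speeds are in megabytes per second while the iperf numbers elsewhere in this thread are in megabits per second, here is the conversion as a tiny Python sketch. It assumes the figures above are MB/s of file payload; the dictionary labels are made up for illustration.

```python
def mbytes_to_mbits(mb_per_s: float) -> float:
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return mb_per_s * 8


# Figures as reported in the post above, in MB/s.
REPORTED = {
    "stock firmware, download": 10,
    "stock firmware, upload": 50,
    "CT firmware, download": 5,
    "CT firmware, upload": 10,
}

if __name__ == "__main__":
    for label, mb_per_s in REPORTED.items():
        print(f"{label}: {mb_per_s} MB/s = {mbytes_to_mbits(mb_per_s):.0f} Mbit/s")
```

So the reported download speed is about 80 Mbit/s against roughly 400 Mbit/s for upload.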

It seems that there is not a lot of support for the R7800 from either of the big open-source firmware communities. The DD-WRT build is still stuck on the 3.18 kernel, and on LEDE the master builds have horrendously slow WiFi performance, but nobody seems to be willing or able to address it.

I really regret getting this device.

Why not use the stable builds? Master is more experimental after all. No issues on stable for me in terms of WiFi. There is pretty good support for this unit compared to a lot of other devices; I don’t know what you really expect. I mean, do you really expect it to beat stock WiFi performance using proprietary drivers? LEDE 17.01.2 is pretty close. It’s popular at DD-WRT as well, and they use an older kernel for a range of devices, not just the R7800.

On stock it’s the best-performing 5 GHz unit on the consumer market in terms of range and speed.