Netgear R7800 exploration (IPQ8065, QCA9984)

OK, I'll try. But judging from the output above, am I right in assuming that the only file I will (hopefully) need to patch manually is the config-4.4 file?

Yeah, it seems so.

Unfortunately, it doesn't compile for me:

/lede1701/build_dir/target-arm_cortex-a15+neon-vfpv4_musl-1.1.16_eabi/linux-ipq806x/shortcut-fe/sfe_ipv4.c:1366:5: error: 'struct sk_buff' has no member named 'fast_forwarded'

Does that look familiar?

Edit: I see that this error occurs because the patches in hack-4.4 haven't been applied. I didn't manually copy those patches to the patches-4.4 directory, which I should have done. I'll leave this post here anyway, even if it shows my utter lack of understanding...

Edit2: It doesn't seem to affect my performance issue. If anything, it actually got worse (though that could just be a coincidence, as I didn't test much). Ah well, thanks for suggesting it anyway.
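
A minimal Python sketch of the manual step described above: copying `*.patch` files from one kernel patch directory into another without overwriting anything already there. The directory names are assumptions based on this post (`hack-4.4` from a master tree, `patches-4.4` in the 17.01 tree). Copying every hack patch may not apply cleanly on 17.01, so in practice you may want to copy only the patch that adds the `fast_forwarded` field to `struct sk_buff`.

```python
import shutil
from pathlib import Path


def copy_missing_patches(src_dir: Path, dst_dir: Path) -> list:
    """Copy every *.patch file from src_dir into dst_dir, skipping names that already exist.

    Returns the sorted list of file names that were actually copied.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for patch in sorted(src_dir.glob("*.patch")):
        target = dst_dir / patch.name
        if target.exists():
            continue  # never overwrite a patch that is already in the target tree
        shutil.copy2(patch, target)  # copy2 also keeps the file timestamps
        copied.append(patch.name)
    return copied
```

For example (hypothetical paths): `copy_missing_patches(Path("lede-master/target/linux/generic/hack-4.4"), Path("/lede1701/target/linux/generic/patches-4.4"))`.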

@avx @mroek @steom
You might be interested in testing a revert of the ath10k buffer reduction that was done in master in March. That might help with the performance issues.

The background is that the ath10k buffer size reduction was introduced into ipq806x somewhat sneakily, as part of a commit improving support for QCA4019. (The commit title talks about QCA4019 but does not mention that ath10k buffers get reduced for all chips.)
https://git.lede-project.org/?p=source.git;a=commit;h=cc189c0b7fa015978b04bb663a75b1da726376b5

I tried to start a discussion about that change later, but it got no traction, as there was no real proof that the buffer reduction caused significant harm. If there were proof, the change might hopefully be reverted.

I have made a R7800 test build from the current master that reverts the ath10k buffer size reductions:

Downloadable from my build directory:

  • revert buffer size: lede-r4694-e7373e489d-20170811-ath10k-buffer-test
  • normal: lede-r4694-e7373e489d-20170811

PS: If anybody wants to try the same in their own master build, it is just a matter of deleting these two patches that were introduced by that commit:

package/kernel/mac80211/patches/960-0010-ath10k-limit-htt-rx-ring-size.patch
package/kernel/mac80211/patches/960-0011-ath10k-limit-pci-buffer-size.patch
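
For a scripted version of the same deletion, here is a minimal Python sketch. It assumes it is run against the root of a LEDE source checkout and only touches the two files listed above. After removing them, the mac80211 package still has to be rebuilt from clean sources (typically `make package/kernel/mac80211/clean` before the normal `make`), otherwise the already patched build directory is reused.

```python
from pathlib import Path

# The two patches named above, relative to the root of a LEDE source tree.
BUFFER_PATCHES = (
    "package/kernel/mac80211/patches/960-0010-ath10k-limit-htt-rx-ring-size.patch",
    "package/kernel/mac80211/patches/960-0011-ath10k-limit-pci-buffer-size.patch",
)


def remove_buffer_patches(tree_root: Path) -> list:
    """Delete the buffer-reduction patches that exist under tree_root.

    Returns the relative paths that were removed (empty if already gone).
    """
    removed = []
    for rel in BUFFER_PATCHES:
        path = tree_root / rel
        if path.is_file():
            path.unlink()
            removed.append(rel)
    return removed
```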

I'll test it some time during the weekend, but I'm skeptical as to whether it will fix the issues. In my case, even just making changes to the 5GHz wifi settings would randomly crash the router completely (causing it to reboot). The buffer changes would most likely only affect stability while doing transfers, and shouldn't matter much when just poking around in the settings.

I couldn't wait, so I tested it just now. Bad news, though: wifi performance is still abysmal. I did the same test as before, and the download speed was 20-30 Mbit/s on 5 GHz wifi. Upload speed was actually quite OK (better than before, and on par with stable), but only once. I repeated the test, and just as the upload was about to start, something went wrong. The router didn't crash, but the phone lost wifi connectivity and the upload was aborted. The log had this:

Fri Aug 11 21:51:33 2017 kern.warn kernel: [ 273.360252] ath10k_pci 0000:01:00.0: rx ring became corrupted: -5

So as far as I'm concerned, wifi is useless in master, both with and without those two patches.
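
The `-5` at the end of that log line follows the kernel convention of returning negated `errno` values, so it is `-EIO` (an I/O error); the `-16` in later logs in this thread is `-EBUSY`. A small Python sketch that splits such an ath10k line into its parts and names the error code. The regular expression is an assumption fitted to the log lines quoted in this thread, not a format guaranteed by the driver.

```python
import errno
import re

# Fitted to lines like:
# "... kernel: [ 273.360252] ath10k_pci 0000:01:00.0: rx ring became corrupted: -5"
LINE_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+(?P<driver>\S+)\s+(?P<device>[0-9a-f:.]+):\s+"
    r"(?P<msg>.*?)(?::\s*(?P<code>-\d+))?$"
)


def parse_ath10k_line(line: str):
    """Return a dict describing the log line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    code = m.group("code")
    # Negative kernel return codes are negated errno values, e.g. -5 -> EIO.
    err_name = errno.errorcode.get(-int(code), "unknown") if code else None
    return {
        "uptime_s": float(m.group("uptime")),
        "driver": m.group("driver"),
        "device": m.group("device"),
        "message": m.group("msg"),
        "errno": err_name,
    }
```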

I posted a new thread about the multicast performance issues I'm seeing, and I would appreciate it if anyone could help me diagnose that issue. Everything is now working correctly (after I fixed the bug with the query messages), except for the performance issue where the router either drops or reorders the multicast UDP packets.

Hi,
I have installed the latest hnyman build r4694 with virtually all default settings and then scanned my system with the Shields Up service
https://www.grc.com/x/ne.dll?bh0bkyd2
And I got following results

NO PORTS were found to be OPEN.
Ports found to be STEALTH were: 25, 80, 135, 137, 138, 139, 445, 543.
Other than what is listed above, all ports are CLOSED.
TruStealth: FAILED - NOT all tested ports were STEALTH,
- NO unsolicited packets were received,
- A PING REPLY (ICMP Echo) WAS RECEIVED.

Please advise: is this state safe enough, or should I close or hide those ports according to their recommendations?

Just follow this:

This has nothing to do with R7800, but with firewall in general. So, wrong discussion thread...

You already have all ports closed (or dropping traffic). No traffic gets through.

You might read the wiki discussion about stealth ("DROP") versus closed ("REJECT"):
https://lede-project.org/docs/user-guide/firewall_configuration#implications_of_drop_vs_reject
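
The difference between "stealth" and "closed" in such a scan comes down to what the scanner gets back. A minimal Python sketch of that classification, assuming the usual mapping: a completed TCP handshake is "open", an immediate refusal (a TCP reset, which is what a REJECT rule or a port with no listener produces) is "closed", and no reply before the timeout (what a DROP rule produces) is "stealth". A timeout can also come from an unreachable host, so this is only a heuristic.

```python
import socket


def probe_tcp_port(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a TCP port as "open", "closed" or "stealth" from one connect attempt.

    Other network errors (e.g. no route to host) are not classified and propagate.
    """
    try:
        # Completing the handshake means something is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered with a TCP reset: a REJECT rule, or simply no listener.
        return "closed"
    except socket.timeout:
        # No answer at all before the timeout: typical of a DROP rule.
        return "stealth"
```

To be meaningful the probe has to run from outside your own network against your WAN address, and only against hosts you control.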

@hnyman
Hi, when you upload new builds to your Dropbox, where can I see what changed compared with the previous version?

Is it in *-status.txt file?

Usually there are no changes from me, only the global changes in the main sources and in feeds like LuCI and packages. You need to check the changelogs in those repos.
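
Build names such as `lede-r4694-e7373e489d-20170811` embed the main source revision (assuming the usual LEDE `rNNNN-<short commit hash>` format), so one way to see what changed in the main tree between two builds is to turn the two names into a `git log` range. A minimal Python sketch; it only covers the main LEDE source repository, not the LuCI and packages feeds, whose changelogs still have to be checked in their own repositories.

```python
import re

# "r<revision>-<10-char commit hash>-<YYYYMMDD>" as seen in the build names in this thread.
BUILD_RE = re.compile(r"r(?P<rev>\d+)-(?P<commit>[0-9a-f]{10})-(?P<date>\d{8})")


def parse_build_name(name: str) -> dict:
    """Extract revision number, short commit hash and date from a build file name."""
    m = BUILD_RE.search(name)
    if m is None:
        raise ValueError(f"no rNNNN-<hash>-YYYYMMDD part in {name!r}")
    return {"rev": int(m.group("rev")), "commit": m.group("commit"), "date": m.group("date")}


def git_log_command(old_build: str, new_build: str) -> list:
    """Build a `git log` command listing main-tree commits between two builds."""
    old = parse_build_name(old_build)["commit"]
    new = parse_build_name(new_build)["commit"]
    return ["git", "log", "--oneline", f"{old}..{new}"]
```

The resulting command has to be run inside a clone of the LEDE source repository.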

Hi hnyman, I've been following you since the beginning with the WNDR3700v2, and now I've decided to follow along with the R7800 too, which I bought a few days ago.
I want to thank you for the great work you are doing on this router.
I'm successfully compiling your build for R7800 LEDE snapshots, and I upgraded the firmware from stock to LEDE without problems.
Unfortunately, I'm having problems with the LEDs: 2.4 GHz and 5 GHz wifi are wrongly driving the wifi on/off and WPS LEDs instead of the right ones.
I read the thread, and at the beginning it was said that your build has a workaround for this problem, but it seems not to have worked for me. I obviously followed all the steps for setting up the build environment.
Can you please explain how to apply the workaround so that it stays in place each time I build a new release?
Thanks in advance for your kind help.
EDIT: I misunderstood the workaround. I realized that the 2.4 GHz and 5 GHz LEDs are still not supported by the current drivers.

Good that you figured it out.
Sadly, the proper wifi LEDs on the R7800 still can't be controlled by the open-source ath10k drivers, so I use the wifi on/off and WPS LEDs as a workaround to have at least some wifi activity indication.
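
A workaround like this amounts to attaching wifi activity triggers to LEDs the kernel can drive. On Linux that is done through each LED's `trigger` file under `/sys/class/leds/`, which lists the available triggers with the active one in square brackets; mac80211 registers activity triggers such as `phy0tpt` and `phy1tpt`. A minimal Python sketch of reading and setting such a trigger. The LED directory in the usage line is hypothetical, which `phyN` belongs to which band depends on the device, and a change made this way does not survive a reboot (on LEDE the persistent place is the LED configuration in `/etc/config/system`).

```python
from pathlib import Path
from typing import List, Optional, Tuple


def read_triggers(led_dir: Path) -> Tuple[List[str], Optional[str]]:
    """Return (available triggers, active trigger) from an LED's sysfs 'trigger' file."""
    available: List[str] = []
    active: Optional[str] = None
    for token in (led_dir / "trigger").read_text().split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # the kernel marks the active trigger with brackets
            active = token
        available.append(token)
    return available, active


def set_trigger(led_dir: Path, trigger: str) -> None:
    """Attach `trigger` to the LED, refusing names the kernel does not offer."""
    available, _ = read_triggers(led_dir)
    if trigger not in available:
        raise ValueError(f"{trigger!r} not offered by {led_dir}: {available}")
    (led_dir / "trigger").write_text(trigger)
```

Usage (as root, hypothetical LED name): `set_trigger(Path("/sys/class/leds/r7800:white:wps"), "phy1tpt")`.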

@hnyman @mroek @tetsuo55 I wonder if this fixes the rx ring buffer corruption:
https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=pending&id=f35a7f91f66af528b3ee1921de16bea31d347ab0

Interesting. Can we get this added to the master tree?

@Magnetron1.1
Could you post the syslog output of the ath10k wireless cards' initialisation? Maybe there are different hardware revisions.
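
To compare card revisions between routers, the interesting parts of the initialisation log are the chip/hardware line, the firmware version and the board file checksum. A minimal Python sketch that collects those fields per PCI device from a syslog or `dmesg` dump. The regular expressions are fitted to the ath10k lines quoted later in this thread and are an assumption, not a stable driver output format.

```python
import re
from typing import Dict

# e.g. "ath10k_pci 0000:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe"
HW_RE = re.compile(
    r"ath10k_pci (?P<dev>\S+): (?P<chip>\S+) (?P<hw>hw[\d.]+) target (?P<target>0x[0-9a-f]+) "
    r"chip_id (?P<chip_id>0x[0-9a-f]+) sub (?P<sub>[0-9a-f]{4}:[0-9a-f]{4})"
)
# e.g. "ath10k_pci 0000:01:00.0: firmware ver 10.4-3.4-00082 api 5 ..."
FW_RE = re.compile(r"ath10k_pci (?P<dev>\S+): firmware ver (?P<fw_version>\S+) api (?P<fw_api>\d+)")
# e.g. "ath10k_pci 0000:01:00.0: board_file api 2 bmi_id 0:1 crc32 751efba1"
BOARD_RE = re.compile(
    r"ath10k_pci (?P<dev>\S+): board_file api (?P<board_api>\d+) "
    r"bmi_id (?P<bmi_id>\S+) crc32 (?P<board_crc32>[0-9a-f]+)"
)


def summarize_ath10k(log_text: str) -> Dict[str, Dict[str, str]]:
    """Map each ath10k PCI device to the identification fields found in the log."""
    summary: Dict[str, Dict[str, str]] = {}
    for line in log_text.splitlines():
        for regex in (HW_RE, FW_RE, BOARD_RE):
            m = regex.search(line)
            if m:
                fields = m.groupdict()
                dev = fields.pop("dev")
                summary.setdefault(dev, {}).update(fields)
    return summary
```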

@hnyman
Hi, I downloaded "R7800-lede-r4751-4b3ffecf2b-20170828-1827-sqfs-sysupgrade.tar" from your Dropbox, but 5 GHz wireless still has problems:

[ 59.884126] br-lan: port 2(wlan0) entered blocking state
[ 59.889192] br-lan: port 2(wlan0) entered forwarding state
[ 1990.348762] ath10k_pci 0000:01:00.0: rx ring became corrupted: -5
[ 4939.372771] device wlan0 left promiscuous mode
[ 4939.372871] br-lan: port 2(wlan0) entered disabled state
[ 4944.432613] ath10k_pci 0000:01:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
[ 4944.473604] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 1
[ 4944.473649] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 1
[ 4944.479421] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 1
[ 4944.818420] ath10k_pci 0000:01:00.0: firmware crashed! (uuid 3fb1a044-2ae6-4e78-93fc-efa57c4eb515)
[ 4944.818456] ath10k_pci 0000:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[ 4944.826344] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 1
[ 4944.838678] ath10k_pci 0000:01:00.0: firmware ver 10.4-3.4-00082 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast crc32 f301de65
[ 4944.844935] ath10k_pci 0000:01:00.0: board_file api 2 bmi_id 0:1 crc32 751efba1
[ 4944.857842] ath10k_pci 0000:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 512 raw 0 hwcrypto 1
[ 4944.877168] ath10k_pci 0000:01:00.0: failed to get memcpy hi address for firmware address 4: -16
[ 4944.877192] ath10k_pci 0000:01:00.0: failed to read firmware dump area: -16
[ 4944.885076] ath10k_pci 0000:01:00.0: Copy Engine register dump:
[ 4944.891698] ath10k_pci 0000:01:00.0: [00]: 0x0004a000 3735928559 3735928559 3735928559 3735928559
[ 4944.897665] ath10k_pci 0000:01:00.0: [01]: 0x0004a400 3735928559 3735928559 3735928559 3735928559
[ 4944.906692] ath10k_pci 0000:01:00.0: [02]: 0x0004a800 3735928559 3735928559 3735928559 3735928559
[ 4944.915533] ath10k_pci 0000:01:00.0: [03]: 0x0004ac00 3735928559 3735928559 3735928559 3735928559
[ 4944.924405] ath10k_pci 0000:01:00.0: [04]: 0x0004b000 3735928559 3735928559 3735928559 3735928559
[ 4944.933247] ath10k_pci 0000:01:00.0: [05]: 0x0004b400 3735928559 3735928559 3735928559 3735928559
[ 4944.942046] ath10k_pci 0000:01:00.0: [06]: 0x0004b800 3735928559 3735928559 3735928559 3735928559
[ 4944.950960] ath10k_pci 0000:01:00.0: [07]: 0x0004bc00 3735928559 3735928559 3735928559 3735928559
[ 4944.959800] ath10k_pci 0000:01:00.0: [08]: 0x0004c000 3735928559 3735928559 3735928559 3735928559
[ 4944.968672] ath10k_pci 0000:01:00.0: [09]: 0x0004c400 3735928559 3735928559 3735928559 3735928559
[ 4944.977516] ath10k_pci 0000:01:00.0: [10]: 0x0004c800 3735928559 3735928559 3735928559 3735928559
[ 4944.986385] ath10k_pci 0000:01:00.0: [11]: 0x0004cc00 3735928559 3735928559 3735928559 3735928559
[ 4945.035173] ath10k_pci 0000:01:00.0: cannot restart a device that hasn't been started
[ 4951.251155] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 7
[ 4951.251182] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 1
[ 4951.257687] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 2
[ 4951.265217] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 8
[ 4951.272777] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 9
[ 4951.280269] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 11
[ 4951.287897] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 12
[ 4951.295426] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 14
[ 4951.303087] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 15
[ 4951.310663] ath10k_pci 0000:01:00.0: received tx completion for invalid msdu_id: 16
[ 4951.474607] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[ 4951.498492] br-lan: port 2(wlan0) entered blocking state
[ 4951.498544] br-lan: port 2(wlan0) entered disabled state
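
Two things stand out in this log. Every value in the Copy Engine register dump is 3735928559, which is 0xDEADBEEF in hexadecimal, a well-known placeholder pattern; that suggests the registers were not actually readable at that point rather than holding real state (an interpretation, not something the log itself states). And a handful of error messages keep repeating. A minimal Python sketch that checks for an all-0xDEADBEEF dump and counts the recurring ath10k messages, so logs from different builds can be compared; the message list is taken from this log and is not exhaustive.

```python
import re
from collections import Counter

# Substrings of the recurring ath10k messages seen in this log (not an exhaustive list).
EVENTS = {
    "rx_ring_corrupted": "rx ring became corrupted",
    "flush_failed": "failed to flush transmit queue",
    "unknown_peer_unmap": "peer-unmap-event: unknown peer id",
    "firmware_crashed": "firmware crashed!",
    "invalid_msdu_id": "received tx completion for invalid msdu_id",
}

# One Copy Engine dump row: "[NN]: 0x<address> <decimal> <decimal> ..."
CE_ROW_RE = re.compile(r"\[\d{2}\]: 0x[0-9a-f]{8}((?: \d+)+)")


def count_events(log_text: str) -> Counter:
    """Count how many log lines contain each known ath10k message."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        for name, needle in EVENTS.items():
            if needle in line:
                counts[name] += 1
    return counts


def ce_dump_is_deadbeef(log_text: str) -> bool:
    """True if the log has a CE register dump and every value in it is 0xDEADBEEF."""
    values = [int(v) for row in CE_ROW_RE.findall(log_text) for v in row.split()]
    return bool(values) and all(v == 0xDEADBEEF for v in values)
```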

@hnyman do you have the buffer sizes restored in that build?
@tetsuo55 has been running my build for more than 7 days without issues, whereas before it took only 1-2 days until a crash.