Well, the IPQ806x isn't alone. The Marvell mamba? have crashes/problems with 4.9 too:
It could be that you (and others) are hitting a SoC or CPU erratum. You could enable ARM_ERRATA_798181
and ARM_ERRATA_773022 in the kernel and test whether it helps; at least it's something easy to test.
But it might also just be a waste of time.
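Before testing, it is worth confirming that the two options actually ended up in the kernel config of the build you flash. A minimal sketch, assuming you have the text of that kernel .config at hand (the function name `errata_status` is made up for illustration):

```python
import re

# The two errata options named in the post above.
ERRATA = ("ARM_ERRATA_798181", "ARM_ERRATA_773022")

def errata_status(config_text, options=ERRATA):
    """Return {option: 'y' | 'n' | 'missing'} for the text of a kernel .config."""
    status = {}
    for opt in options:
        if re.search(rf"^CONFIG_{opt}=y$", config_text, re.MULTILINE):
            status[opt] = "y"          # option is built in
        elif re.search(rf"^# CONFIG_{opt} is not set$", config_text, re.MULTILINE):
            status[opt] = "n"          # option is known but disabled
        else:
            status[opt] = "missing"    # option not present in this config at all
    return status
```

If the running kernel was built with CONFIG_IKCONFIG_PROC (an assumption, many builds leave it off), the text can be read on the device itself with `gzip.open("/proc/config.gz", "rt").read()`.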
I too have the RT-AC58U running on 4.14. But I can't realistically port the IPQ806X parts without the hardware.
@blogic what's your comment? Will you look into this issue? Or, could you make the ipq40xx its own target?
Because this way, the ipq806x and ipq40xx can have separate kernels... And a lot of grief from conflicts of interest and potential merge conflicts could be avoided in the future.
That's a result too. Did you check if ustream-ssl still messes up in the same way as well or did something change (for better or worse)? There are likely more errata to test. Usually, the Kconfig description contains the necessary information to decide whether they could apply to your device or not.
Your best bet would be to run git bisect, since, as you said, the change happened recently.
If you want to do another "long shot", you could also play around with the custom board-2.bin again.
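For anyone who hasn't used it: git bisect is a binary search over the commit history, so even a few hundred commits only need a handful of build-and-flash rounds. A rough sketch of the idea only, not of git itself; `first_bad` and `is_bad` are hypothetical names, and `is_bad` stands for "build this revision, flash it, and see whether wifi breaks":

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad(commit) is True.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and the bug stays once it has appeared.
    """
    lo, hi = 0, len(commits) - 1      # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid                  # bug already present: look earlier
        else:
            lo = mid                  # still fine: look later
    return commits[hi]
```

The number of test rounds grows with the logarithm of the range, which is why bisecting is realistic even when each round means a full firmware build.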
I've run through the ath10k commits and didn't notice any that would break 5GHz in addition to the already broken 2.4GHz in my case.
I think I'll try my luck with k4.14 soon
Update: forgot to mention - I've already played with board bins recently - gpl and not, without any effect. I've even echoed 0 into pre-cal to check what happens.
FYI, that Marvell issue with the 4.9 kernel (which has been around forever and nobody was ever able to find/fix) was only with the WRT1900ACv1 devices. All other Marvell devices seemed to work fine with the 4.9 kernel. Unfortunately I had a v1 device and needed to run a custom build based on the 4.4 kernel to avoid the issue.
Ultimately, that issue is what brought me to buy the R7800.
i have this error... all normal?
Wed Nov 22 17:52:43 2017 kern.err kernel: [ 14.278570] blk_update_request: I/O error, dev mtdblock0, sector 0
Wed Nov 22 17:52:43 2017 kern.err kernel: [ 14.279108] blk_update_request: I/O error, dev mtdblock0, sector 8
Wed Nov 22 17:52:43 2017 kern.err kernel: [ 14.284228] blk_update_request: I/O error, dev mtdblock0, sector 16
Wed Nov 22 17:52:43 2017 kern.err kernel: [ 14.290371] blk_update_request: I/O error, dev mtdblock0, sector 24
Wed Nov 22 17:52:43 2017 kern.err kernel: [ 14.296514] blk_update_request: I/O error, dev mtdblock0, sector 0
Wed Nov 22 17:52:43 2017 kern.err kernel: [ 14.302214] Buffer I/O error on dev mtdblock0, logical block 0, async page read
Wed Nov 22 17:52:43 2017 kern.err kernel: [ 14.406396] blk_update_request: I/O error, dev mtdblock0, sector 0
Wed Nov 22 17:52:43 2017 kern.err kernel: [ 14.406421] Buffer I/O error on dev mtdblock0, logical block 0, async page read
Wed Nov 22 17:52:43 2017 kern.err kernel: [ 14.412466] blk_update_request: I/O error, dev mtdblock1, sector 0
Wed Nov 22 17:52:43 2017 kern.err kernel: [ 14.419285] blk_update_request: I/O error, dev mtdblock1, sector 8
Wed Nov 22 17:52:43 2017 kern.err kernel: [ 14.425618] blk_update_request: I/O error, dev mtdblock1, sector 16
Wed Nov 22 17:52:43 2017 kern.err kernel: [ 14.431774] blk_update_request: I/O error, dev mtdblock1, sector 24
Wed Nov 22 17:52:43 2017 kern.err kernel: [ 14.437936] Buffer I/O error on dev mtdblock1, logical block 0, async page read
Wed Nov 22 17:52:43 2017 kern.err kernel: [ 14.447137] Buffer I/O error on dev mtdblock1, logical block 0, async page read
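To get an overview of a log like this instead of eyeballing it, the request errors can be grouped per device. A small sketch, assuming the log text was saved from logread or dmesg; the function name `summarize_io_errors` is made up for illustration:

```python
import re
from collections import Counter

# Matches lines such as:
#   ... blk_update_request: I/O error, dev mtdblock0, sector 8
BLK_ERR = re.compile(r"blk_update_request: I/O error, dev ([^,\s]+), sector (\d+)")

def summarize_io_errors(log_text):
    """Count blk_update_request I/O errors per device and collect the sectors hit."""
    counts = Counter()
    sectors = {}
    for line in log_text.splitlines():
        m = BLK_ERR.search(line)
        if m:
            dev, sector = m.group(1), int(m.group(2))
            counts[dev] += 1
            sectors.setdefault(dev, set()).add(sector)
    return counts, sectors
```

For the excerpt above this gives six request errors on mtdblock0 and four on mtdblock1, all on sectors 0, 8, 16 and 24, i.e. only the very start of each partition is being read.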
i tried different builds and it keeps crashing when i access the wifi...
now with the lede stock stable build it looks like it doesn't, at least for now... any idea?
And also... why only 20/15mb of space?? It has a 128mb flash, where is all the space?
Ok i found this... i think it should be placed in the first post so that a new r7800 user understands the small space... I bought this router for lede... not for that shitty netgear firmware (which from the source looks like it is based on openwrt....)
I don't think that these cause the regression you are suffering. In fact, these patches seem to be missing the ath10k part and it's kinda weird that they introduce an unused HW flag IEEE80211_HW_NEEDS_ALIGNED4_SKBS. There's no code that sets or checks the flag. From what I can tell, these patches do very little.
However, you and others report that this problem started with (recent?) 4.9, and that 4.4 with the same compat-wireless/backports doesn't experience the issue, so I don't think this will help much at all. Again, the issue(s) might be hiding in plain sight, but without hardware I can't really debug this myself. It could be lurking in the SoC (errata, bad clocks) or be an issue with PCIe, ... At most I can tell you about my experience with the IPQ4019 (which is a totally different platform) and the more recent experience from porting it to 4.14.
If you want to take another shot in the dark: can you test whether the iperf3 performance stayed the same or degraded between 4.4 and 4.9 when you run a localhost loopback?
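To make the two kernels comparable, the loopback run can be saved as JSON (start `iperf3 -s` in one shell and run `iperf3 -c 127.0.0.1 -J` in another) and the numbers compared afterwards. A minimal sketch, assuming a plain TCP run; the helper names are made up for illustration:

```python
import json

def received_mbps(iperf3_json_text):
    """Extract the receiver-side throughput in Mbit/s from `iperf3 -J` output (TCP run)."""
    result = json.loads(iperf3_json_text)
    return result["end"]["sum_received"]["bits_per_second"] / 1e6

def relative_change(old_mbps, new_mbps):
    """Fractional change from old to new; negative means the newer kernel is slower."""
    return (new_mbps - old_mbps) / old_mbps
```

Run it once on the 4.4 build and once on the 4.9 build of the same revision. A clearly negative `relative_change` on loopback would point away from the radio and towards the CPU or SoC side, although a single run is not much evidence either way.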
Thanks for the links.
To me it really does not make sense to not upstream anything.
That is classic corporate style: let's just fork everything and maintain our own fork, which gets harder and harder to maintain as everything else moves forward and we have to backport.
The only people that have knowledge about the internals of the firmware and can fix ath10k-firmware issues are working for QCA in one way or another. From past experience, I can safely say they will not visit this forum... ever... not even if you provide them a direct link...
You should post this to ath10k@lists.infradead.org. However, you'll have to put some effort into the mail. Simply posting a dump will get you nowhere. You have to describe your setup a bit and include information about the WiFi clients you use (and the ones in your vicinity). You have to make it crystal clear that this is a regression and that it was working before in the same setup but with a different fw. Furthermore, you'll have to hope the issue gains some attention (like serious posts from other affected users, and there are always some!). The bigger the commotion, the better! It would be best if you coordinate this issue with others.
Problem is that for now the r7800 is completely broken in master...
Just look at the lede bugs page... The first bug sums up all the problems... I think we need to coordinate and create a big report to make it clear that there are lots of problems with this specific router.
So I've just compiled current trunk with k4.4 and the issue is gone.
I guess the only way to verify is to bisect to where the issue was first introduced and confirm that it is related to k4.9 itself, and not to the corresponding lede patches for ipq806x or the kernel.
anyway for now i'm emailing with one guy and i'm testing some custom firmware builds to fix a bug related to power save (i sent him the bug report and that's what he told me)
anyway it is too strange that with my pc i get an entire wifi crash and with my phone i get the rx corrupted ring
edit: actually... with this custom version i have very bad band in 5ghz (unless some random ring corruption sh*t) but it is very stable... no crash... luci still bugged anyway so i think we lost some packet on the way...
edit2: spoke too soon rx ring corrupted and wifi crashed... well one problem solved...