Netgear R7800 exploration (IPQ8065, QCA9984)

Thank you for writing this down! A couple of R7800 owners are chasing down the cause of serious instability on some (not all) NSS-enabled builds based on @ACwifidude’s work, specifically the NSS builds based on kernel 5.10. We don’t know yet whether having the NSS firmware active plays a major role in the instability some of us experience, but some are now trying to clamp the CPU frequency to one setting and keep it there. As usual, results vary, but going from random crashes after a few hours to an uptime of 5 days (and still counting) is a noticeable difference!

I’ve “summarized” my theory in a post in @ACwifidude’s build thread. Reading your discovery here, I now believe there really is an issue with frequency scaling and stability for (at least) the R7800 on kernel 5.10. But am I correct in my theory that this “bug” is new or more pronounced on kernel 5.10 and later, and had less impact on kernel 5.4?

5.4 also suffered from this problem, and older builds didn't have cache scaling or even cpufreq. It's still unclear whether this mismatched frequency is a problem or not... for sure a common idea is that disabling cpufreq makes the router more stable.
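
For anyone who wants to try the "clamp to one frequency" experiment, here is a minimal sketch using the standard Linux cpufreq sysfs files. Nobody in this thread posted their exact commands, so treat it as an assumption of how it could be done: clamp_cpufreq is a made-up helper name, and the frequency you pass should be one of the values your router lists in scaling_available_frequencies.

```shell
# Pin every CPU's cpufreq policy to a single frequency (in kHz).
# $1 = frequency in kHz, $2 = sysfs CPU root (optional, handy for dry runs).
clamp_cpufreq() {
    freq_khz=$1
    cpu_root=${2:-/sys/devices/system/cpu}
    for policy in "$cpu_root"/cpu[0-9]*/cpufreq; do
        [ -d "$policy" ] || continue
        # Some kernels reject min > max, so the first min write may fail
        # when raising the frequency; it is retried after max is set.
        echo "$freq_khz" > "$policy/scaling_min_freq" 2>/dev/null || true
        echo "$freq_khz" > "$policy/scaling_max_freq"
        echo "$freq_khz" > "$policy/scaling_min_freq"
    done
}

# Usage on the router (value is only a placeholder):
#   cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
#   clamp_cpufreq 800000
```

The setting does not survive a reboot, so it would have to be re-applied from a startup script. Switching scaling_governor to performance is another way to get a similar effect.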

You might take a look at the clock setup messages early in the kernel boot process (syslog) of @ACwifidude's tree.

I remember seeing something interesting with warnings and assumptions made, but can't quite remember what.
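
A rough way to pull those messages out of the boot log is to filter the kernel log for clock-related keywords. The keyword list below is my own guess, not something taken from the tree, and clock_msgs is just an illustrative name, so adjust both as needed.

```shell
# Filter a kernel log on stdin for clock/cpufreq related lines.
# The keyword list is an assumption; extend it to taste.
clock_msgs() {
    grep -i -E 'clk|clock|cpufreq|krait|regulator'
}

# Usage on the router:
#   dmesg | clock_msgs
```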

I pushed the additional patch here...

If someone wants to test them... no idea if they will improve stability... they won't hurt for sure!

If I git cherry-pick from that PR, will those commits apply to kernel 5.10 as well? I ask because the comment says "5.15 refresh".

This is based on underlying upstream changes up to 5.15, so applying the same patches to 5.10 would fail.

The very first patch in that PR obviously precludes that option. ansuel has been very active in upstreaming ipq806x improvements, which makes backporting to, or maintaining, more than one kernel hard(er). Nothing is impossible, but that onus would be on you.

Yeah, I wasn't sure, but I was afraid that would be the end result... I'm in way over my head trying to get that working on 5.10. Better to check first than to assume.

Why try backporting to 5.10?
Just apply the PR as a patch, select the 5.15 testing kernel in menuconfig/.config, and compile 5.15.
(Download it as one huge patch that patch accepts: https://github.com/openwrt/openwrt/pull/10703.patch )
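
In case it helps, here is a minimal sketch of applying such a downloaded patch file with the patch tool. apply_patch_file is a made-up helper name and the checkout path in the usage lines is a placeholder. It does a dry run first so a failing hunk does not leave the tree half-patched.

```shell
# Apply a patch file to a source tree, but only if it applies cleanly.
# $1 = top of the source tree, $2 = absolute path to the patch file.
apply_patch_file() {
    tree=$1
    patch_file=$2
    # Dry run: report failure without touching any file.
    ( cd "$tree" && patch -p1 --dry-run < "$patch_file" > /dev/null ) || {
        echo "patch does not apply cleanly; tree left untouched" >&2
        return 1
    }
    # Real run.
    ( cd "$tree" && patch -p1 < "$patch_file" )
}

# Usage from the top of an OpenWrt checkout:
#   wget https://github.com/openwrt/openwrt/pull/10703.patch
#   apply_patch_file "$PWD" "$PWD/10703.patch"
```

If you would rather keep the individual commits, git am with the same file is the usual alternative, but it stops at the first conflict and expects you to resolve it.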

I'm learning along the way. I'm using an NSS build that so far has only been ported to kernel 5.10. A few R7800s around the globe run an NSS build, and on kernel 5.10 it seems a lot more unstable than on kernel 5.4. Some of us get better stability if we keep the CPU clamped to a fixed frequency. My understanding is that the great work Ansuel has put into stability fixes targets 5.15 and therefore won't (easily) apply to 5.10, but that's where NSS has been ported to. NSS work on 5.15 hasn't started (yet).

Backport is possible... the krait drivers didn't change that much... fact is that 6.1 is around the corner and ideally 5.10 should be dropped if we decide to switch to 5.15.

This is why I'm pushing so hard for "move the NSS project to 5.15".

We should switch to 5.15 in any case, so that the next move to 6.x is easier.
In earlier years there were very painful major kernel version transitions because too many versions had been skipped.

Yes, considering we had to switch to 5.15 when 22.03 was branched... I want to test these new patches, but I have no idea if they cause regressions (my real doubt is the regulator patch, which in my case caused less uptime).

Ok so any news with these new commits?

If you're asking me, sadly I'm not a developer. If I were to backport your efforts from 5.15 to 5.10 I'd run into a wall of my own frustration, since I would not know what to do on merge or patch conflicts. Sorry... I can test all right, that's the contribution I can make. I've learned (a few) tricks on how to compile a build and now also know how to cherry-pick commits from other repositories. But if I need to select a different kernel version with a diffconfig, then I'm already lost :sweat_smile: .

I just need feedback on stability :smiley:

I know :smile:

I'm building a 5.10 image right now. When that's done I can build a 5.15 image without NSS and your patches. Can I "just" clone your ipq806x-cleanup branch, update feeds, install feeds, and then get a .config (either via a diffconfig or make menuconfig) to select 5.15 as kernel?

--EDIT--
Do I need to add something like this to my existing diffconfig to get a 5.15 build?

CONFIG_LINUX_5_15=y

You should enable the experimental kernel build option in menuconfig.

Also in theory you should be able to apply my patch on top of master
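
As far as I know, the option meant here is labelled "Use the testing kernel version" under "Global build settings" in menuconfig, and its symbol is CONFIG_TESTING_KERNEL. Treat both the label and the symbol as assumptions to verify against your own tree; the option only has an effect if the target defines a testing kernel version. A small sketch for adding it to a diffconfig (enable_testing_kernel is a made-up helper name):

```shell
# Add the testing-kernel symbol to a diffconfig, once.
# $1 = path to the diffconfig file.
# CONFIG_TESTING_KERNEL is assumed to be the right symbol; check your tree.
enable_testing_kernel() {
    cfg=$1
    # Append only if the exact line is not already present.
    grep -qx 'CONFIG_TESTING_KERNEL=y' "$cfg" 2>/dev/null ||
        echo 'CONFIG_TESTING_KERNEL=y' >> "$cfg"
}

# Usage from the top of the checkout:
#   enable_testing_kernel diffconfig
#   cp diffconfig .config && make defconfig
```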

I'll try a make menuconfig then, see what that looks like.
Applying your patches (all of them in that PR), is that a matter of cherry-picking them one by one with git? I don't know how to apply the huge patch that hnyman linked.

I've learned my tricks from this post after cloning a repository/branch:

./scripts/feeds update -a && ./scripts/feeds install -a && cp diffconfig .config && make defconfig && ./scripts/getver.sh

make -j5

--EDIT--
I've started make menuconfig and it's a bit overwhelming. I see a lot of stuff that I think I should select/enable, but I don't see "experimental kernel build" anywhere. If someone can give me the line I need to add to my diffconfig so 5.15 is built, I'm happy to build and test.

I've been where @D43m0n is now - trying to build my own versions of @KONG's and @ACwifidude's NSS-enabled trees.

The issue is needing a config file (set of files? Process?) that forces 'make menuconfig' to start with the exact openwrt+kernel configuration in the tree.

I'm sure the info is in each tree, but the focus is usually on enabling specific features and I've found this overall starting point hard to reach.

The first two posts in @ACwifidude's thread describe taking his sample diffconfig, modifying it to suit, and running 'make defconfig' to get a full .config file.
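
To check that the expanded .config really ended up on the kernel you wanted, something like the sketch below can help. selected_kernel is a made-up helper name; the CONFIG_LINUX_x_y symbol form is the one quoted earlier in this thread, and I am assuming it shows up in the full .config after 'make defconfig'.

```shell
# Print the kernel version selected in an OpenWrt .config,
# e.g. "5.15" for a line reading CONFIG_LINUX_5_15=y.
# $1 = path to the .config file.
selected_kernel() {
    sed -n 's/^CONFIG_LINUX_\([0-9][0-9]*\)_\([0-9][0-9]*\)=y$/\1.\2/p' "$1"
}

# Usage from the top of the checkout, after 'make defconfig':
#   selected_kernel .config
```

If it prints the old version, the diffconfig did not select the kernel you intended and it is worth fixing before starting a long build.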

For us who'd like to help, that basic info would get us started; and for describing specific features to enable or disable, a path to the feature - from outermost to the feature - would help, even if it seems pedantic to the more experienced.