Netgear R7800 exploration (IPQ8065, QCA9984)

Anyway, the few discrepancies from the original code are:

  • missing memory barrier (mb) in krait_set_l2_accessor before any write (can cause problems in some very rare cases)
  • missing 10 us wait after a regulator change (can't be fixed in cpufreq; I think we should handle this in the regulator driver)
  • safe selection logic is not optimized... optimizing it can speed up frequency changes but makes the problem more apparent
  • on 5.4 we had a patch for a bug in the mux divisor... this patch was dropped later... I had assumed the hfpll div was never used, but it seems it actually is (judging by the values written to the mux). This needs more investigation, and I need to check whether the bug is still there

PR 10703 on top of master at commit b77217d9 (includes kernel 5.15.68) builds, loads, and runs on my r7500v2.

I no longer see the wifi speed reduction regression I observed with earlier kernel 5.15 tests (the wifi issue occurred on both swconfig and DSA so I don't think this was DSA related).

@D43m0n I also observe loss of usb function with kernel 5.15. I've verified usb is working on the equivalent 5.10 build.

On both ports? With this PR or plain 5.15?

Yes, on both ports, with PR 10703. I assume by "plain 5.15" you mean master (without PR 10703) with the testing kernel option enabled. I have not tried that recently, but I could if it will help.

Yes, by plain 5.15 I mean without the PR.

might take up to 24 hours to get back to you.

I also noticed you disabled cpuidle spc for r7500v2 (i saw it in the dts and confirmed it's off). No worries - I don't think I can show this has any benefit for my device. Did you implement a proper different WFI?

Still no work on wfi... disabling it for r7500 may just be a typo from the dtsi cleanup.

About the usb!!! I KNOW WHY. It's very simple: as per kernel review, the usb node should be enabled by the device dts and not in the dtsi, soooo I will fix that :smiley: and while at it also check wfi.

wfi is currently indicated, spc is disabled (not that you need reminding, but be careful to keep spc disabled for r7800).

ideally spc is fixable for r7800 :smiley:

@nmrh some bad news for spc... to me it seems like a feature that should not be enabled... we can consider an improved WFI though... but from what I can see in the r7800 source, standalone power collapse is not enabled in the firmware. (I added support for it and I will consider adding a patch for it)
But IMHO power retention and power collapse should not be used for me.

Also, I updated the PR with the usb node enabled.

USB works again in R7800 :wink:

By "me" do you mean r7800 (ipq8065)? If so, what about ipq8064 systems?

For the record, I don't understand what the spm driver is doing for ipq8064 systems, other than that enabling it seemed to bring back WFI and enable SPC cpuidle functions on my r7500v2 when going from kernel 4.14 to 4.19+ (AFAIK SPC was not enabled on or before 4.14 kernels).

Since then, your knowledge about ipq806x systems, in particular the cpu frequency and clock function, has increased considerably, so I will defer to your better judgement (even if there is still some learning to be done, i.e. making mistakes). You expressed some concern that the spm module isn't doing anything or, worse, could be causing issues. If that's possible, then I'm for trying something different.

Thanks for taking a look.

BTW do you still want me to try a build w/o pr 10703?

My theory is that ipq8064 is a recycled apq8064 with different stuff, and apq8064 is a recycled msm soc.
So it can be that spm works on ipq8064 out of luck due to similarities with apq8064. Either that, or the rpm firmware just supports it and it is simply broken on some ipq8065 SoCs...

Anyway, again, standalone power collapse IMHO should not be used on a router... we should just use an improved version of the WFI. But again, there is no documentation and it's all really confusing... so I can be totally wrong about the topic... Also, on top of that, the saw2 values for the spm driver are wrong :smiley: since they differ between the boot and non-boot CPUs...

No idea if it's problematic or not... probably spm doesn't work on the non-boot CPU for this reason.
About 5.15: if it was just to test usb, then we don't need that, as I fixed the problem :smiley:

Then I'm happy to leave it disabled.

I pulled in the latest commits to pr 10703 via git pull upstream pull/10703/head but ended up with a failed patch:

|diff --git a/drivers/soc/qcom/spm.c b/drivers/soc/qcom/spm.c
|index 484b42b7454e..d822ea6dee38 100644
|--- a/drivers/soc/qcom/spm.c
|+++ b/drivers/soc/qcom/spm.c
--------------------------
No file to patch.  Skipping patch.
2 out of 2 hunks ignored
Patch failed!  Please fix /home/sn/openwrt/target/linux/ipq806x/patches-5.15/122-01-soc-qcom-spm-Add-ipq8064-CPU-data.patch!
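
"No file to patch" means patch could not find the file named in the diff header inside the kernel tree it was applied to. A minimal sketch for checking this up front, assuming you have the patch text and the path to the prepared kernel source; the function name missing_patch_targets is made up for illustration and is not part of the OpenWrt build system:

```python
import os
import re


def missing_patch_targets(patch_text, tree_root):
    """Return paths named in '--- a/<path>' headers that are absent under tree_root.

    Hypothetical helper for illustration only.
    """
    missing = []
    for line in patch_text.splitlines():
        # Tolerate the leading '|' that patch prints when echoing diff headers.
        m = re.match(r"^\|?--- a/(\S+)", line)
        if m and not os.path.exists(os.path.join(tree_root, m.group(1))):
            missing.append(m.group(1))
    return missing
```

If this returns a non-empty list for a patch, the patch probably depends on another patch or backport that creates or moves the file first.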

I am interested to try it, but if it's not ready yet I can revert this commit.

EDIT: almost 2 days uptime on pr10703 and no issues aside from usb. Built and loaded the updated pr10703 (sans commit 0df3d53), so far so good, usb also working on the r7500v2.

While I am definitely interested to try alternate WFI cpuidle implementations, I am wondering about how to evaluate/compare them. If there is clearly a better implementation simply based on "code" that does no harm, that's fine. If not and there is no way to evaluate differences, I'm not sure there is much value to spend a lot of time on it.

Moving to DSA might be more interesting to others assuming it's time to pick that up again.

@nmrh wfi changes should be safe... also, as said in the other topic, to make a good comparison we need to know the uptime before the PR.
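
For anyone who wants to log uptime before and after flashing the PR, here is a rough sketch that converts /proc/uptime to days. Python is used only for illustration (on the router itself a shell one-liner is more typical), and the function names are made up:

```python
def uptime_days(proc_uptime_text):
    """Convert the contents of /proc/uptime to days since boot."""
    # The first field is seconds since boot; the second is cumulative idle time.
    seconds = float(proc_uptime_text.split()[0])
    return seconds / 86400.0


def read_uptime_days(path="/proc/uptime"):
    """Read the uptime of the running system, in days."""
    with open(path) as f:
        return uptime_days(f.read())
```

Recording this value just before each reflash gives a baseline to compare the PR builds against.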

I think that you should sooner or later implement the changes, so that we get a larger testing population. Possibly piece by piece, if you feel that most changes are safe but some are still uncertain.

Master is for development, after all.

We should not wait until everything is perfect, and all various changes of a really large PR are validated.

Ps. I have had roughly one crash per year (running mainly master, but sometimes 22.03 and 21.02), so R7800 has been really stable so far. Currently running the USB-fixed 5.15 and looks good so far.

I just want to wait a bit so I don't push changes that would make the device reboot every 2 days... again, I experienced this, but my device was actually defective, so it's not good proof that the patches caused a regression :smiley:

If this test goes well I would just push all the commits and move the target to 5.15 by default.

Since I've been on kernel 5.10 (AP only), I've had 20-100+ days uptime (153 days with 5.10+DSA). The only reasons they have not been longer are my updating the device or power outages. I don't think I've observed a crash in more than a year.

I had an unexpected reboot yesterday :roll_eyes: I updated your PR with the ramoops files. I will continue to run this build with your PR until end of this week/weekend. If reboots come too often I will flash another build or clamp CPU frequency down. Perhaps this latest measure will be sufficient enough when looking at the contents of the ramoops files.
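
For reference, clamping the CPU frequency can be done through the standard cpufreq sysfs file scaling_max_freq (values in kHz). A minimal sketch, assuming root access; the function name and the 1400000 kHz value in the usage note are assumptions for illustration, so check scaling_available_frequencies on your device for valid steps:

```python
import glob
import os


def clamp_max_freq(max_khz, cpu_root="/sys/devices/system/cpu"):
    """Write max_khz to scaling_max_freq for every CPU that exposes cpufreq.

    Returns the list of sysfs files that were written.
    """
    written = []
    # 'cpu[0-9]*' matches cpu0, cpu1, ... but skips entries such as cpuidle.
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_max_freq")
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write(str(max_khz))
        written.append(path)
    return written
```

For example, clamp_max_freq(1400000) would cap all cores at 1.4 GHz until the next reboot, assuming that step exists on the device.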
