Netgear R7800 exploration (IPQ8065, QCA9984)

I am doing it in a very similar way, and the RPM write fails for s2a with -5 (-EIO), meaning the RPM rejected the call.
0.8V is of course way too low when running at the max OPP frequency.

It's like you said: it works until the kernel frees its unused memory.

Yep, nice that you reproduced the problem!

Another fun thing I noticed: if the voltage is hardcoded and always the same, the RPM sometimes accepts it, and then starts to fail as soon as I try to use another voltage.

In theory, the fact that it stops working after the free means that the system starts to use some part of the mapped memory and the RPM firmware crashes???

I wonder if we can reproduce this by sending another command to the RPM and checking whether it rejects every command or only some specific ones.

Actually, now that I think about it, that would be a nice idea... Does the RPM firmware crash, or does it just reject some specific commands?

Actually, we have a missing command, and from QSDK we have the script to parse the RPM log...
That could actually be useful!
@robimarko do you think it would be worth porting the missing rpm-log driver (and trying to export that data and use the script to parse it)?

I can only reproduce this sometimes.
For whatever reason, if I use the default 800000 uV it hangs, while if I use the next step it succeeds in changing the voltage once and then crashes the kernel.

So it's really intermittent.

Keep in mind that funny things can happen with low voltage... I also tested some theories, like:

  • The regulator can only set a voltage greater than the previous one
  • The regulator needs to be set back to a safe value before changing the voltage

It's all random; the only rule I noticed is that it breaks 99% of the time after the free...

Also, using some intermediate voltage would cause problems at high frequency... (Hardcoding the voltage to a value below the maximum is bad, since frequency scaling is still enabled.) One time the filesystem reported CRC errors on the NAND, and it was all caused by the CPU running at a frequency with less voltage than required (max frequency with the voltage hardcoded at 1050000 uV).
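To make the point about hardcoded voltages concrete, here is a minimal sketch that lists which operating points end up undervolted when the supply is pinned to one value. The frequency/voltage pairs in HYPOTHETICAL_OPPS are made up for illustration only (real values depend on the PVS bin and are not given in this thread), and undervolted_opps is a hypothetical helper, not something from the kernel.

```python
# Hypothetical OPP table: CPU frequency in kHz -> minimum voltage in uV.
# These pairs are illustrative assumptions, NOT the real IPQ8065 table.
HYPOTHETICAL_OPPS = {
    384000: 875000,
    1000000: 1000000,
    1400000: 1075000,
    1725000: 1150000,
}


def undervolted_opps(fixed_uv, opps=HYPOTHETICAL_OPPS):
    """Return the frequencies whose required voltage exceeds the fixed supply."""
    return sorted(freq for freq, need_uv in opps.items() if need_uv > fixed_uv)


if __name__ == "__main__":
    # With frequency scaling still enabled, every frequency printed here
    # could be selected by cpufreq while the rail stays too low for it.
    print(undervolted_opps(1050000))
```

With scaling left on, any frequency this returns is one the governor may still pick, which is how a pinned intermediate voltage can lead to the kind of corruption described above.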

So the RPM firmware is definitely working... and the kernel crash is the proof.

Talking of funny things... looking at some images of the R7800's board, I just noticed something strange. Up until now my assumption was that there are 2 configurable dual-channel power regulators: s1 with channels a (L2 cache) and b (NSS cores), and s2 with channels a (CPU0) and b (CPU1). Well, on the board there is only one dual-channel regulator and one single-channel one. Of course there are other regulators, but only those 2 look like SMB208s or similar. So it looks like either one of the power rails is supplied by another, non-programmable regulator, or 2 rails are connected and supplied by a common regulator.

For s1, the original code has some voter logic: the voltage was set based on the requests of the CPU cache, fabric, NSS core and other things... So I think s1 is the single-channel one, with the RPM choosing the voltage based on those requests, and the other regulator is for the cores.

Also @reka, can you post the image you are looking at? I'm curious.

This shows the area below the SOC (IPQ8064 in this case since it's a D7800 but the board is the same for R7800 and R7500v2).

Hello,
is it confirmed that

  • Flashing via the OEM GUI works? Or is it still questionable (you might need TFTP)?

Also, I was told that this fork https://github.com/ACwifidude/openwrt/tree/kernel5.4-nss-qsdk10.0/bin/targets/ipq806x/generic/R7800-20201231-MasterNSS-factory.img has much better Ethernet performance (~1 Gbit) compared to the original master.

I plan to use it as a dumb access point.

Are there any suggestions on which channels are best for 2.4 and 5 GHz in the EU?

Thank you

Flashing via the GUI works, and the NSS build allows full line rate (940 Mbps). There is some WiFi offloading, so it should help with wireless performance too.

I’d pick either low or high non-DFS channels for your country code (whichever is the opposite of any neighbors you have).

Enjoy!

@ACwifidude appreciated. Is there a nice thread/howto regarding these channels? And what do you mean by opposite? Thanks.

@ACwifidude can you test something? Try a custom build with the NSS core frequency set to 600000, or to the second frequency.

I’ll have some free time to build on Friday and test. Want me to set it to that frequency for ipq8064 devices too to get a bigger testing pool?

Yes, though obviously the frequency is different there.

@robimarko I was thinking... what if the RPM is reset by some interrupt or device (GMAC?), and it works before the kernel free only because it is still using the old settings from the bootloader?

I'm analyzing the code, and I noticed that the RPM regulator code has never changed since the first implementation in kernel 3.18 (and I really don't know how that was pushed, since the RPM write function was completely missing and the driver called an undefined function).

Hm, there is an RPM reset available in the reset driver, named RPM_PROC_RESET.
But it's not used at all in the DTS. I can check whether it's triggered at all, and we can actually try using it to reset the RPM before using it.

Let me check if the original code has that somewhere, and try to implement something...
It would be interesting if it starts to work after the reset (that would confirm that something screws it up after some time).

In QSDK it's only defined but never used... I wonder if asserting it just resets the RPM interface.

(Anyway, something off-topic... I found the partition scheme Netgear should have used... so in theory we should be able to upgrade the RPM firmware.)

No idea, I presume it should reset the RPM firmware.

Anyway, just to confirm... I analyzed the Netgear source to make sure we were not searching for a fix for a disabled feature.
I can confirm that the clock driver uses the RPM regulator write function from the regulator driver and changes the voltage based on the core clock PVS.
The cache regulator changes the voltage based on various voters (fabric, cache and other internal devices). So in the original code there is no logic or connection between the various regulators: they were scaled independently of one another.
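For anyone not familiar with the voter idea, here is a rough sketch of how such a rail could pick its voltage. It assumes the rail simply follows the highest outstanding request, which is my reading of the description and not something copied from the Netgear source; VotedRail, the voter names and the voltages are all hypothetical.

```python
class VotedRail:
    """A supply rail whose voltage follows the highest outstanding request.

    Assumption: votes are aggregated with max(); the vendor code may differ.
    """

    def __init__(self, name):
        self.name = name
        self.votes = {}  # voter name -> requested minimum voltage in uV

    def vote(self, voter, microvolts):
        """Record (or update) one voter's request and return the new target."""
        self.votes[voter] = microvolts
        return self.target()

    def unvote(self, voter):
        """Drop a voter's request and return the new target."""
        self.votes.pop(voter, None)
        return self.target()

    def target(self):
        """Highest requested voltage, or None when nobody is voting."""
        return max(self.votes.values(), default=None)


if __name__ == "__main__":
    rail = VotedRail("s1a")            # rail name taken from the boot log
    rail.vote("cache", 1050000)        # hypothetical request values
    rail.vote("fabric", 1100000)
    print(rail.name, rail.target())
    print(rail.name, rail.unvote("fabric"))
```

Each rail keeps its own votes, which matches the point above that the regulators are scaled independently of one another.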

Well... funny discovery... asserting RPM_PROC_RESET just resets the router back to the bootloader :frowning:

Also, some logs:

[    3.330665] sdhci-pltfm: SDHCI platform and OF driver helper
[    3.339052] NET: Registered protocol family 10
[    3.342771] Segment Routing with IPv6
[    3.345128] NET: Registered protocol family 17
[    3.349461] 8021q: 802.1Q VLAN Support v1.8
[    3.353107] Registering SWP/SWPB emulation handler
[    3.393593] qcom_rpm 108000.rpm: RPM firmware 3.0.16777364
[    3.412100] s1a: Bringing 0uV into 1050000-1050000uV
[    3.412674] RPM SUCCESS TO 1050000
[    3.413084] s1a: supplied by regulator-dummy
[    3.419670] s1b: Bringing 0uV into 1050000-1050000uV
[    3.424185] RPM SUCCESS TO 1050000
[    3.424526] s1b: supplied by regulator-dummy
[    3.432142] s2a: Bringing 0uV into 775000-775000uV
[    3.437445] RPM SUCCESS TO 775000
[    3.437796] s2a: supplied by regulator-dummy
[    3.444576] s2b: Bringing 0uV into 775000-775000uV
[    3.449907] RPM SUCCESS TO 775000
[    3.450253] s2b: supplied by regulator-dummy
[    3.794194] libphy: dsa slave smi: probed
[    3.798517] qca8k 37000000.mdio-mii:10: configuring for fixed/rgmii-id link mode
[    3.800358] qca8k 37000000.mdio-mii:10: nonfatal error -22 setting MTU on port 1
[    3.809816] qca8k 37000000.mdio-mii:10: Link is Up - 1Gbps/Full - flow control off
[    3.815659] qca8k 37000000.mdio-mii:10 lan1 (uninitialized): PHY [dsa-0.0:01] driver [Generic PHY] (irq=POLL)
[    3.821716] qca8k 37000000.mdio-mii:10: nonfatal error -22 setting MTU on port 2
[    3.835623] qca8k 37000000.mdio-mii:10 lan2 (uninitialized): PHY [dsa-0.0:02] driver [Generic PHY] (irq=POLL)
[    3.838768] qca8k 37000000.mdio-mii:10: nonfatal error -22 setting MTU on port 3
[    3.852863] qca8k 37000000.mdio-mii:10 lan3 (uninitialized): PHY [dsa-0.0:03] driver [Generic PHY] (irq=POLL)
[    3.855925] qca8k 37000000.mdio-mii:10: nonfatal error -22 setting MTU on port 4
[    3.870233] qca8k 37000000.mdio-mii:10 lan4 (uninitialized): PHY [dsa-0.0:04] driver [Generic PHY] (irq=POLL)
[    3.873200] qca8k 37000000.mdio-mii:10: nonfatal error -22 setting MTU on port 5
[    3.888629] qca8k 37000000.mdio-mii:10 wan (uninitialized): PHY [dsa-0.0:05] driver [Generic PHY] (irq=POLL)
[    3.894110] ipq806x-gmac-dwmac 37200000.ethernet eth0: error -22 setting MTU to include DSA overhead
[    3.898949] DSA: tree 0 setup
[    3.916809] UBI: auto-attach mtd13
[    3.916870] ubi0: attaching mtd13
[    3.920566] RPM SUCCESS TO 875000
[    3.921302] RPM SUCCESS TO 875000
[    3.930797] RPM SUCCESS TO 1150000
[    3.939691] RPM SUCCESS TO 1150000
[    3.958096] RPM SUCCESS TO 1075000
[    4.029764] RPM SUCCESS TO 1150000
[    4.549708] random: crng init done
[    4.568851] RPM SUCCESS TO 1075000
[    4.608917] RPM SUCCESS TO 1150000
[    5.039847] RPM SUCCESS TO 1075000
[    5.108898] RPM SUCCESS TO 1150000
[    6.458766] RPM SUCCESS TO 1075000
[    6.469746] RPM SUCCESS TO 1150000
[    6.958836] RPM SUCCESS TO 1075000
[    6.978800] RPM SUCCESS TO 1150000
[    8.329846] RPM SUCCESS TO 875000
[    8.481846] ubi0: scanning is finished
[    8.496072] ubi0: attached mtd13 (name "ubi", size 96 MiB)
[    8.496103] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[    8.500541] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
[    8.507309] ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
[    8.514257] ubi0: good PEBs: 772, bad PEBs: 0, corrupted PEBs: 0
[    8.521022] ubi0: user volume: 2, internal volumes: 1, max. volumes count: 128
[    8.527189] ubi0: max/mean erase counter: 3/1, WL threshold: 4096, image sequence number: 1599573772
[    8.534298] ubi0: available PEBs: 0, total reserved PEBs: 772, PEBs reserved for bad PEB handling: 20
[    8.543707] ubi0: background thread "ubi_bgt0d" started, PID 72
[    8.544643] block ubiblock0_0: created from ubi0:0(rootfs)
[    8.642251] Freeing unused kernel memory: 39936K
[    8.642584] RPM REJECT TO 1000000
[    8.642648] Run /init as init process
[    8.644584] RPM REJECT TO 875000
[    8.650954] RPM REJECT TO 875000
[    8.652836] cpu cpu0: _set_opp_voltage: failed to set voltage (875000 875000 875000 mV): -5
[    8.659472] cpufreq: __target_index: Failed to change cpu frequency: -5
[    8.667443] RPM REJECT TO 1000000
[    8.667484] cpu cpu1: _set_opp_voltage: failed to set voltage (1000000 1000000 1000000 mV): -5
[    8.669234] RPM REJECT TO 875000
[    8.677558] cpufreq: __target_index: Failed to change cpu frequency: -5
[    8.679321] RPM REJECT TO 875000
[    8.689506] cpu cpu0: _set_opp_voltage: failed to set voltage (875000 875000 875000 mV): -5
[    8.695848] RPM REJECT TO 1150000
[    8.700939] RPM REJECT TO 1150000
[    8.707264] cpu cpu1: _set_opp_voltage: failed to set voltage (1150000 1150000 1150000 mV): -5
[    8.707374] cpufreq: __target_index: Failed to change cpu frequency: -5
[    8.710846] cpufreq: __target_index: Failed to change cpu frequency: -5
[    8.718750] RPM REJECT TO 1075000
[    8.730781] RPM REJECT TO 1150000
[    8.737255] RPM REJECT TO 1150000
[    8.739223] cpu cpu1: _set_opp_voltage: failed to set voltage (1150000 1150000 1150000 mV): -5
[    8.740598] RPM REJECT TO 1075000
[    8.745703] cpu cpu0: _set_opp_voltage: failed to set voltage (1075000 1075000 1075000 mV): -5
[    8.745745] cpufreq: __target_index: Failed to change cpu frequency: -5
[    8.754338] cpufreq: __target_index: Failed to change cpu frequency: -5
[    8.766203] RPM REJECT TO 1075000
[    8.774360] RPM REJECT TO 1075000
[    8.779320] cpu cpu1: _set_opp_voltage: failed to set voltage (1075000 1075000 1075000 mV): -5
[    8.785988] cpufreq: __target_index: Failed to change cpu frequency: -5

Still not sure, but the problem looks related to memory, since before the free it can set the regulator correctly with no problem...
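To check the "it breaks after the free" pattern on a longer capture, a small script like the one below could count the RPM results on each side of the "Freeing unused kernel memory" line. It is only a sketch: it relies on the "RPM SUCCESS TO" / "RPM REJECT TO" debug prints seen in this log, and summarize is a hypothetical helper name. SAMPLE is a short excerpt copied from the log above.

```python
import re
from collections import Counter

RPM_RE = re.compile(r"RPM (SUCCESS|REJECT) TO (\d+)")
FREE_MARKER = "Freeing unused kernel memory"


def summarize(log_text):
    """Count RPM results before and after the kernel frees its unused memory."""
    phase = "before_free"
    counts = Counter()
    for line in log_text.splitlines():
        match = RPM_RE.search(line)
        if match:
            counts[(phase, match.group(1))] += 1
        # Substring check, so a marker fused onto another console line
        # still flips the phase.
        if FREE_MARKER in line:
            phase = "after_free"
    return counts


# Short excerpt copied from the boot log in this post.
SAMPLE = """\
[    3.412674] RPM SUCCESS TO 1050000
[    3.437445] RPM SUCCESS TO 775000
[    8.329846] RPM SUCCESS TO 875000
[    8.642251] Freeing unused kernel memory: 39936K
[    8.642584] RPM REJECT TO 1000000
[    8.644584] RPM REJECT TO 875000
"""

if __name__ == "__main__":
    for (phase, result), count in sorted(summarize(SAMPLE).items()):
        print(phase, result, count)
```

If a capture ever shows a REJECT before the free, or a SUCCESS after it, that would be a hint that the memory theory is not the whole story.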
