Netgear R7800 exploration (IPQ8065, QCA9984)

Here is the commit:

And here is the discussion related to that, where you can see the testing I did to prove that it is possible to return to the original OEM partition structure:
(messages 1118-1158 of this thread)

Hi! I bought an R7800, and while setting up OpenWrt I hit some weird problems.

After a fresh install of 19.07.5 I couldn't log in; I got the error below:

gismo:~> ssh root@192.168.1.1
ssh_dispatch_run_fatal: Connection to 192.168.1.1 port 22: error in libcrypto

After several retries the login succeeded. But ssh failed more often than it succeeded, so I tried 19.07.4 to see how it behaves. There the error came up very rarely, in fact only once in over 20 tries.

But now a package install or upgrade sometimes fails and only succeeds after one or more retries. Below is an example:

root@OpenWrt:~# opkg install coreutils
Installing coreutils (8.30-2) to root...
Downloading http://downloads.openwrt.org/releases/19.07.4/packages/arm_cortex-a15_neon-vfpv4/packages/coreutils_8.30-2_arm_cortex-a15_neon-vfpv4.ipk
Segmentation fault
Collected errors:
 * opkg_install_pkg: Failed to verify the signature of /var/opkg-lists/openwrt_packages.
 * opkg_install_cmd: Cannot install package coreutils.
root@OpenWrt:~# opkg install coreutils
Installing coreutils (8.30-2) to root...
Downloading http://downloads.openwrt.org/releases/19.07.4/packages/arm_cortex-a15_neon-vfpv4/packages/coreutils_8.30-2_arm_cortex-a15_neon-vfpv4.ipk
Collected errors:
 * opkg_install_pkg: Failed to verify the signature of /var/opkg-lists/openwrt_packages.
 * opkg_install_cmd: Cannot install package coreutils.
root@OpenWrt:~# opkg install coreutils
Installing coreutils (8.30-2) to root...
Downloading http://downloads.openwrt.org/releases/19.07.4/packages/arm_cortex-a15_neon-vfpv4/packages/coreutils_8.30-2_arm_cortex-a15_neon-vfpv4.ipk
Configuring coreutils.
root@OpenWrt:~#

Could there be a hardware fault somewhere?
Is there anything I can do about these weird problems?

Thanks
Aleš

A follow-up: I limited the max cpufreq to 1.4 GHz and now I can no longer reproduce any of the problems:

echo 1400000 >/sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1400000 >/sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq

Should I re-seat the heatsink or something? :)
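
For anyone wanting to repeat the experiment, the two echo commands above can be folded into a small helper that caps every cpufreq policy it finds. This is only a sketch: the function name `cap_max_freq` and its optional second argument (an alternative directory, useful for a dry run against a scratch copy) are made up for illustration and are not part of OpenWrt.

```shell
#!/bin/sh
# Cap scaling_max_freq for every cpufreq policy under a directory.
# Usage: cap_max_freq <kHz> [cpufreq_dir]
# cpufreq_dir defaults to the real sysfs location; the override is a
# hypothetical convenience so the function can be tried on a scratch dir.
cap_max_freq() {
    khz="$1"
    dir="${2:-/sys/devices/system/cpu/cpufreq}"
    for f in "$dir"/policy*/scaling_max_freq; do
        [ -e "$f" ] || continue          # glob matched nothing
        echo "$khz" > "$f" && echo "$f -> $khz"
    done
}
```

Running `cap_max_freq 1400000` should have the same effect as the two commands above. The value is written to sysfs only, so it is lost on reboot; on OpenWrt, /etc/rc.local is one place where it could be reapplied at boot.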

You can keep an eye on the temps if you think it's overheating:
cut -c1-2 /sys/devices/virtual/thermal/*/temp

FWIW, mine runs at 50-60 °C (winter/summer) and I run it with the performance governor at max 1.7 GHz.

Otherwise, who knows; it might just be a faulty CPU that can't handle 1.7 GHz, or some other faulty component.

If you suspect a hardware problem, it's not worth fighting it. I would look to replace the unit; reliability and data integrity are much more important.
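
The `cut -c1-2` trick above works because the kernel reports thermal zone temperatures in millidegrees Celsius, so the first two characters are the whole degrees as long as the reading is between 10 and 99 °C. A slightly more explicit variant could look like the sketch below; `show_temps` and its optional directory argument are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Print each thermal zone with its temperature in degrees Celsius.
# sysfs reports the value in millidegrees Celsius.
# Usage: show_temps [thermal_dir]   (thermal_dir is a hypothetical override)
show_temps() {
    dir="${1:-/sys/class/thermal}"
    for z in "$dir"/thermal_zone*; do
        [ -r "$z/temp" ] || continue     # skip zones without a readable temp
        milli=$(cat "$z/temp")
        # whole degrees, plus one decimal digit
        echo "${z##*/}: $((milli / 1000)).$((milli % 1000 / 100)) C"
    done
}
```

Calling `show_temps` in a loop (for example once per second while running a CPU-heavy task) gives a rough picture of whether the failures line up with high temperatures.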

@hnyman I'm porting this device to kernel 5.10 and noticed something strange.
We get an -EIO error for the regulators.
Because of code changes in kernel 5.10, the OPP voltage function now actually enables the regulator.

I traced the function back to the RPM driver, and the -EIO error means the RPM is REJECTING the request.
I also noticed that the regulator driver simply skips the call to the RPM if the regulator is not enabled.

To sum up:

  • OPP now enables the regulator on voltage set (if it's not enabled)
  • The regulator driver ignores the voltage set if the regulator is not enabled (no error printed)
  • The RPM rejects the voltage set call.

I'm afraid that voltage scaling has never worked o.o (as the regulators were never enabled and the voltage was never set)
Do you have any idea how to test this?
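
One low-effort starting point, sketched below, is to look at what the regulator framework itself reports through /sys/class/regulator (the `name`, `state`, `microvolts` and `num_users` attributes, where the driver provides them). The helper name `list_regulators` and its optional directory argument are made up for this example. Note the limitation: these values reflect what the kernel driver believes, so given the behaviour described above they cannot prove that the RPM actually applied a voltage.

```shell
#!/bin/sh
# List regulators with the state and voltage the kernel reports for them.
# Attributes a driver does not expose are shown as "n/a".
# Usage: list_regulators [regulator_dir]   (hypothetical override for testing)
list_regulators() {
    dir="${1:-/sys/class/regulator}"
    for r in "$dir"/regulator.*; do
        [ -d "$r" ] || continue
        name=$(cat "$r/name" 2>/dev/null || echo '?')
        state=$(cat "$r/state" 2>/dev/null || echo 'n/a')
        uv=$(cat "$r/microvolts" 2>/dev/null || echo 'n/a')
        users=$(cat "$r/num_users" 2>/dev/null || echo 'n/a')
        echo "$name state=$state microvolts=$uv users=$users"
    done
}
```

A CPU supply that shows `state=disabled` while cpufreq is switching frequencies would at least be consistent with the suspicion that the voltage was never being set.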

Sorry, no.
I haven't done much DTS stuff, and haven't really looked into the OPP things.

But it's interesting if it has never worked (since the frequency scaling itself works). Maybe we are then always running at full voltage?

Mhhh, or at a random voltage set by the bootloader.
I'm playing around a bit, and by setting a voltage manually (by hardcoding the RPM call in the voltage driver) I can trigger a voltage-related crash. (The router freezes as soon as I set a very low voltage, for example the lowest possible voltage for the regulator at 1.7 GHz.)

Now, considering that the bootloader in theory sets 800 MHz as the base clock and, I think, 1 V (1000 mV) as the voltage, I'm starting to think that the router only works because we use just half of the cores. (The NSS cores are disabled, and this would explain the random freezes now that we are using them.)

Does anyone remember the public GitHub repo where the U-Boot source for the R7800 was available?
I'm curious to see what U-Boot does to the regulators...

static const int krait_needs_vmin(void)
{
	switch (read_cpuid_id()) {
	case 0x511F04D0: /* KR28M2A20 */
	case 0x511F04D1: /* KR28M2A21 */
	case 0x510F06F0: /* KR28M4A10 */
		return 1;
	default:
		return 0;
	};
}

static void krait_apply_vmin(struct acpu_level *tbl)
{
	for (; tbl->speed.khz != 0; tbl++) {
		if (tbl->vdd_core < 1150000)
			tbl->vdd_core = 1150000;
		tbl->avsdscr_setting = 0;
	}
}

cpuinfo

processor       : 0
model name      : ARMv7 Processor rev 0 (v7l)
BogoMIPS        : 6.00
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32
CPU implementer : 0x51
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0x04d
CPU revision    : 0

processor       : 1
model name      : ARMv7 Processor rev 0 (v7l)
BogoMIPS        : 12.50
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32
CPU implementer : 0x51
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0x04d
CPU revision    : 0

Hardware        : Generic DT based system
Revision        : 0000
Serial          : 0000000000000000

@hnyman
How did we miss this o.o

My CPU should be 0x511F04D0, or am I wrong?
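
As a cross-check, the ID that `read_cpuid_id()` returns (the MIDR register) can be rebuilt from the cpuinfo fields: implementer in bits 31:24, variant in 23:20, architecture in 19:16, part number in 15:4 and revision in 3:0. The sketch below does that arithmetic; `midr` is a made-up helper name, and the architecture field value 0xF is an assumption, since cpuinfo prints the decoded "7" rather than the raw field.

```shell
#!/bin/sh
# Rebuild the 32-bit MIDR value from the fields shown in /proc/cpuinfo.
# Layout: implementer[31:24] variant[23:20] architecture[19:16]
#         part[15:4] revision[3:0]
# Usage: midr <implementer> <variant> <arch_field> <part> <revision>
midr() {
    printf '0x%08X\n' $(( ($1 << 24) | ($2 << 20) | ($3 << 16) | ($4 << 4) | $5 ))
}

# Values taken from the cpuinfo dump above. The 0xF architecture field is
# an assumption (cpuinfo shows the decoded "7", not the raw field).
midr 0x51 0x2 0xF 0x04d 0
```

With variant 0x2 this prints 0x512F04D0. Decoded this way, the value is not one of the three IDs listed in krait_needs_vmin(); 0x511F04D0 would correspond to variant 0x1.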


Ok this is madness...

From some logs, it looks like the RPM starts rejecting voltage commands after the kernel frees memory during boot o.o
WTH


@robimarko do you have any experience with the RPM firmware? Did you ever run into a problem after the kernel frees memory?

Hm, I really don't have any experience with the RPM firmware, as the boards I work on don't use the RPM to control the regulators.

Maybe the firmware blob is too old?
What version do you have?
I can check what is currently shipped in QSDK

Actually, I really don't know where it's loaded from. (Embedded in the flash?)

Anyway the version is RPM firmware 3.0.16777364

The strange thing is, as I said, that the voltage request works at the start. As soon as the kernel frees the memory, it doesn't work anymore.
I wonder if it's caused by another driver doing something with the registers.
(Maybe after the kernel frees memory some driver enters another part of its init?)

Hm, maybe it's got something to do with the reserved memory.
Shouldn't the RPM have its own reserved memory space?

Need to check this...
Also, I print every register the driver sets, and they don't change before and after the free.

In the dts there is only smem and nss

Wonder if this is related?

	reserved-memory {
		rsvd@5fe00000 {
			reg = <0x5fe00000 0x200000>;
			reusable;
		};
	};

Anyway, we never noticed that the RPM was broken because, if I'm not wrong, the RPM write function is used in the entire kernel only by the regulator code, and we never actually used the regulator.
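
For reference, the `reg` cells of the rsvd@5fe00000 node quoted above are a base address followed by a size. The small sketch below just does that arithmetic; `region_info` is a made-up helper name, not an existing tool.

```shell
#!/bin/sh
# Decode a <base size> pair from a DTS reg property.
# Usage: region_info <base> <size>
region_info() {
    base=$(( $1 ))
    size=$(( $2 ))
    end=$(( base + size ))               # first address after the region
    printf 'base=0x%08x end=0x%08x size=%d KiB\n' "$base" "$end" $(( size / 1024 ))
}

# Values from the reserved-memory node in the DTS snippet.
region_info 0x5fe00000 0x200000
```

This prints `base=0x5fe00000 end=0x60000000 size=2048 KiB`, so the node covers the 2 MiB just below 0x60000000. Whether the RPM touches that range is exactly the open question here; the arithmetic only shows where the region sits.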

Yeah, I checked the MSM downstream DTS and there is no RPM reserved memory node.
QSDK11.3 ships with RPM.AK.1.0-00170-C-1

So, you are seeing the qcom_rpm_write fail?

I gave an extensive explanation in the NSS topic... Anyway, the RPM interface is a one-way communication: you send a request and it answers with an ack, so you know whether it rejected or accepted the command.

In my case it accepts the command before the free.
After the free (or, I think, once the kernel starts using some memory) the RPM starts to send reject acks.
The registers are set correctly, but the firmware rejects them.

I'm a bit confused by the memory DTS... Can you help me test with very little memory (like 16-32 MB) and the rest reserved? I would like to completely rule out a problem caused by reserved memory that the RPM uses.

Anyway, in my leaked repo I have this version: RPM.AK.1.0-00168-C-1

Also, as I thought, the RPM firmware seems to be embedded in the bootloader or in some specific partition, since it's not loaded by the kernel and the kernel driver reads the RPM version directly from some registers.

Also @hnyman, I'm dumping some of the mtd partitions...
Did you notice that the mtd7 reserve partition just contains a tar with the language strings for (I think) the Netgear web UI? I think it is also recreated by the OEM firmware...

Ok, will read through the NSS thread.
RPM firmware is loaded by the SBL and there is a partition for it always.
It's usually named 0:RPM or RPM simply.

Gotta dig out my ECW5410 to help testing.

Our magic friends at Netgear chose not to have a shitload of partitions and just put everything in qcadata.
So for this specific router it's all packed up in one partition...
So it's in there... Anyway, the driver requires only a minimal change to check whether the set is rejected or not. Thanks a lot for the help.
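
To see which of the partition names mentioned here actually exist on a given device, /proc/mtd can be searched by name. The sketch below assumes the usual /proc/mtd line format (`mtdN: <size> <erasesize> "<name>"`) and names without spaces; `mtd_by_name` and its optional file argument are hypothetical.

```shell
#!/bin/sh
# Print the mtd device (e.g. "mtd7") whose partition name matches exactly.
# /proc/mtd lines look like:  mtdN: <size> <erasesize> "<name>"
# Usage: mtd_by_name <name> [mtd_file]   (mtd_file is a hypothetical override)
mtd_by_name() {
    file="${2:-/proc/mtd}"
    # The name column keeps its double quotes, so compare against "<name>".
    awk -v want="\"$1\"" '$4 == want { sub(":", "", $1); print $1 }' "$file"
}
```

For example, `mtd_by_name qcadata` or `mtd_by_name 0:RPM` prints the matching device, or nothing if no partition has that name.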

There must be a script in the QSDK that has an option to either create a single partition or split them.

Would be interesting to update the rpm firmware ahahha

I did not look too closely into that, as the Netgear partition already offered quite a lot of space. The relevant discussion (also with you) and the links to the OEM source code about partition recreation are in the dozen messages starting with

And