Netgear R7800 exploration (IPQ8065, QCA9984)

If you want to implement this, then you can add the following to /etc/config/network and it disables that script:

config globals 'globals'
        option default_ps '0'

The above config is handled by these two lines in the hotplug script:

default_ps="$(uci get "network.@globals[0].default_ps")"
[ -n "$default_ps" -a "$default_ps" != 1 ] && exit 0
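To make the guard easier to follow, here is a minimal sketch of the same test pulled out into a function. The name `default_ps_action` is made up for illustration and is not part of the hotplug script; the argument stands in for whatever `uci get` returns.

```shell
#!/bin/sh
# Hypothetical helper mirroring the hotplug guard: prints "skip" when the
# script would exit early, "run" when it would carry on.
# $1 stands in for the value returned by uci (empty if the option is missing).
default_ps_action() {
    default_ps="$1"
    if [ -n "$default_ps" ] && [ "$default_ps" != 1 ]; then
        echo skip
    else
        echo run
    fi
}

default_ps_action 0     # option set to '0': prints "skip"
default_ps_action 1     # option set to '1': prints "run"
default_ps_action ""    # option not set:    prints "run"
```

So only a value that is set and different from 1 makes the script bail out; a missing option behaves like the default.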

Sorry, one thing I'm not following: I understand how to add it to /etc/config/network from the SSH CLI, but how do I make it persist after a reboot?
Is it done by entering these commands at the very bottom of the LuCI System / Startup page?

I run this via the SSH CLI if I want to use it once, correct? It won't stick, though, since it reverts after a reboot.

Sh /etc/config/network
config globals 'globals'
        option default_ps '0'

And I enter this at the very bottom of the LuCI System / Startup page so that these commands run automatically after every reboot, correct?

/etc/config/network
config globals 'globals'
        option default_ps '0'

Also, I don't need to touch the two lines in the hotplug script, correct?
You only showed the hotplug script lines to let me see what's going on, correct?
Or do I have to type those in as well?

Thanks again!

You need to add the following two lines:

config globals 'globals'
        option default_ps '0'

to /etc/config/network and it will survive a reboot. No other changes are necessary.
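If you would rather script the edit than open the file in an editor, something along these lines might do. It is only a sketch: `ensure_default_ps_off` is a made-up helper name, and it assumes the file uses the usual `config globals` / `option` layout shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenWrt): make sure a UCI-style config file
# carries "option default_ps '0'" in its globals section.
# $1: path to the config file, e.g. /etc/config/network
ensure_default_ps_off() {
    cfg="$1"
    # If any default_ps option is already there, leave the file untouched.
    grep -q "^[[:space:]]*option default_ps" "$cfg" && return 0
    if grep -q "^config globals" "$cfg"; then
        # A globals section exists: add the option right below its header.
        tmp="$cfg.tmp.$$"
        awk -v line="\toption default_ps '0'" '
            { print }
            /^config globals/ && !added { print line; added = 1 }
        ' "$cfg" > "$tmp" && mv "$tmp" "$cfg"
    else
        # No globals section yet: append the two lines from the post above.
        printf "\nconfig globals 'globals'\n\toption default_ps '0'\n" >> "$cfg"
    fi
}

# Usage on the router (back up the file first):
#   ensure_default_ps_off /etc/config/network
```

To check the result afterwards, the same lookup the hotplug script uses, `uci get network.@globals[0].default_ps`, should print 0 once the option is in place.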

EDIT: Just tried to

Sh /etc/config/network 
config globals 'globals'
        option default_ps '0'

Didn't work. How should I do this? It says config not found.

Oh okay, thank you.

Is there any command to check it afterwards? Thanks again!

Do the official and ct firmwares share the same driver and board bin file?

Just to let you know.
Some time ago, while browsing through QSDK, I noticed a comment saying that the L2 cache on ipq806x cannot operate properly at a rate of 1 GHz or more when either main core runs at 384 MHz. That's why QSDK sets a minimum rate of 800 MHz.
This could be the source of performance problems on ipq806x in OpenWrt.
I suggest picking up my PR that adds fabrics and proper L2 scaling, and adjusting it so it can be accepted, taking the aforementioned L2 issue into account.

Do you think this can all be worked around by setting the scaling governor to performance and running without frequency scaling altogether?

You can work around it just by setting the minimum scaling frequency to 800 MHz while on the ondemand governor.
But the fabric clock is not handled at all in the current setup.
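For reference, a minimal sketch of raising the cpufreq floor through sysfs. The helper name `set_min_freq` is made up, and the second argument exists only so the function can be tried against a scratch directory; `scaling_min_freq` takes its value in kHz, so 800 MHz is written as 800000.

```shell
#!/bin/sh
# Hypothetical helper: set the cpufreq minimum for every CPU found under a
# sysfs-style base directory.
# $1: minimum frequency in kHz (800000 = 800 MHz)
# $2: base directory, defaults to the real sysfs location
set_min_freq() {
    khz="$1"
    base="${2:-/sys/devices/system/cpu}"
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_min_freq; do
        # Skip the unexpanded pattern when no CPU exposes cpufreq.
        [ -e "$f" ] || continue
        echo "$khz" > "$f"
    done
}

# Usage on the router, as root:
#   set_min_freq 800000
```

The setting does not survive a reboot on its own; one option would be to put the call in the local startup box at the bottom of LuCI System / Startup.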

Unfortunately, your PR does not merge cleanly any more. I tried a few times this year, but every time the baseline code would change a bit, and I do not understand enough about this stuff to rebase it...

I may be wrong, but the fabric clocks seem to be used by the NSS firmware only. They’re not used by any of the existing code in the Linux kernel as far as I can tell. That’s why the RPM clocks are never initialized. I concluded this from my tinkering with the NSS drivers.

The NSS drivers adjust the fabric clocks when they change the NSS core clocks. The code is available from the QSDK Git repo. I’ve also included it in my NSS driver Git repo.

Apart from the NSS fabric there are also the ddr and apps fabrics, and I'm not sure those are not used somehow. I understand what you are saying, but QSDK scales those fabrics according to the CPU cores, not the NSS cores. Only the NSS fabric is scaled according to the NSS cores, afaict.

The NSS drivers initialize the app and ddr fabric clocks and scale them. When the NSS core clocks are adjusted, the app and ddr fabric clocks are also adjusted. Without the NSS drivers active, all those RPM clocks are never initialized. That’s why my conclusion is that they are used by the NSS firmware only. Without proper documentation, it’s hard to be sure though.

In any case, the OpenWrt builds are working fine without those clocks initialized, so they are likely not needed by the standard Linux drivers in OpenWrt.

Edit:
The clocks used by the NSS drivers are listed here:

I can't completely agree with you regarding the apps and ddr fabrics. I can only say that QSDK ties the fabric clocks to the CPU rates, but what happens under the hood is known only to whoever has the datasheet.

	case PHY_INTERFACE_MODE_RGMII:
		nss_gmac_ctl_val = (GMAC_PHY_RGMII | GMAC_IFG |
				GMAC_IFG_LIMIT(GMAC_IFG));
		nss_eth_clk_gate_ctl_val =
			(GMACn_RGMII_RX_CLK(gmac_idx) |
			 GMACn_RGMII_TX_CLK(gmac_idx) |
			 GMACn_PTP_CLK(gmac_idx));
		setbits_le32((NSS_REG_BASE + NSS_GMACn_CTL(gmac_idx)),
				nss_gmac_ctl_val);
		setbits_le32((NSS_REG_BASE + NSS_ETH_CLK_GATE_CTL),
				nss_eth_clk_gate_ctl_val);
		setbits_le32((NSS_REG_BASE + NSS_ETH_CLK_SRC_CTL),
				(0x1 << gmac_idx));
		writel((NSS_ETH_CLK_DIV(1, gmac_idx)),
				(NSS_REG_BASE + NSS_ETH_CLK_DIV0));
		break;

You're right. I checked the QSDK code again. The app and ddr fabric clocks are scaled when the CPU frequency changes. My code for the NSS drivers only scaled the app and ddr fabric clocks when the NSS core clock changes. I missed the code that scales the fabric clocks when the CPU clock changes.

I think this could be why I see my R7800 rebooting randomly when the NSS core clock frequency changes. Most likely the Krait CPU clocks don't agree with the app and ddr fabric clocks, which hangs the Krait core and in turn likely triggers some watchdog timer that reboots the router.

I'll incorporate the fabric clock scaling code into the Krait CPU frequency change code and see if that solves the random reboot issue when the NSS core clock changes.

I believe it's not the cause of your issue. I believe those clocks run at a nominal rate of 400 MHz when they are not scaled by any driver, and that should be enough for normal operation.
Have you tried limiting the Krait minimum frequency to 800 MHz, so that at least the L2 cache wouldn't fail?
Anyway, I think it's more likely a bug that triggers your hangs, maybe caused by a race condition that is not being taken care of.

Yes, I did that. It didn't help. The router still randomly reboots when the NSS core clocks are scaled. The only way to make the router stable is to stop the scaling, i.e. make the NSS cores run at a fixed frequency. My R7800 has now been up for more than a week without any apparent issue :) with the NSS cores accelerating routing tasks.

Well, it worked this time: https://gist.github.com/fantom-x/379d158f5395dacf78e07955f593192c. I only had to change three lines in @dissent1's patch, but I remember having way more issues when I tried this before; very weird.

There are four patches in there that need to be applied, in order, to 19.07. When the build applies the changes, it is not 100% clean, but it works. The router booted fine just a few minutes ago.

krait_l2_pri_mux/clk_rate == 1200000000 did not change, but maybe that is because the router was already running with the performance governor. The regulator_summary values did change, as per Netgear R7800 exploration (IPQ8065, QCA9984).

regulator_summary (before)
cat /sys/kernel/debug/regulator/regulator_summary
 regulator                      use open bypass voltage current     min     max
-------------------------------------------------------------------------------
 regulator-dummy                  0   10      0     0mV     0mA     0mV     0mV 
    1b700000.pci                                                    0mV     0mV
    1b700000.pci                                                    0mV     0mV
    1b700000.pci                                                    0mV     0mV
    1b500000.pci                                                    0mV     0mV
    1b500000.pci                                                    0mV     0mV
    1b500000.pci                                                    0mV     0mV
    s1a                           0    0      0  1050mV     0mA  1050mV  1150mV 
    s1b                           0    0      0  1050mV     0mA  1050mV  1150mV 
    s2a                           0    1      0  1150mV     0mA   775mV  1275mV 
       cpu0                                                      1150mV  1207mV
    s2b                           0    1      0  1150mV     0mA   775mV  1275mV 
       cpu1                                                      1150mV  1207mV
 SDCC Power                       0    0      0  3300mV     0mA  3300mV  3300mV
regulator_summary (after)
cat /sys/kernel/debug/regulator/regulator_summary
 regulator                      use open bypass voltage current     min     max
-------------------------------------------------------------------------------
 regulator-dummy                  0   10      0     0mV     0mA     0mV     0mV 
    1b700000.pci                                                    0mV     0mV
    1b700000.pci                                                    0mV     0mV
    1b700000.pci                                                    0mV     0mV
    1b500000.pci                                                    0mV     0mV
    1b500000.pci                                                    0mV     0mV
    1b500000.pci                                                    0mV     0mV
    s1a                           0    2      0  1150mV     0mA  1050mV  1150mV 
       cpu1                                                      1150mV  1150mV
       cpu0                                                      1150mV  1150mV
    s1b                           0    0      0  1050mV     0mA  1050mV  1150mV 
    s2a                           0    1      0  1150mV     0mA   775mV  1275mV 
       cpu0                                                      1150mV  1207mV
    s2b                           0    1      0  1150mV     0mA   775mV  1275mV 
       cpu1                                                      1150mV  1207mV
 SDCC Power                       0    0      0  3300mV     0mA  3300mV  3300mV
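
To compare a single regulator between two such dumps without eyeballing the whole table, a small filter like the one below could help. It is a sketch only: `regulator_voltage` is a made-up name, it assumes the column layout shown above (name, use, open, bypass, voltage, ...), and it will not handle regulator names containing spaces such as "SDCC Power".

```shell
#!/bin/sh
# Hypothetical helper: print the "voltage" column for one regulator from
# regulator_summary-style text read on stdin.
# $1: regulator name without spaces, e.g. s1a
regulator_voltage() {
    # Regulator rows have 8 whitespace-separated fields; consumer rows such as
    # "cpu0 1150mV 1207mV" have fewer and are ignored.
    awk -v name="$1" '$1 == name && NF >= 8 { print $5; exit }'
}

# Usage on the router:
#   regulator_voltage s1a < /sys/kernel/debug/regulator/regulator_summary
```

Against the two dumps above this would give 1050mV for s1a before the patches and 1150mV after.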

If krait_l2_pri_mux is the L2 cache, then what is krait_l2_sec_mux?

Has anybody else recently noticed stability problems with master?
Possibly only with the ath10k driver (not -ct)...

I did a build yesterday, and the router crashed quickly when wifi was active. I quickly reverted to a week-old -ct build (r10506-cbae306), which is stable.

I have not had time to do further test builds, but I wonder if there have been generic changes (like the mac80211 version bump 4 days ago) that might play havoc with regular ath10k. There aren't that many relevant-looking commits since the week-old state.