Netgear R7800 exploration (IPQ8065, QCA9984)

I don't think that's the cause of your issue. I believe those clocks run at their nominal rate of 400 MHz when no driver scales them, and that should be enough for normal operation.
Have you tried limiting the Krait minimum frequency to 800 MHz, so that at least the L2 cache wouldn't fail?
Anyway, I think it's more likely a bug that triggers your hangs, maybe caused by a race condition that isn't being handled.
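
For anyone wanting to try the 800 MHz floor without rebuilding, here is a minimal sketch using the standard cpufreq sysfs files. The function name and the optional second argument are made up for illustration, and whether raising the CPU floor actually keeps the L2 clock up depends on the driver in your build, so treat this as an experiment rather than a fix.

```shell
#!/bin/sh
# Sketch only: raise the cpufreq floor on every CPU.
# $1 = minimum frequency in kHz (cpufreq sysfs uses kHz, so 800 MHz is 800000)
# $2 = sysfs CPU directory (optional, hypothetical parameter for dry runs)
set_min_freq() {
	smf_khz="$1"
	smf_root="${2:-/sys/devices/system/cpu}"
	for smf_file in "$smf_root"/cpu[0-9]*/cpufreq/scaling_min_freq; do
		# Skip when the glob matched nothing or the file is not writable
		[ -w "$smf_file" ] || continue
		echo "$smf_khz" > "$smf_file"
	done
}

# Usage on the router: set_min_freq 800000
```

The setting does not survive a reboot, which is handy for testing: if the router still hangs with the floor raised, the low CPU/L2 operating points are probably not the trigger.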

Yes, I did that. It didn't help. The router still reboots randomly when the NSS core clocks are scaled. The only way to make the router stable is to stop the scaling, i.e. run the NSS cores at a fixed frequency. My R7800 has now been up for more than a week without any apparent issue :slight_smile: with the NSS cores accelerating routing tasks.


Well, it worked this time: https://gist.github.com/fantom-x/379d158f5395dacf78e07955f593192c. I only had to change three lines in @dissent1's patch, but I remember having way more issues when I tried this before; very weird.

The gist contains four patches that need to be applied, in order, to 19.07. They do not apply 100% cleanly during the build, but it works. The router booted fine just a few minutes ago.

krait_l2_pri_mux/clk_rate stayed at 1200000000, possibly because the router was already running with the performance governor. The regulator_summary values did change, as described in Netgear R7800 exploration (IPQ8065, QCA9984).

regulator_summary (before)
cat /sys/kernel/debug/regulator/regulator_summary
 regulator                      use open bypass voltage current     min     max
-------------------------------------------------------------------------------
 regulator-dummy                  0   10      0     0mV     0mA     0mV     0mV 
    1b700000.pci                                                    0mV     0mV
    1b700000.pci                                                    0mV     0mV
    1b700000.pci                                                    0mV     0mV
    1b500000.pci                                                    0mV     0mV
    1b500000.pci                                                    0mV     0mV
    1b500000.pci                                                    0mV     0mV
    s1a                           0    0      0  1050mV     0mA  1050mV  1150mV 
    s1b                           0    0      0  1050mV     0mA  1050mV  1150mV 
    s2a                           0    1      0  1150mV     0mA   775mV  1275mV 
       cpu0                                                      1150mV  1207mV
    s2b                           0    1      0  1150mV     0mA   775mV  1275mV 
       cpu1                                                      1150mV  1207mV
 SDCC Power                       0    0      0  3300mV     0mA  3300mV  3300mV

regulator_summary (after)
cat /sys/kernel/debug/regulator/regulator_summary
 regulator                      use open bypass voltage current     min     max
-------------------------------------------------------------------------------
 regulator-dummy                  0   10      0     0mV     0mA     0mV     0mV 
    1b700000.pci                                                    0mV     0mV
    1b700000.pci                                                    0mV     0mV
    1b700000.pci                                                    0mV     0mV
    1b500000.pci                                                    0mV     0mV
    1b500000.pci                                                    0mV     0mV
    1b500000.pci                                                    0mV     0mV
    s1a                           0    2      0  1150mV     0mA  1050mV  1150mV 
       cpu1                                                      1150mV  1150mV
       cpu0                                                      1150mV  1150mV
    s1b                           0    0      0  1050mV     0mA  1050mV  1150mV 
    s2a                           0    1      0  1150mV     0mA   775mV  1275mV 
       cpu0                                                      1150mV  1207mV
    s2b                           0    1      0  1150mV     0mA   775mV  1275mV 
       cpu1                                                      1150mV  1207mV
 SDCC Power                       0    0      0  3300mV     0mA  3300mV  3300mV
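
To compare the two dumps without eyeballing them, a small helper like the one below can pull the "voltage" column for a single regulator out of a saved copy. This is just a sketch: the function name is made up, and it assumes the dumps were saved to files first (e.g. with cat ... > /tmp/reg_before.txt).

```shell
#!/bin/sh
# Sketch only: print the "voltage" column of one regulator from a saved
# regulator_summary dump.
# $1 = regulator name as shown in the first column (e.g. s1a)
# $2 = file containing the saved dump
regulator_voltage() {
	# Regulator rows have 8 columns; consumer rows (cpu0, cpu1, ...) have
	# fewer, so NF >= 8 filters them out. Column 5 is "voltage".
	awk -v name="$1" '$1 == name && NF >= 8 { print $5; exit }' "$2"
}

# Usage:
#   cat /sys/kernel/debug/regulator/regulator_summary > /tmp/reg_before.txt
#   regulator_voltage s1a /tmp/reg_before.txt
```

Run against the two dumps above, this would report 1050mV for s1a before the patches and 1150mV after, which is the only regulator whose voltage changed.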

If krait_l2_pri_mux is the L2 cache, then what is krait_l2_sec_mux?

Has anybody else recently noticed stability problems with master?
Possibly only with the ath10k driver (not -ct)...

I did a build yesterday, and the router crashed quickly whenever wifi was active. I quickly reverted to a week-old -ct build, which is stable. (r10506-cbae306)

I did not have time to do further test builds, but I wonder if there have been generic changes (like the mac80211 version bump 4 days ago) that might play havoc with regular ath10k. There aren't that many relevant looking commits since the week-old state.

I've seen a similar behaviour with the mac80211 bump and ath10k (firmware ver 10.4-3.10-00047) on my nbg6817, but I was testing kernel 4.19 with USB patches and several other relatively experimental changes:

daemon.info hostapd: wlan1: STA 84:38:38:xx:xx:xx IEEE 802.11: authenticated
daemon.info hostapd: wlan1: STA 84:38:38:xx:xx:xx IEEE 802.11: associated (aid 1)
kern.alert kernel: [  692.349965] Unable to handle kernel paging request at virtual address fffff9e8
kern.alert kernel: [  692.349997] pgd = 6552583d
kern.alert kernel: [  692.356160] [fffff9e8] *pgd=5fffd861, *pte=00000000, *ppte=00000000
kern.emerg kernel: [  692.358863] Internal error: Oops: 37 [#1] SMP ARM
kern.warn kernel: [  692.364928] Modules linked in: pppoe ppp_async iptable_nat ipt_MASQUERADE ath10k_pci ath10k_core ath xt_state xt_nat xt_conntrack xt_REDIRECT xt_FLOWOFFLOAD xt_CT pppox ppp_generic nf_nat_ipv4 nf_nat nf_flow_table_hw nf_flow_table nf_conntrack_rtcache nf_conntrack_netlink nf_conntrack mac80211 ipt_REJECT cfg80211 xt_time xt_tcpudp xt_policy xt_multiport xt_mark xt_mac xt_limit xt_esp xt_comment xt_TCPMSS xt_LOG usblp slhc nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_filter ipt_ah ip_tables crc_ccitt compat configs xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port
kern.warn kernel: [  692.417868]  ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 sit xfrm6_mode_tunnel xfrm6_mode_transport xfrm6_mode_beet ipcomp6 xfrm6_tunnel esp6 ah6 xfrm4_tunnel xfrm4_mode_tunnel xfrm4_mode_transport xfrm4_mode_beet ipcomp esp4 ah4 tunnel6 tunnel4 ip_tunnel xfrm_user xfrm_ipcomp af_key xfrm_algo vfat fat autofs4 nls_utf8 nls_iso8859_1 nls_cp437 sha1_generic md5 echainiv authenc uas usb_storage leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_qcom ohci_platform ohci_hcd ahci ehci_platform sd_mod ahci_platform libahci_platform libahci libata scsi_mod ehci_hcd gpio_button_hotplug f2fs ext4 mbcache jbd2 crc32c_generic crc32_generic
kern.warn kernel: [  692.483498] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.59 #0
kern.warn kernel: [  692.505727] Hardware name: Generic DT based system
kern.warn kernel: [  692.512079] PC is at ieee80211_sta_register_airtime+0x24/0x148 [mac80211]
kern.warn kernel: [  692.516646] LR is at ath10k_htt_t2h_msg_handler+0x668/0x1114 [ath10k_core]
kern.warn kernel: [  692.523437] pc : [<bf58804c>]    lr : [<bf6d3844>]    psr: a0000113
kern.warn kernel: [  692.530207] sp : c0b01d44  ip : 00000002  fp : bf6fed54
kern.warn kernel: [  692.536369] r10: 00000034  r9 : d876a180  r8 : 00060002
kern.warn kernel: [  692.541577] r7 : 00000000  r6 : d737d650  r5 : 00000000  r4 : d8769600
kern.warn kernel: [  692.546788] r3 : 00000000  r2 : 00060002  r1 : 00000000  r0 : 00000000
kern.warn kernel: [  692.553387] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
kern.warn kernel: [  692.559896] Control: 10c5787d  Table: 579bc06a  DAC: 00000051
kern.emerg kernel: [  692.567100] Process swapper/0 (pid: 0, stack limit = 0xa4e93fc3)
kern.emerg kernel: [  692.572831] Stack: (0xc0b01d44 to 0xc0b02000)
kern.emerg kernel: [  692.578916] 1d40:          d8769600 00000000 d737d650 00000001 d737d650 d876a180 00000034
kern.emerg kernel: [  692.583176] 1d60: bf6fed54 bf6d3844 00000002 c0314b50 d8769cb0 00000001 00000001 d737d64c
kern.emerg kernel: [  692.591335] 1d80: 00010000 00000000 d876d7fc bf724430 c0310d84 d9d48300 d8769600 00000000
kern.emerg kernel: [  692.599496] 1da0: c0b01dc0 c08454c4 d876d7fc 00000001 00000022 bf720368 d9d48300 00000022

In other words, the first connecting wireless client resulted in that Oops and triggered a reboot.

The same (experimental) patch set has been working with the previous v4.19 based mac80211 backports for the last 4 days.

@hauke
Any idea why the new mac80211 version might trigger an oops with the old regular ath10k (not -ct)?

I think the following belongs in this thread, if not I can split it elsewhere. This may be of interest to some users.

Enabling VHT80P80 on the R7800.
Apply the following patch (note that this patch is handwritten from my own notes, so it may not apply cleanly; I'm still formalising a patch to incorporate into my build environment):

--- a/lib/netifd/wireless/mac80211.sh
+++ b/lib/netifd/wireless/mac80211.sh
@@ -49,6 +49,7 @@ drv_mac80211_init_device_config() {
 		short_gi_40 \
 		max_amsdu \
 		dsss_cck_40
+	config_add_string channel2
 }
 
 drv_mac80211_init_iface_config() {

@@ -96,7 +97,7 @@ mac80211_hostapd_setup_base() {
 	[ "$auto_channel" -gt 0 ] && channel=acs_survey
 	[ "$auto_channel" -gt 0 ] && json_get_values channel_list channels
 
-	json_get_vars noscan ht_coex
+	json_get_vars noscan ht_coex channel2
 	json_get_values ht_capab_list ht_capab
 	ieee80211n=1
 	ht_capab=
 	case "$htmode" in
 		VHT20|HT20) ;;
-		HT40*|VHT40|VHT80|VHT160)
+		HT40*|VHT40|VHT80|VHT160|VHT80P80)
 			case "$hwmode" in
 				a)
 					case "$(( ($channel / 4) % 2 ))" in

@@ -174,6 +175,7 @@
 	# 802.11ac
 	enable_ac=0
 	idx="$channel"
+	idx2="$channel2"
 	case "$htmode" in
 		VHT20) enable_ac=1;;
 		VHT40)

@@ -204,6 +206,24 @@
 			append base_cfg "vht_oper_chwidth=2" "$N"
 			append base_cfg "vht_oper_centr_freq_seg0_idx=$idx" "$N"
 		;;
+		VHT80P80)
+			case "$(( ($channel / 4) % 4 ))" in
+				1) idx=$(($channel + 6));;
+				2) idx=$(($channel + 2));;
+				3) idx=$(($channel - 2));;
+				0) idx=$(($channel - 6));;
+			esac
+			case "$(( ($channel2 / 4) % 4 ))" in
+				1) idx2=$(($channel2 + 6));;
+				2) idx2=$(($channel2 + 2));;
+				3) idx2=$(($channel2 - 2));;
+				0) idx2=$(($channel2 - 6));;
+			esac
+			enable_ac=1
+			append base_cfg "vht_oper_chwidth=3" "$N"
+			append base_cfg "vht_oper_centr_freq_seg0_idx=$idx" "$N"
+			append base_cfg "vht_oper_centr_freq_seg1_idx=$idx2" "$N"
+		;;
 	esac
 
 	if [ "$enable_ac" != "0" ]; then
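
The case arithmetic in the VHT80P80 branch above maps a primary channel to the centre index of the 80 MHz block that contains it. Pulled out into a standalone function (the name vht80_center is made up for this sketch), it can be checked on its own before touching the router:

```shell
#!/bin/sh
# Sketch only: same arithmetic as the VHT80P80 case in the patch.
# $1 = primary 5 GHz channel number; prints the centre index of its 80 MHz block.
vht80_center() {
	case "$(( ($1 / 4) % 4 ))" in
		1) echo $(( $1 + 6 ));;   # lowest 20 MHz channel of the block
		2) echo $(( $1 + 2 ));;
		3) echo $(( $1 - 2 ));;
		0) echo $(( $1 - 6 ));;   # highest 20 MHz channel of the block
	esac
}

vht80_center 36    # prints 42
vht80_center 149   # prints 155
```

So channel 36 and channel2 149 should end up as vht_oper_centr_freq_seg0_idx=42 and vht_oper_centr_freq_seg1_idx=155 in the generated hostapd config.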

Apply the following (or similar) settings in /etc/config/wireless:

config wifi-device 'radio0'
	option type 'mac80211'
	option hwmode '11a'
	option path 'soc/1b500000.pci/pci0000:00/0000:00:00.0/0000:01:00.0'
	option channel '36'
	option channel2 '149'
	option htmode 'VHT80P80'

Fully restart netifd so it picks up the new config items (/etc/init.d/network restart), and it should fire up.
Not many channel scanners understand VHT80P80, so I had to rely on my own.
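
Without a scanner that understands 80+80, one quick sanity check is to look at what ended up in the generated hostapd config. A minimal sketch follows; the function name is made up, and the default path /var/run/hostapd-phy0.conf is an assumption about where your build writes the file, so pass the real path if it differs.

```shell
#!/bin/sh
# Sketch only: print the VHT operating-width lines from a hostapd config.
# $1 = config file (optional; the default path is an assumption)
show_vht_oper() {
	grep -E '^vht_oper_(chwidth|centr_freq_seg[01]_idx)=' \
		"${1:-/var/run/hostapd-phy0.conf}"
}

# Usage on the router: show_vht_oper
```

If the patch took effect with the example config above, the output should include vht_oper_chwidth=3 plus one centre index per segment. This only confirms what hostapd was asked to do, not what the radio is actually transmitting.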

This patch would need to be cleaned up and thought through a bit more carefully to be included upstream, but will become more useful when we have HE80P80 capability in future routers (probably ath11k).

I've only tried this with kmod-ath10k-ct and the associated ct firmware so far.


What is 80P80?

160 MHz of bandwidth across two non-contiguous 80 MHz channels.
80P80 means 80+80.

80 Padding 80

So this will permit 160 MHz on more channels?

Does anyone know how to completely remove the wifi blinking LED code from r7800 build?

ath10k-4.19/mac.c +/blink
include/mac80211/led.h

Do I just delete those two files?

Does this work with DFS? Otherwise it will not be legal to use 80+80 channels in the EU, as you have to use spectrum above channel 48 (DFS only) to create the second 80 MHz channel.

whooooa! hold your horses cowboy :wink:

i don't know much but.......

remove this near the bottom of mac.c ( build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x/ath10k-ct-XYZ-ish )

#ifdef CPTCFG_MAC80211_LEDS
        ar->led_default_trigger = ieee80211_create_tpt_led_trigger(ar->hw,
                IEEE80211_TPT_LEDTRIG_FL_RADIO, ath10k_tpt_blink,
                ARRAY_SIZE(ath10k_tpt_blink));
#endif

or find where to set CPTCFG_MAC80211_LEDS 0

just a guess... tho...

or set .blink_time to 0 in mac.c:

static const struct ieee80211_tpt_blink ath10k_tpt_blink[] = {
	{ .throughput = 0 * 1024, .blink_time = 0 },
	{ .throughput = 1 * 1024, .blink_time = 0 },
	{ .throughput = 2 * 1024, .blink_time = 0 },
	{ .throughput = 5 * 1024, .blink_time = 0 },
	{ .throughput = 10 * 1024, .blink_time = 0 },
	{ .throughput = 25 * 1024, .blink_time = 0 },
	{ .throughput = 54 * 1024, .blink_time = 0 },
	{ .throughput = 120 * 1024, .blink_time = 0 },
	{ .throughput = 265 * 1024, .blink_time = 0 },
	{ .throughput = 586 * 1024, .blink_time = 0 },
};

or just disable led support from config menu o.o
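
If the goal is only to stop the blinking rather than to strip the code from the build, a runtime alternative is to detach the throughput trigger through the LED sysfs interface. This is a sketch under assumptions: the function name is made up, it assumes the wifi LEDs use the mac80211 phyNtpt trigger, and any LED settings in /etc/config/system will likely re-apply the trigger on the next boot or LED service restart.

```shell
#!/bin/sh
# Sketch only: set the trigger to "none" on every LED whose active trigger
# is a mac80211 throughput trigger (phy0tpt, phy1tpt, ...).
# $1 = LED class directory (optional, hypothetical parameter for dry runs)
wifi_leds_off() {
	for wlo_trig in "${1:-/sys/class/leds}"/*/trigger; do
		[ -f "$wlo_trig" ] || continue
		# The kernel marks the active trigger with [brackets]
		if grep -q '\[phy[0-9]*tpt\]' "$wlo_trig"; then
			echo none > "$wlo_trig"
		fi
	done
}

# Usage on the router: wifi_leds_off
```

LEDs driven by other triggers (timer, netdev, default-on) are left alone, so the power and WAN LEDs keep working.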


I have collapsed @dissent1's PR into a single patch file and have a couple of questions.

  1. Multiple CPU frequencies can map to the same L2 frequency, so it makes sense to cache it and avoid calling into clk_set_rate, which takes a lock. The same applies to the L2 voltage. What is the right way to cache those variables in this driver: do I need to use a per_cpu approach?

  2. There was a suggestion above in the thread to set min frequency to 800MHz: is that done in the dts entries like this? I could then remove the last two rows from each entry in qcom-ipq8065.dtsi.

                qcom,speed0-pvs0-bin-v0 =
                        < 1725000000 1262500 >,
                        < 1400000000 1175000 >,
                        < 1000000000 1100000 >,
                         < 800000000 1050000 >,
                         < 600000000 1000000 >,
                         < 384000000 975000 >;

  3. The original PR was not merged, with the explanation below. Can anyone provide some hints about what is expected? I read it as meaning that I just need to copy the current cpufreq-dt.h/c into cpufreq-dt-ipq806x.h/c and make it a part of the patch. Is that correct?

I noticed that you're hacking a lot of ipq806x specific code into the generic cpufreq-dt driver. I think it would be a lot less messy if you just fork that driver and create an ipq806x specific one instead.


About point 3:
You need to create a separate driver and set a dedicated compatible string in the dts.

Can you point me to a sample driver that is done this way?