Qualcommax NSS Build

It's still a very new addition to upstream, so it hasn't been on my radar as much. But I can add it to the PR as well. Hopefully it can be merged upstream too.

See my comment here:

The reporting of NSS CPU load is very basic and nothing like what the kernel stack provides for the SoC CPU. The NSS driver just reads from the current, average, and max registers set by the firmware. The firmware decides when and how often to update those metrics.

Does "firmware" mean something like nss-firmware-ipq6018, nss-firmware-ipq8074, or qca-nss-drv? :thinking: It sounds like we would need to modify the related params in the firmware if we want real-time updates.

My device is a Redmi AX6.

root@OpenWrt:~# mkdir -p /tmp/ramoops
root@OpenWrt:~# mount -t pstore pstore /tmp/ramoops
root@OpenWrt:~# ls -ltr /tmp/ramoops
-r--r--r--    1 root     root          9035 Jun 11 23:20 dmesg-ramoops-0
root@OpenWrt:~#

After the router rebooted, I got this output in /tmp/ramoops/dmesg-ramoops-0:

Panic#1 Part1
<6>[   32.220507] device lan2 entered promiscuous mode
<6>[   32.224219] l11: disabling
<6>[   32.235184] br-lan: port 3(lan3) entered blocking state
<6>[   32.235248] br-lan: port 3(lan3) entered disabled state
<6>[   32.240325] device lan3 entered promiscuous mode
<6>[   32.920210] configuring additional NSS pbufs
<6>[   32.931516] additional pbufs of size 3100672 got added to NSS
<6>[   33.149615] br-lan: port 4(phy1-ap0) entered blocking state
<6>[   33.149677] br-lan: port 4(phy1-ap0) entered disabled state
<6>[   33.156232] device phy1-ap0 entered promiscuous mode
<6>[   33.161169] br-lan: port 4(phy1-ap0) entered blocking state
<6>[   33.164828] br-lan: port 4(phy1-ap0) entered forwarding state
<6>[   33.172818] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
<6>[   33.179451] br-lan: port 4(phy1-ap0) entered disabled state
<6>[   33.336631] IPv6: ADDRCONF(NETDEV_CHANGE): phy1-ap0: link becomes ready
<6>[   33.336864] br-lan: port 4(phy1-ap0) entered blocking state
<6>[   33.342108] br-lan: port 4(phy1-ap0) entered forwarding state
<6>[   33.764830] br-lan: port 5(phy0-ap0) entered blocking state
<6>[   33.764878] br-lan: port 5(phy0-ap0) entered disabled state
<6>[   33.769759] device phy0-ap0 entered promiscuous mode
<6>[   33.775056] br-lan: port 5(phy0-ap0) entered blocking state
<6>[   33.780040] br-lan: port 5(phy0-ap0) entered forwarding state
<6>[   34.216219] br-lan: port 5(phy0-ap0) entered disabled state
<6>[   34.530557] IPv6: ADDRCONF(NETDEV_CHANGE): phy0-ap0: link becomes ready
<6>[   34.530805] br-lan: port 5(phy0-ap0) entered blocking state
<6>[   34.536062] br-lan: port 5(phy0-ap0) entered forwarding state
<6>[   36.384293] nss-dp 3a001200.dp2 wan: PHY Link up speed: 1000
<6>[   36.384374] IPv6: ADDRCONF(NETDEV_CHANGE): wan: link becomes ready
<6>[   37.344400] nss-dp 3a001600.dp4 lan2: PHY Link up speed: 1000
<6>[   37.344519] br-lan: port 2(lan2) entered blocking state
<6>[   37.349179] br-lan: port 2(lan2) entered forwarding state
<6>[  153.199439] sh (13803): drop_caches: 3
<6>[  184.386993] device phy0-ap0 left promiscuous mode
<6>[  184.387197] br-lan: port 5(phy0-ap0) entered disabled state
<6>[  184.396927] device phy1-ap0 left promiscuous mode
<6>[  184.397197] br-lan: port 4(phy1-ap0) entered disabled state
<6>[  186.490510] br-lan: port 4(phy1-ap0) entered blocking state
<6>[  186.490550] br-lan: port 4(phy1-ap0) entered disabled state
<6>[  186.495306] device phy1-ap0 entered promiscuous mode
<6>[  186.500587] br-lan: port 4(phy1-ap0) entered blocking state
<6>[  186.505668] br-lan: port 4(phy1-ap0) entered forwarding state
<6>[  186.607159] br-lan: port 5(phy0-ap0) entered blocking state
<6>[  186.607204] br-lan: port 5(phy0-ap0) entered disabled state
<6>[  186.612163] device phy0-ap0 entered promiscuous mode
<6>[  186.617461] br-lan: port 5(phy0-ap0) entered blocking state
<6>[  186.622346] br-lan: port 5(phy0-ap0) entered forwarding state
<6>[  187.164236] br-lan: port 4(phy1-ap0) entered disabled state
<6>[  187.164601] br-lan: port 5(phy0-ap0) entered disabled state
<6>[  187.364409] IPv6: ADDRCONF(NETDEV_CHANGE): phy0-ap0: link becomes ready
<6>[  187.364719] br-lan: port 5(phy0-ap0) entered blocking state
<6>[  187.369906] br-lan: port 5(phy0-ap0) entered forwarding state
<6>[  188.871091] IPv6: ADDRCONF(NETDEV_CHANGE): phy1-ap0: link becomes ready
<6>[  188.871359] br-lan: port 4(phy1-ap0) entered blocking state
<6>[  188.876546] br-lan: port 4(phy1-ap0) entered forwarding state
<6>[  200.922032] bash (15111): drop_caches: 3
<6>[  501.507760] bash (18115): drop_caches: 3
<6>[ 3801.547653] bash (25842): drop_caches: 3
<6>[ 4101.543599] bash (26500): drop_caches: 3
<6>[ 5233.115235] nss-dp 3a001600.dp4 lan2: PHY Link is down
<6>[ 5233.115688] br-lan: port 2(lan2) entered disabled state
<6>[ 7401.554064] bash (1312): drop_caches: 3
<6>[ 7701.559635] bash (1981): drop_caches: 3
<6>[11001.571315] bash (9280): drop_caches: 3
<6>[11301.558824] bash (9940): drop_caches: 3
<6>[14601.582629] bash (17252): drop_caches: 3
<6>[14901.553857] bash (20665): drop_caches: 3
<6>[18201.087959] bash (28402): drop_caches: 3
<6>[18501.133998] bash (29070): drop_caches: 3
<6>[21801.145288] bash (3897): drop_caches: 3
<6>[22101.092031] bash (4584): drop_caches: 3
<6>[25401.172370] bash (11849): drop_caches: 3
<6>[25700.917097] bash (12996): drop_caches: 3
<6>[29000.983371] bash (20282): drop_caches: 3
<6>[29300.984379] bash (20942): drop_caches: 3
<6>[32601.005079] bash (28201): drop_caches: 3
<6>[32900.999501] bash (28876): drop_caches: 3
<6>[35997.089130] nss-dp 3a001600.dp4 lan2: PHY Link up speed: 10
<6>[35997.089240] br-lan: port 2(lan2) entered blocking state
<6>[35997.093543] br-lan: port 2(lan2) entered forwarding state
<6>[36003.233061] nss-dp 3a001600.dp4 lan2: PHY Link is down
<6>[36003.233479] br-lan: port 2(lan2) entered disabled state
<6>[36006.305114] nss-dp 3a001600.dp4 lan2: PHY Link up speed: 1000
<6>[36006.305234] br-lan: port 2(lan2) entered blocking state
<6>[36006.309878] br-lan: port 2(lan2) entered forwarding state
<6>[36012.449037] nss-dp 3a001600.dp4 lan2: PHY Link is down
<6>[36012.449448] br-lan: port 2(lan2) entered disabled state
<6>[36015.521102] nss-dp 3a001600.dp4 lan2: PHY Link up speed: 1000
<6>[36015.521235] br-lan: port 2(lan2) entered blocking state
<6>[36015.525864] br-lan: port 2(lan2) entered forwarding state
<6>[36201.002550] bash (3740): drop_caches: 3
<6>[36415.904223] nss-dp 3a001600.dp4 lan2: PHY Link is down
<6>[36415.904638] br-lan: port 2(lan2) entered disabled state
<6>[36501.024080] bash (4409): drop_caches: 3
<6>[39801.027126] bash (11686): drop_caches: 3
<6>[40101.032678] bash (12341): drop_caches: 3
<6>[43401.046213] bash (19636): drop_caches: 3
<6>[43701.058119] bash (20319): drop_caches: 3
<6>[47001.055473] bash (27580): drop_caches: 3
<6>[47301.068544] bash (28239): drop_caches: 3
<6>[50601.074027] bash (3052): drop_caches: 3
<6>[50901.079013] bash (3735): drop_caches: 3
<6>[54201.033160] bash (11031): drop_caches: 3
<6>[54501.099687] bash (11698): drop_caches: 3
<6>[57801.123786] bash (18979): drop_caches: 3
<6>[58101.110557] bash (19649): drop_caches: 3
<6>[61401.123376] bash (26919): drop_caches: 3
<6>[61701.136861] bash (27584): drop_caches: 3
<6>[65001.150371] bash (2405): drop_caches: 3
<6>[65301.140890] bash (3068): drop_caches: 3
<6>[68601.155332] bash (10358): drop_caches: 3
<6>[68901.159704] bash (11021): drop_caches: 3
<6>[72201.167059] bash (18299): drop_caches: 3
<6>[72501.170718] bash (18970): drop_caches: 3
<6>[75801.179956] bash (26244): drop_caches: 3
<6>[76101.192804] bash (26902): drop_caches: 3
<6>[79401.192424] bash (1739): drop_caches: 3
<6>[79701.196696] bash (2429): drop_caches: 3
<6>[83001.204820] bash (10053): drop_caches: 3
<6>[83301.205459] bash (10751): drop_caches: 3
<6>[85066.053041] nss-dp 3a001600.dp4 lan2: PHY Link up speed: 10
<6>[85066.053151] br-lan: port 2(lan2) entered blocking state
<6>[85066.057453] br-lan: port 2(lan2) entered forwarding state
<6>[85072.196990] nss-dp 3a001600.dp4 lan2: PHY Link is down
<6>[85072.197433] br-lan: port 2(lan2) entered disabled state
<6>[85075.269051] nss-dp 3a001600.dp4 lan2: PHY Link up speed: 1000
<6>[85075.269162] br-lan: port 2(lan2) entered blocking state
<6>[85075.273815] br-lan: port 2(lan2) entered forwarding state
<6>[85082.437002] nss-dp 3a001600.dp4 lan2: PHY Link is down
<6>[85082.437314] br-lan: port 2(lan2) entered disabled state
<6>[85084.485065] nss-dp 3a001600.dp4 lan2: PHY Link up speed: 1000
<6>[85084.485226] br-lan: port 2(lan2) entered blocking state
<6>[85084.489828] br-lan: port 2(lan2) entered forwarding state
<6>[86601.232735] bash (18237): drop_caches: 3
<6>[86901.233294] bash (18895): drop_caches: 3
<6>[90201.203943] bash (26371): drop_caches: 3
<6>[90501.256956] bash (27048): drop_caches: 3
<6>[91578.932903] sysrq: Trigger a crash
<0>[91578.932942] Kernel panic - not syncing: sysrq triggered crash
<4>[91578.935207] CPU: 0 PID: 29456 Comm: sh Not tainted 6.1.92 #0
<4>[91578.941018] Hardware name: Redmi AX6 (DT)
<4>[91578.946747] Call trace:
<4>[91578.950648]  dump_backtrace.part.0+0xbc/0xd0
<4>[91578.952909]  show_stack+0x18/0x30
<4>[91578.957419]  dump_stack_lvl+0x6c/0x88
<4>[91578.960633]  dump_stack+0x18/0x34
<4>[91578.964278]  panic+0x158/0x304
<4>[91578.967575]  sysrq_handle_crash+0x1c/0x20
<4>[91578.970529]  __handle_sysrq+0x8c/0x1a0
<4>[91578.974609]  write_sysrq_trigger+0xc0/0x120
<4>[91578.978255]  proc_reg_write+0xb0/0x100
<4>[91578.982333]  vfs_write+0xa4/0x280
<4>[91578.986153]  ksys_write+0x5c/0xe0
<4>[91578.989539]  __arm64_sys_write+0x1c/0x30
<4>[91578.992838]  invoke_syscall.constprop.0+0x5c/0x110
<4>[91578.996833]  do_el0_svc+0x58/0x170
<4>[91579.001431]  el0_svc+0x18/0x54
<4>[91579.004815]  el0t_64_sync_handler+0x114/0x120
<4>[91579.007856]  el0t_64_sync+0x174/0x178
<2>[91579.012284] SMP: stopping secondary CPUs
<0>[91579.015935] Kernel Offset: disabled
<0>[91579.019920] CPU features: 0x00000,00000000,0000400b
<0>[91579.023136] Memory Limit: none

nss-firmware-* = Firmware. The binary blobs (/lib/firmware/qca-nss*) that are loaded onto the NSS cores on boot.

qca-nss-drv = Driver. The thing talking to the firmware.

There are no "params" to modify. It's a read-only operation. The moment you cat /sys/kernel/debug/qca-nss-drv/stats/cpu_load_ubi is when the driver asks the firmware for stats.

This is the entire code for it.

/*
 * nss_freq_stats.c
 *	NSS Frequency statistics APIs.
 */

#include "nss_stats.h"
#include "nss_tx_rx_common.h"

/*
 * At any point, this object has the latest data about CPU utilization.
 */
extern struct nss_freq_cpu_usage nss_freq_cpu_status;

/*
 * Spinlock to protect the global data structure nss_freq_cpu_status
 */
extern spinlock_t nss_freq_cpu_usage_lock;

/*
 * nss_freq_stats_read()
 * 	Read frequency stats and display CPU information.
 */
static ssize_t nss_freq_stats_read(struct file *fp, char __user *ubuf, size_t sz, loff_t *ppos)
{
	/*
	 * max output lines = Should change in case of number of lines below.
	 */
	uint32_t max_output_lines = (2 + 3) + 5;
	size_t size_al = NSS_STATS_MAX_STR_LENGTH * max_output_lines;
	size_t size_wr = 0;
	ssize_t bytes_read = 0;
	uint32_t avg, max, min;

	char *lbuf = kzalloc(size_al, GFP_KERNEL);
	if (unlikely(!lbuf)) {
		nss_warning("Could not allocate memory for local statistics buffer");
		return 0;
	}

	size_wr = scnprintf(lbuf, size_al, "CPU Utilization:\n");

	spin_lock_bh(&nss_freq_cpu_usage_lock);
	avg = nss_freq_cpu_status.used;
	max = nss_freq_cpu_status.max;
	min = nss_freq_cpu_status.min;
	spin_unlock_bh(&nss_freq_cpu_usage_lock);

	size_wr += scnprintf(lbuf + size_wr, size_al - size_wr, "Note: Averaged over 1 second\n\n");
	size_wr += scnprintf(lbuf + size_wr, size_al - size_wr, "Core 0:\n");
	size_wr += scnprintf(lbuf + size_wr, size_al - size_wr, "Min\tAvg\tMax\n");
	size_wr += scnprintf(lbuf + size_wr, size_al - size_wr, " %u%%\t %u%%\t %u%%\n\n", min, avg, max);

	bytes_read = simple_read_from_buffer(ubuf, sz, ppos, lbuf, strlen(lbuf));
	kfree(lbuf);

	return bytes_read;
}

/*
 * nss_freq_stats_ops
 */
NSS_STATS_DECLARE_FILE_OPERATIONS(freq)

/*
 * nss_freq_dentry_create()
 */
void nss_freq_stats_dentry_create(void)
{
	nss_stats_create_dentry("cpu_load_ubi", &nss_freq_stats_ops);
}

As I said, it's super basic.
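
For anyone who wants to log these values over time, here is a minimal shell sketch that pulls the three numbers out of the text the read handler above produces. The function name nss_cpu_load is made up for illustration, and the parsing assumes the output layout matches the scnprintf calls shown (a "Min Avg Max" header followed by one line of percentages).

```shell
#!/bin/sh
# Hypothetical helper (not part of qca-nss-drv): print "min avg max" from the
# cpu_load_ubi debugfs file, assuming the layout produced by the scnprintf
# calls quoted above.
nss_cpu_load() {
	# $1: path to the stats file; defaults to the debugfs path from this thread
	file="${1:-/sys/kernel/debug/qca-nss-drv/stats/cpu_load_ubi}"
	# The value line follows the "Min Avg Max" header; strip the % signs.
	awk '
		hdr { gsub(/%/, ""); print $1, $2, $3; exit }
		/^Min[ \t]+Avg[ \t]+Max/ { hdr = 1 }
	' "$file"
}
```

Something like `while sleep 1; do nss_cpu_load; done` (as root) would then show whether the numbers ever change on a given build.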

Wow! :blush: Thanks for the very impressive explanation. Does that mean we can't do anything about it but wait?

Even when it does update, would you trust it? It's kind of a useless metric to look at if you're not getting close to real time updates.

Btw, there is no actual need to mount it manually.
The pstore filesystem and files should be available in /sys/fs/pstore by default.
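
A quick way to check for leftover crash records after a reboot, as a sketch only. It assumes the records are named dmesg-ramoops-N as in the listing earlier in the thread; the function name pstore_panics is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list pstore dmesg records that contain a kernel panic.
# $1: pstore directory; defaults to the standard mount point.
pstore_panics() {
	dir="${1:-/sys/fs/pstore}"
	for f in "$dir"/dmesg-ramoops-*; do
		# If the glob matched nothing, $f is the literal pattern; skip it.
		[ -f "$f" ] || continue
		if grep -q 'Kernel panic' "$f"; then
			echo "$f"
		fi
	done
}
```

The test crash in the log above came in through the sysrq trigger (`echo c > /proc/sysrq-trigger`), which panics the kernel on purpose, so only try that on a device you can afford to have go down.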

Incorporated everything into my .config. Thanks!

Can confirm that the pstore/ramoops addition works on the AX9000.

Hi, has anyone encountered "ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event"? I ran into this issue 3 hours after booting.

[13327.492541] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[13328.945791] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[13329.396565] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[13330.993670] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[13334.987362] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[13335.028107] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[13335.888292] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[13336.113551] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[13336.215940] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[13336.932866] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[13337.035125] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[13338.961312] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[13339.083086] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[13341.315391] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[13342.140297] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18479.205571] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18479.225922] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18479.328035] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18479.431040] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18488.646961] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18490.796648] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18491.124323] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18492.127831] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18494.790292] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18496.614438] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18497.247765] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18497.268832] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18497.350512] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18497.370658] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18497.452714] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18497.473091] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18497.759754] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18497.780339] ath11k c000000.wifi: invalid pdev_id 2 in mgmt_rx_event
[18499.872650] ath11k_warn: 6 callbacks suppressed
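
To see whether these warnings come in bursts, a small sketch like the following can bucket them by hour of uptime. The function name pdev_warnings_per_hour is hypothetical, and it assumes the usual "[seconds.micros]" timestamp prefix seen in the log above.

```shell
#!/bin/sh
# Hypothetical helper: count "invalid pdev_id" warnings per hour of uptime.
# Reads kernel log text on stdin (e.g. piped from dmesg).
pdev_warnings_per_hour() {
	awk '
		/invalid pdev_id/ {
			# Pull the uptime in seconds out of the "[12345.678901]" prefix.
			if (match($0, /\[ *[0-9]+\.[0-9]+\]/)) {
				ts = substr($0, RSTART + 1, RLENGTH - 2)
				count[int(ts / 3600)]++
			}
		}
		END { for (h in count) print h, count[h] }
	' | sort -n
}
```

Usage would be `dmesg | pdev_warnings_per_hour`. Each output line is "hour count". Because the kernel rate-limits these messages ("callbacks suppressed"), the counts are a lower bound.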

Is the ramoops on my device, the AX6 (512), invalid?

@qosmio Hi, is there a simple way to build that NSS repo of yours without wifi offloading? Or maybe a way to turn it off? I'd like to know for sure whether dynamic VLANs are broken because of that or not...

Thanks.

Not following, invalid how? Are you asking if the memory region specified in the DTS is valid or if ramoops is working?

You can build without NSS wifi by disabling these config options:

CONFIG_ATH11K_NSS_SUPPORT=n
CONFIG_PACKAGE_MAC80211_NSS_SUPPORT=n

There is also the option to load ath11k without nss_offload, but it's currently broken as it causes a kernel panic shortly after loading.

# Enabled
➤ cat /etc/modules.d/ath11k
ath11k nss_offload=1 frame_mode=2

# To disable
➤ echo "ath11k nss_offload=0 frame_mode=2" > /etc/modules.d/ath11k

# Verify after rebooting
➤ cat /etc/modules.d/ath11k
ath11k nss_offload=0 frame_mode=2

➤ cat /sys/module/ath11k/parameters/nss_offload
0
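
If you want to script that check, here is a rough sketch that reads the configured value back out of the module options file. The function name ath11k_nss_offload_cfg is made up; the file path and option name are the ones shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the nss_offload value configured for ath11k.
# $1: module options file; defaults to the path used in this thread.
ath11k_nss_offload_cfg() {
	file="${1:-/etc/modules.d/ath11k}"
	# Take the first "ath11k ..." line and extract the nss_offload=<n> token.
	sed -n 's/^ath11k .*nss_offload=\([0-9][0-9]*\).*/\1/p' "$file" | head -n 1
}
```

After a reboot, the printed value should match `cat /sys/module/ath11k/parameters/nss_offload`; if it doesn't, the module was loaded with different options than the file says.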

For anyone curious about cpu_load_ubi, I did some experimenting today. It turns out the reason people see min, max, and avg as mostly static values is NSS frequency scaling.

➤ cat /proc/sys/dev/nss/clock/auto_scale
0

When set to 0, the NSS core is fixed at its highest setting (1.7 GHz). As a result, the cores don't update their CPU load stats.

BTW, please DO NOT enable it. I had it disabled for performance reasons. Even Qualcomm disables frequency scaling in their mac80211 package. I'm simply posting my findings for those who are curious.
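
For a read-only check of which mode a build is in, a sketch along these lines should do. The function name nss_scaling_state is hypothetical, and treating any non-zero value as "scaling enabled" is an assumption on my part; only the meaning of 0 comes from the experiment above.

```shell
#!/bin/sh
# Hypothetical helper: describe the NSS frequency scaling setting.
# $1: sysctl file; defaults to the path shown above.
nss_scaling_state() {
	file="${1:-/proc/sys/dev/nss/clock/auto_scale}"
	val=$(cat "$file" 2>/dev/null) || { echo "unknown"; return 1; }
	case "$val" in
		0) echo "fixed" ;;     # scaling off: core pinned at its highest clock
		*) echo "scaling" ;;   # assumption: any other value means scaling is on
	esac
}
```

It only reads the file and never writes to it, so it is safe to run on a build where scaling is meant to stay off.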

So yeah, when I switched off those two configuration toggles and moved both 207 patches (from ath11k and subsys) out of the nss folder into their respective default subfolders, dynamic VLANs work.

So it has something to do with the rest of the patches, and thus with offloading.

Too bad we can't just toggle nss_offload on a flashed device; that would probably tell us a lot more.

@zxlhhyccc was asking whether the output he obtained from his Redmi AX6 is valid. He provided steps and logs, and I personally think it's valid/working. (His English is quite poor, I'd say...)

If that looks the same to you, then the "list of working Model" can have "Redmi AX6" added with a "Y" :slight_smile:

His pstore log looks quite ok (normal log, then sysrq triggered crash) in

Thank you. @qosmio If it looks the same, please add "Redmi AX6" to the "list of working Model".

Yup, looks good. The fact you're getting files generated in pstore partition is what I was looking for.

I've updated the PR to add that model. Thanks.

So now I've found out that I have problems with VLAN filtering in general. Does anyone else here use the NSS build with VLAN filtering on the bridge?

Setup:

  • Device: AX3600
  • Using VLANs on the wan (wan.# - wana: main table, wan.# - wanb: default table).
  • VLAN filtering enabled on br-lan with multiple VLANs, one trunk port, and two access ports.
  • Firewall rule which marks packets with destination IP1.
  • Route rule which points marked packets to the default table.
  • PC1 connected to the router through the trunk port.
  • PC2 connected to the router through an access port.
  • Both are members of the same VLAN.

Observed behavior:

  • :white_check_mark: When PC1 initiates a connection to IP1, packets are correctly routed through the default routing table.
  • :x: When PC2 initiates a connection to IP1, packets are routed through the main routing table.

My best guess is some race condition between VLAN tagging and firewall marking/routing, given that it works when the router does not have to tag the packets.