Qualcommax NSS Build

@qosmio no issues with auto_scale ... running it for a day. FYI, I actually like autoscale since I run the on-demand governor. Still getting 1.8 Gbps ... having the NSS NAPI patch reintroduced also helped a lot.

Latest Commit without NAPI:

Connecting to host 192.168.1.1, port 5201
[  5] local 192.168.1.65 port 38598 connected to 192.168.1.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   145 MBytes  1.21 Gbits/sec
[  5]   1.00-2.00   sec   154 MBytes  1.30 Gbits/sec
[  5]   2.00-3.00   sec   160 MBytes  1.34 Gbits/sec
[  5]   3.00-4.01   sec   160 MBytes  1.33 Gbits/sec
[  5]   4.01-5.00   sec   161 MBytes  1.37 Gbits/sec
[  5]   5.00-6.00   sec   151 MBytes  1.27 Gbits/sec
[  5]   6.00-7.01   sec   149 MBytes  1.23 Gbits/sec
[  5]   7.01-8.01   sec   154 MBytes  1.29 Gbits/sec
[  5]   8.01-9.01   sec   152 MBytes  1.28 Gbits/sec
[  5]   9.01-10.01  sec   152 MBytes  1.28 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.01  sec  1.50 GBytes  1.29 Gbits/sec                  sender
[  5]   0.00-10.01  sec  1.50 GBytes  1.29 Gbits/sec                  receiver

With NAPI:

Connecting to host 192.168.1.1, port 5201
[  5] local 192.168.1.65 port 38352 connected to 192.168.1.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.01   sec   136 MBytes  1.13 Gbits/sec
[  5]   1.01-2.01   sec   133 MBytes  1.12 Gbits/sec
[  5]   2.01-3.01   sec   136 MBytes  1.14 Gbits/sec
[  5]   3.01-4.01   sec   136 MBytes  1.14 Gbits/sec
[  5]   4.01-5.00   sec   130 MBytes  1.10 Gbits/sec
[  5]   5.00-6.01   sec   143 MBytes  1.20 Gbits/sec
[  5]   6.01-7.00   sec   143 MBytes  1.20 Gbits/sec
[  5]   7.00-8.01   sec   146 MBytes  1.21 Gbits/sec
[  5]   8.01-9.01   sec   138 MBytes  1.17 Gbits/sec
[  5]   9.01-10.01  sec   139 MBytes  1.17 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.01  sec  1.35 GBytes  1.16 Gbits/sec                  sender
[  5]   0.00-10.01  sec  1.35 GBytes  1.16 Gbits/sec                  receiver

It's safe to say the AX3600 is in an excellent state right now. It has a 1-gig WAN port and achieves over 1 gig on Wi-Fi.
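To put a number on the two runs above, here is a minimal sketch that pulls the receiver summary line out of iperf3 text output and computes the relative change. The function names are made up for illustration, and a single 10-second run over Wi-Fi is noisy, so treat the result as a rough indication only.

```python
import re

# Matches iperf3 summary lines such as:
# [  5]   0.00-10.01  sec  1.50 GBytes  1.29 Gbits/sec   receiver
SUMMARY_RE = re.compile(r"([\d.]+)\s+([KMG])bits/sec\s+(sender|receiver)\s*$")
SCALE = {"K": 1e3, "M": 1e6, "G": 1e9}


def summary_bitrate(iperf_text, role="receiver"):
    """Return the summary bitrate in bits/sec for the given role, or None."""
    for line in iperf_text.splitlines():
        m = SUMMARY_RE.search(line)
        if m and m.group(3) == role:
            return float(m.group(1)) * SCALE[m.group(2)]
    return None


def percent_change(baseline, candidate):
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# Receiver summary lines copied from the two runs posted above.
without_napi = "[  5]   0.00-10.01  sec  1.50 GBytes  1.29 Gbits/sec                  receiver"
with_napi = "[  5]   0.00-10.01  sec  1.35 GBytes  1.16 Gbits/sec                  receiver"

base = summary_bitrate(without_napi)
cand = summary_bitrate(with_napi)
print(f"{percent_change(base, cand):+.1f}%")  # prints -10.1%
```

In these two particular runs the NAPI build comes out about 10% lower, so repeating the test several times on each build would be needed before drawing a conclusion.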

I will run the NAPI branch without auto_scaling first, for a day or two, to see if I get any issues.

I am now testing the build with NSS autoscale ON and NAPI, but I'm curious: how do I know that NAPI is working on my running build? Is there a command for NAPI similar to the one for NSS stats? Additionally, the memory profile is now set to 1GB. Previously I used 512MB, as it was recommended a while ago.

@qosmio, hi there. I noticed a big update branch named "NAPI" and came back to test it on IPQ6018. Compilation was successful, and I got the following WiFi speed test result (iperf3: 913.57 Mb DL and 836.87 Mb UL, with almost 0% CPU usage :nerd_face:):

On behalf of the Chinese IPQ6018 community, I collected some questions from folks and would really appreciate your help :hugs::

  1. What is the merit of "NAPI"? I got almost the same result (WiFi speed and CPU usage) with NAPI and without NAPI. Do we really need it?
  2. As you can see, there is still a gap of almost 100Mb between WiFi DL and UL speed. If I want to improve it, which one should I focus on? WiFi firmware, NSS firmware, or both?
  3. I noticed irqbalance has something to do with NSS. I enabled it with luci-app-irqbalance and got 20~30Mb more UL speed. Is that a real effect or just a random incident?

Again, thanks for your attention! :hugs:

You will see threads prefixed with napi/nss-

Are the iperf3 results from a wireless client to the AX3600 as the server?

If you have htop installed you should now see context NSS-X for each thread

I don't see any such records in htop. Logread is empty too.

root@QNAP:~# logread | grep napi
root@QNAP:~#

I cannot find these. Maybe a recompilation issue although I have the patches for NAPI (0018-nss-drv-add-napi-threading.patch) in builddir.

NAPI in threaded mode seems to be working well now. Logs are clean, no issues.

Yes. Ax3600 as server

[   46.565748] BUG: scheduling while atomic: napi/nss-9/993/0x00000201
[   46.565810] Modules linked in: ecm(O) nft_fib_inet nf_flow_table_inet ath11k_ahb(O) ath11k(O) nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mac80211(O) cfg80211(O) tcp_bbr qrtr_smd qrtr qmi_helpers(O) pptp ppp_async nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c l2tp_ppp crc_ccitt compat(O) qca_nss_vlan(O) qca_nss_pppoe(O) pppoe pppox ppp_generic slhc qca_mcs(O) ip_gre gre qca_nss_drv(O) l2tp_netlink l2tp_core udp_tunnel ip6_udp_tunnel ip_tunnel sha512_generic sha512_arm64 seqiv sha3_generic jitterentropy_rng drbg michael_mic hmac geniv cmac leds_gpio qca_nss_dp(O) qca_ssdk(O) gpio_button_hotplug(O) ext4 mbcache jbd2 crc32c_generic
[   46.626449] CPU: 2 PID: 993 Comm: napi/nss-9 Tainted: G           O       6.6.35 #0
[   46.648629] Hardware name: Xiaomi AX3600 (DT)
[   46.656175] Call trace:
[   46.660683]  dump_backtrace+0xa0/0xd8
[   46.662945]  show_stack+0x18/0x24
[   46.666761]  dump_stack_lvl+0x48/0x60
[   46.670061]  dump_stack+0x18/0x24
[   46.673707]  __schedule_bug+0x54/0x6c
[   46.677006]  __schedule+0x4f8/0x584
[   46.680651]  schedule+0x5c/0xc4
[   46.683949]  napi_threaded_poll+0x30/0x84
[   46.687076]  kthread+0x10c/0x110
[   46.691243]  ret_from_fork+0x10/0x20

Is the bug of any relevance?

unfortunately yes :frowning:

@qosmio what does kmod-qca-nss-drv-bridge-mgr do?
If it is not compiled into my builds, I get lots of weird disconnect issues.
Example: an SSH session to a host on a VLAN on the AX3600 gets disconnected after a few minutes, regardless of timeout settings on the client or host.
With it compiled in, everything works as expected.

I did make clean and recompiled with patches for NAPI (0018-nss-drv-add-napi-threading.patch) present but still cannot see any NSS-X.

Hi @JuliusBairaktaris

But, it is giving worse results, isn't it?

Regards, Agustin

I confirm that using this syntax for VLANs all problems are solved.

Yep, I couldn't agree more with this :sweat_smile:

I was successful in compiling immortalwrt/wwan-packages by re-enabling rmnet_nss and applying your patches. Using the qmi_wwan_q from there also solved the issue with the IP address.

Mon Jun 24 08:01:33 2024 kern.info kernel: [   13.523432] usb 4-1: new SuperSpeed USB device number 2 using xhci-hcd
Mon Jun 24 08:01:33 2024 kern.info kernel: [   13.706387] qmi_wwan_q 4-1:1.4: cdc-wdm0: USB WDM device
Mon Jun 24 08:01:33 2024 kern.info kernel: [   13.706647] qmi_wwan_q 4-1:1.4: Quectel RG500Q-EA work on RawIP mode
Mon Jun 24 08:01:33 2024 kern.info kernel: [   13.712530] qmi_wwan_q 4-1:1.4: rx_urb_size = 31744
Mon Jun 24 08:01:33 2024 kern.info kernel: [   13.717878] qmi_wwan_q 4-1:1.4 wwan0: register 'qmi_wwan_q' at usb-xhci-hcd.2.auto-1, RMNET/USB device, 06:56:f8:6f:40:4a
Mon Jun 24 08:01:33 2024 kern.info kernel: [   13.723110] net wwan0 wwan0_1: NSS context created
Mon Jun 24 08:01:33 2024 kern.info kernel: [   13.732939] net wwan0: qmap_register_device wwan0_1

5G in my area is really bad right now, so I might need to go out and hunt for good 5G coverage :yum:

In very basic terms, it keeps bridge events in sync between the NSS firmware and the kernel. It's a pretty crucial part of NSS offloading, especially when dealing with VLANs.

Yes, it's related to the threaded NAPI patch. Looks like there's a race condition when accessing shared data structures (desc_ring, h2n_desc_ring, etc.), which need proper spinlocks. That was the initial reason I just reverted the patch.
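As a side note on reading the "scheduling while atomic" line from the trace posted earlier: the kernel prints it as comm/pid/preempt_count. Below is a minimal sketch that splits that line and decodes the count, assuming the bit layout from include/linux/preempt.h in recent kernels (bits 0-7 preemption depth, 8-15 softirq, 16-19 hardirq, 20-23 NMI). The function name is hypothetical.

```python
import re

# Kernel format: "BUG: scheduling while atomic: <comm>/<pid>/0x<preempt_count>"
# <comm> may itself contain a slash (e.g. "napi/nss-9"), so match it greedily.
BUG_RE = re.compile(r"BUG: scheduling while atomic: (.+)/(\d+)/0x([0-9a-fA-F]+)")


def decode_sched_bug(line):
    """Split the BUG line and decode preempt_count fields (assumed layout)."""
    m = BUG_RE.search(line)
    if m is None:
        return None
    count = int(m.group(3), 16)
    return {
        "comm": m.group(1),
        "pid": int(m.group(2)),
        "preempt_depth": count & 0xFF,          # bits 0-7
        "softirq_count": (count >> 8) & 0xFF,   # bits 8-15
        "hardirq_count": (count >> 16) & 0xF,   # bits 16-19
        "nmi_count": (count >> 20) & 0xF,       # bits 20-23
    }


line = "[   46.565748] BUG: scheduling while atomic: napi/nss-9/993/0x00000201"
print(decode_sched_bug(line))
```

For 0x00000201 this gives a preemption depth of 1 and a softirq field of 2. Under the assumed layout, a softirq field of 2 corresponds to bottom halves being disabled rather than a softirq being serviced, so the thread apparently called schedule() with BH disabled and preemption disabled one level deep. That would be consistent with a locking problem, but it does not identify the cause by itself.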

That's great to hear! So, your connection is fully offloaded?

What is the output of:

nss_stats rmnet_rx

EDIT:

logread won't show anything related to NAPI in threaded mode. If you don't see it in htop or ps, then the patch isn't applied or being built properly.

➤ ps|grep napi
 1395 root         0 SW   [napi/nss-6]
 1396 root         0 SW   [napi/nss-7]
 1397 root         0 SW   [napi/nss-8]
 1398 root         0 SW   [napi/nss-9]
 1399 root         0 SW   [napi/nss-10]
 1400 root         0 SW   [napi/nss-11]
 1401 root         0 SW   [napi/nss-12]
 1402 root         0 SW   [napi/nss-13]
 1403 root         0 SW   [napi/nss-14]
 1404 root         0 SW   [napi/nss-15]
 1429 root         0 SW   [napi/nss-16]
 1430 root         0 SW   [napi/nss-17]
 1431 root         0 SW   [napi/nss-18]
 1432 root         0 SW   [napi/nss-19]
 1433 root         0 SW   [napi/nss-20]
 1434 root         0 SW   [napi/nss-21]
 1435 root         0 SW   [napi/nss-22]
 1436 root         0 SW   [napi/nss-23]
 1437 root         0 SW   [napi/nss-24]
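If you want to script that check, here is a minimal sketch that scans `ps` output for the napi/nss-* kernel threads. It assumes the thread naming shown in the listing above; the function name is made up, and Python is not installed on OpenWrt by default, so this is more likely to be run against saved output.

```python
import re

# Kernel threads created by threaded NAPI show up in ps as [napi/nss-<id>].
NAPI_THREAD_RE = re.compile(r"\[napi/nss-(\d+)\]")


def nss_napi_threads(ps_output):
    """Return the sorted NAPI ids found in `ps` output text."""
    ids = []
    for line in ps_output.splitlines():
        m = NAPI_THREAD_RE.search(line)
        if m:
            ids.append(int(m.group(1)))
    return sorted(ids)


# Sample input: two lines from the listing above plus the grep process itself,
# which must not be counted because it lacks the bracketed thread name.
sample = """\
 1395 root         0 SW   [napi/nss-6]
 1396 root         0 SW   [napi/nss-7]
 2001 root      1234 S    grep napi
"""

ids = nss_napi_threads(sample)
print("threaded NAPI active" if ids else "no napi/nss-* threads found", ids)
```

An empty result means the same thing as an empty `ps|grep napi`: the threaded NAPI patch is not applied or not being built.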

EDIT 2:

To clarify, the NSS driver package is already using NAPI (New API) for managing interrupts. The patch is for threaded NAPI. Threaded NAPI is an operating mode that uses dedicated kernel threads rather than software IRQ context for NAPI processing. It's not required for proper functioning, and is experimental in the context of the NSS driver.

There are so many factors at play here... It could be the channels used, interference, and most importantly your client device. Client devices will always have lower upload compared to download, since they're the ones sending traffic, and their hardware is not as powerful as a dedicated router's.

Neither :wink: since you don't have control over closed-source firmware. If your client device is connected at a rate of 1200/1200 Mbps, what you posted is about as good as you can get in a perfect setup.

If you're operating at 160 MHz, with little to no interference, no DFS scanning, AND have a client that is capable of connecting at 2400/2400 Mbps, then 1600/1500 Mbps is what you could expect.

Please don't use irqbalance, it's been discussed ad nauseam in this thread... By default it does not make distinctions about which IRQs to move; it only sees that one "looks busy" and moves it. For example, the ce* IRQs DO NOT like being moved off CPU0; doing so will cause instability and crashing.

It is best to pin the IRQs to their respective cores and leave them there. A lot of tuning has been done to ensure an optimal spread between CPUs.
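For anyone wondering what pinning looks like mechanically, here is a minimal sketch, not the build's actual tuning script. It computes the hex mask that /proc/irq/<n>/smp_affinity accepts and finds IRQ numbers by name in /proc/interrupts-style text. The sample table, IRQ numbers, and the nss_queue0 name are hypothetical; only the ce* naming comes from the post above.

```python
def affinity_mask(cpus):
    """Hex bitmask for /proc/irq/<n>/smp_affinity covering the given CPU ids."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def irqs_matching(interrupts_text, name_fragment):
    """IRQ numbers whose /proc/interrupts line contains name_fragment."""
    found = []
    for line in interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        # Skip the CPU header row and non-numeric rows such as "IPI0:".
        if sep and head.strip().isdigit() and name_fragment in rest:
            found.append(int(head.strip()))
    return found


# Hypothetical excerpt in /proc/interrupts format (values are made up).
sample = """\
           CPU0       CPU1       CPU2       CPU3
 47:       1200          0          0          0     GIC-0 100 Level     ce0
 48:          0        900          0          0     GIC-0 101 Level     nss_queue0
"""

# Print, rather than execute, the command that would keep ce0 on CPU0.
for irq in irqs_matching(sample, "ce0"):
    print(f"echo {affinity_mask([0])} > /proc/irq/{irq}/smp_affinity")
# prints: echo 1 > /proc/irq/47/smp_affinity
```

The sketch only prints the commands so nothing changes on a running system. On these builds the pinning is already handled for you, which is the point of the advice above.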

I do have one problem, though: for some reason the 2.5G WAN/LAN port on my Arcadyan AW1000 is broken again. The interface is there, but it won't come up when I plug a cable into it.