Qualcommax NSS Build

When updating the feeds, do I need to use -f to force the override?

I am unsure why you are getting this message, but yes, use "-f" ...

EDIT ... have you switched the repo to bitthief's in an existing folder that had the OpenWrt main branch? If so, I suggest you clone bitthief's repo into a new folder ...

I cloned bitthief's repo (into brand new folder)
cd openwrt
./scripts/feeds update -a
./scripts/feeds install -a

Then I got the warning above. I figured that '-f' would overwrite what's in bitthief's repo, so I left it as is.

EDIT: I built it and it installed just fine. How can I confirm whether I'm using NSS or not?

Here's output from dmesg:

[    3.607976] Generic PHY 90000.mdio-1:00: attached PHY driver (mii_bus:phy_addr=90000.mdio-1:00, irq=POLL)
[    3.611776] nss-dp 3a001000.dp1 lan4: Registered netdev lan4(qcom-id:1)
[    3.621626] Generic PHY 90000.mdio-1:01: attached PHY driver (mii_bus:phy_addr=90000.mdio-1:01, irq=POLL)
[    3.627739] nss-dp 3a001200.dp2 lan3: Registered netdev lan3(qcom-id:2)
[    3.637763] Generic PHY 90000.mdio-1:02: attached PHY driver (mii_bus:phy_addr=90000.mdio-1:02, irq=POLL)
[    3.643908] nss-dp 3a001400.dp3 lan2: Registered netdev lan2(qcom-id:3)
[    3.653892] Generic PHY 90000.mdio-1:03: attached PHY driver (mii_bus:phy_addr=90000.mdio-1:03, irq=POLL)
[    3.660025] nss-dp 3a001600.dp4 lan1: Registered netdev lan1(qcom-id:4)
[    3.875514] QCA808X ethernet 90000.mdio-1:1c: attached PHY driver (mii_bus:phy_addr=90000.mdio-1:1c, irq=POLL)
[    3.875954] nss-dp 3a007000.dp6-syn wan: Registered netdev wan(qcom-id:6)
[    3.884505] **********************************************************
[    3.891262] * NSS Data Plane driver
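
One way to pull the port names out of output like this is sketched below. It only shows which interfaces the nss-dp driver registered; on its own it does not prove that the NSS firmware cores booted. The function name nss_dp_netdevs is made up for illustration, and the pattern assumes the "Registered netdev" wording seen in the lines above.

```shell
#!/bin/sh
# nss_dp_netdevs (hypothetical helper): read kernel log text on stdin and
# print the name of every netdev that the nss-dp driver reports registering.
# Assumes lines shaped like:
#   nss-dp 3a001000.dp1 lan4: Registered netdev lan4(qcom-id:1)
nss_dp_netdevs() {
    sed -n 's/.*nss-dp .*Registered netdev \([^( ]*\)(.*/\1/p'
}

# On the router this would be fed from the kernel log, for example:
#   dmesg | nss_dp_netdevs
```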

I get that message too. I've always defaulted to:

./scripts/feeds update -a -f
./scripts/feeds install -a -f

To check that NSS is running:

Fri Jan 27 22:42:39 2023 user.info kernel: [    8.078436] kmodloader: loading kernel modules from /etc/modules.d/*
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.221060] nss_driver - fw of size 827584  bytes copied to load addr: 40000000, nss_id : 0
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.222932] Supported Frequencies -
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.222949] 187.2 MHz
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.228236] 748.8 MHz
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.232047] 1.6896 GHz
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.234221]
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.238959] ffffffc000b1d480: set sdma ffffff8002bd5c00
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.240711] ffffffc000b1d480: meminfo init succeed
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.277329] node size 2 # items 4
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.277364] memory: 40000000 1073741824 (avl 923385856) items 4 active_cores 2
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.279645] addr/size storage words 2 2 # words 4 in DTS, ddr size 1000000
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.286776] ffffffc000b1d480: nss core 0 booted successfully
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.295191] nss_driver - fw of size 295196  bytes copied to load addr: 40800000, nss_id : 1
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.300375] Supported Frequencies -
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.300389] 187.2 MHz
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.307526] 748.8 MHz
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.311310] 1.6896 GHz
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.313480]
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.318233] ffffffc000b24cc0: set sdma ffffff80043d0c00
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.319948] ffffffc000b24cc0: meminfo init succeed
Fri Jan 27 22:42:39 2023 kern.err kernel: [    8.325026] debugfs: Directory 'dynamic_if' with parent 'stats' already present!
Fri Jan 27 22:42:39 2023 kern.err kernel: [    8.329734] debugfs: File 'n2h' in directory 'strings' already present!
Fri Jan 27 22:42:39 2023 kern.err kernel: [    8.337275] debugfs: File 'drv' in directory 'strings' already present!
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.347838] node size 2 # items 4
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.350223] memory: 40000000 1073741824 (avl 923480064) items 4 active_cores 2
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.353702] addr/size storage words 2 2 # words 4 in DTS, ddr size 1000000
Fri Jan 27 22:42:39 2023 kern.alert kernel: [    8.360821] ffffffc000b24cc0: nss core 1 booted successfully
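
Rather than reading the whole log by eye, the "booted successfully" lines can be counted. This is a minimal sketch that assumes the message wording shown in the log above; count_nss_cores is a hypothetical helper name, not part of any package.

```shell
#!/bin/sh
# count_nss_cores (hypothetical helper): read log text on stdin and print how
# many distinct "nss core N booted successfully" messages it contains.
count_nss_cores() {
    grep -o 'nss core [0-9][0-9]* booted successfully' | sort -u | wc -l | tr -d ' '
}

# On the router, for example:
#   logread | count_nss_cores
#   dmesg | count_nss_cores
```

With a log like the one above this should report 2, one per NSS core; 0 would suggest the firmware never came up.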

You may want to try the package from here:

https://github.com/asushugo/lede/tree/master/package/qca/nss

warning though ... one will need updated patches to this repo and updated kernel patches for ipq8074

Someone suggested in another thread that NSS build does not crash (ath11k crashing) as often as OpenWRT snapshot. At least in my experience - it still does - which makes sense since the issue is apparently with v2.5 firmware.

I have been running the following NSS build since December without ath11k issues (CAUTION: IT STILL USES THE OLD PARTITION LAYOUT. DO NOT FLASH IT IF YOU ARE USING THE NEW LAYOUT): https://github.com/rodriguezst/openwrt/releases/tag/bitthief-nss-2022-12-28-1316
Uptime of 35 days:


How is the NAT speed on this build? WAN - LAN.

Upd. I got about 350 Mbit with official and NSS builds both.

I get the full speed of my provider. Same as with official builds (~900 Mbit/s).

Hmm, interesting. I have ~800 Mbits on WiFi and low speed with ethernet cable. Have you ever tested the second case?

This is my speedtest results connected via ethernet:

Retrieving speedtest.net configuration...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by *** [1.00 km]: 2.922 ms
Testing download speed................................................................................
Download: 844.05 Mbit/s
Testing upload speed................................................................................................
Upload: 876.07 Mbit/s

Thanks. I'll try to find an issue in my environment.

wifi

lan-wan

Note that I am using NSS + ECM for my use case on a QNAP; the Dynalink is running the vanilla OpenWrt build.

I found that disabling software / HW offloading, disabling packet steering and not using irqbalance gives good results.

I use the ondemand governor and set the following thresholds in rc.local:

echo 25 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 60 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

On top of that, I have the following in rc.local:

echo "229376" > /proc/sys/net/core/wmem_default
echo "229376" > /proc/sys/net/core/wmem_max
echo "229376" > /proc/sys/net/core/rmem_default
echo "229376" > /proc/sys/net/core/rmem_max

#for q in $(ls /sys/class/net/eth*/queues/rx-*/rps_cpus); do echo f > $q; done
for q in $(ls /sys/class/net/10*/queues/rx-*/rps_flow_cnt); do echo 4096 > $q; done
for q in $(ls /sys/class/net/10*/queues/tx-*/xps_cpus); do echo f > $q; done
for q in $(ls /sys/class/net/lan*/queues/rx-*/rps_flow_cnt); do echo 4096 > $q; done
for q in $(ls /sys/class/net/lan*/queues/tx-*/xps_cpus); do echo f > $q; done
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries

sysctl -w net.ipv4.tcp_rmem='65536 262144 8388608'
sysctl -w net.ipv4.tcp_wmem='65536 262144 8388608'
sysctl -w net.ipv4.tcp_mem='65536 262144 8388608'
#sysctl -w net.ipv4.tcp_window_scaling=3
sysctl -w net.ipv4.tcp_low_latency=1
sysctl -w net.ipv4.tcp_sack=1
sysctl -w net.ipv4.tcp_dsack=1
sysctl -w net.netfilter.nf_conntrack_max=8192
sysctl -w net.core.somaxconn=8192
sysctl -w net.core.optmem_max=81920
sysctl -w net.ipv4.tcp_tw_reuse=1
sysctl -w net.ipv4.tcp_max_tw_buckets=262144

echo 1689600000 > /proc/sys/dev/nss/clock/current_freq

echo 2048 > /proc/sys/dev/nss/n2hcfg/n2h_queue_limit_core0
echo 2048 > /proc/sys/dev/nss/n2hcfg/n2h_queue_limit_core1
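
The value written to current_freq in the listing above looks like a frequency in Hz: 1689600000 corresponds to the 1.6896 GHz entry in the "Supported Frequencies" lines of the boot log posted earlier in the thread. A small sketch of that conversion, assuming the unit really is Hz (to_hz is a made-up helper name):

```shell
#!/bin/sh
# to_hz NUMBER UNIT (hypothetical helper): convert a frequency such as
# "1.6896 GHz" or "748.8 MHz" to an integer number of Hz.
to_hz() {
    awk -v n="$1" -v u="$2" 'BEGIN {
        if (u == "GHz") m = 1000000000
        else if (u == "MHz") m = 1000000
        else exit 1                    # unknown unit
        printf "%.0f\n", n * m         # round away floating-point noise
    }'
}

# Example:
#   to_hz 1.6896 GHz
#   to_hz 748.8 MHz
```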

There are a few more that would need to be added to the above if using the 2.5G/10G ports.
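
Some of the paths in the rc.local listing above only exist on certain builds or devices (the /proc/sys/dev/nss entries, for instance, presumably need the NSS driver loaded). A guarded variant of the same writes is sketched below; write_knob and write_knobs are hypothetical helper names, and the shell expands the globs directly, so `ls` is not needed.

```shell
#!/bin/sh
# write_knob VALUE PATH (hypothetical helper): write VALUE to PATH only if
# PATH exists and is writable; otherwise report it and return non-zero.
write_knob() {
    if [ -w "$2" ]; then
        echo "$1" > "$2"
    else
        echo "skipping $2 (missing or read-only)" >&2
        return 1
    fi
}

# write_knobs VALUE PATH... (hypothetical helper): apply write_knob to every
# path given. An unmatched glob stays literal and is simply skipped.
write_knobs() {
    value=$1
    shift
    for path in "$@"; do
        write_knob "$value" "$path" || true
    done
}

# Examples using paths from the listing above:
#   write_knob 25 /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
#   write_knobs 4096 /sys/class/net/lan*/queues/rx-*/rps_flow_cnt
#   write_knobs f /sys/class/net/lan*/queues/tx-*/xps_cpus
```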

Still very high latency which hopefully can be mitigated when nss-sqm is fixed ...

I am using a 1.2G isp connection with mwan3 load balancing two ethernet ports on the qnap

hope the above is useful


Hey,

Super glad to see that my code/builds are working well for you guys!

Just wanted to give you a heads-up that I ported and pushed all the remaining bits for NSS SQM since yesterday morning, I hope I picked up everything that's needed. If you could test SQM and configuring the NSS qdiscs, it'd be amazing.
I will play with them over the next few days also, but my crappy ISP (which doesn't give me the 1 Gbps that I pay for), 3 WireGuard tunnels for 6 Wi-Fi APs, IPv6 and NAT masquerading, VLANs and the mwan3 split routing require a bit of planning ahead and calculating before adding SQM into the mix, especially since I would ideally like to use as much bandwidth as I've got, which is why I worked my ass off to get NSS running in the first place.

What's been added on top of the latest sync with openwrt master branch:

  • nss-ifb
  • qdisc client for qca-nss-clients
  • qdisc support in qca-nss-drv
  • iproute2 - patches for the nss qdisc and nss cake codel etc. algos
  • nat46 patches to allow it to be used by ECM (required by the MAP-T module)
  • ECM support for nat46 / MAP-T and multicast acceleration
  • qca-mcs module/package in the repo etc.
  • reworked all the kernel patches structure and ordering convention, so make sure you don't have overlapping ones when you pull the repo locally (just watch out for the same patches being duplicated in the target/linux/ipq807x/patches-5.15 folder)
  • misc. fixes / tweaks, such as the smp_affinity init.d script, but that one is missing a chmod executable bit / 755 on my device, after that it works and it becomes visible in uci / luci, I'll fix that when I commit something else.
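
For the duplicated-patch warning in the list above, a rough check is sketched below. It assumes a duplicate shows up as the same file name under two different numeric prefixes, which may not cover every case; duplicate_patch_names is a hypothetical helper name.

```shell
#!/bin/sh
# duplicate_patch_names DIR (hypothetical helper): print patch names that
# occur more than once in DIR once the leading numeric prefix is stripped.
duplicate_patch_names() {
    for f in "$1"/*.patch; do
        [ -e "$f" ] || continue        # glob matched nothing
        basename "$f" | sed 's/^[0-9]*-//'
    done | LC_ALL=C sort | uniq -d
}

# Example, run from the top of the build tree:
#   duplicate_patch_names target/linux/ipq807x/patches-5.15
```

No output means no name appears twice; it does not catch patches that were renamed but carry the same content.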

I have been running the new code with everything enabled since yesterday morning, super stable as usual, no issues, no memory leaks etc. Performance wise, I'm maxing out my ISP provided bandwidth, around 600-650 Mbps, and 350-400 Mbps for each WireGuard tunnel.
Wi-Fi peaks around 1.2 Gbps and ultra stable on 160 MHz / 19 dBi.

ECM is used (confirmed with ecm_dump.sh), especially for wired connections. I have a pihole and it makes a huge difference for DNS traffic on my network. Before ECM, I'd have tons of failed SSL handshakes etc., that's why I started porting it and forking @robimarko's code in the first place tbh, it was literally a necessity. All wired traffic on my network eventually has random SSL handshake failures without it.

A lot of the new patches and packages come from @ACwifidude, @qosmio and @Ansuel , their repos and IPQ806x NSS work is simply amazing and it helped so much! I spent a lot of active hours in the past few days simply figuring out all the parts that are needed and putting them together for our build, ensuring everything works and is nicely integrated, while keeping things simple enough so I don't go crazy when they need to be updated for kernel 6.1 or whatever happens next..

I am also cherrypick-ing some of @robimarko's recent qca-nss-drv and nss-crypto work, as you might have seen, thanks a lot for the crypto port! Super cool to see it in there, even if it has partial issues for now, it's a good starting point for us to move forward and basically get on the same level of features found in stock builds / QSDK! Haha, Qualcomm should put us on their payroll for porting all their QSDK crap to the latest kernels, or at least donate a few devices!

In terms of future work, I wanna cool down for a few days at least, let this build stabilize and figure out what needs to be done next. I only have one AX3600, which is also my production / daily driver device in the house, so it's a bit hard and annoying to work on debugging kernel stuff, especially stuff that crashes.. and initramfs boots never worked on my device, so I actually need to flash to test.
I was looking at how fw4 / nft and the kernel implement nftables hardware offload last night, ultimately found the driver glue code and started thinking what we'd gain and if it's worth it writing something similar to integrate it with the other NSS stuff: https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
If we'd only need a similar driver, it's literally a few hundred lines of code, straightforward to do, and this would be reused by all the other IPQ / QCA NSS devices, not sure how much performance gain would be though and if it's worth the effort or we actually need it and it's not offloaded already?

Another cool thing I was thinking of is to add ChaCha20 / Poly1305 (the WireGuard ciphers) support in nss-crypto and nss-cfi, however I'm getting 400 Mbps already with public VPNs, so not sure we'd gain much more, or actually lose performance by throwing all this extra processing to those NSS cores.
I gotta find a new contract / job / work in real-life (heh, gotta pay the bills), but I will try to debug and fix the null pointer issues with nss-cfi / crypto as a starting point, it's most likely something stupid like the ECM crash / the WARN_ON_ONCE bug that took ages to find and fix..

Stay away from the Chinese crap forks like lede and the one @Cypher1 posted above, they're rip-offs and half-broken stuff etc., just look at their commits and you'll understand they're junk.

And sorry for the long post!


Hehe, NSS SQM is fixed, see my above post and try it out!

Also, thanks for those sysctl's and stuff, I need to check what values I have also.

The rps/xps part shouldn't be needed:

I also added this instead of irqbalance, however I use both on AX3600 with no issues:

There's a patch for a 2.2GHz CPU overclock, we had it before, I'll have a look at adding it again.

I always use the performance CPU governor, we had this debate ages ago on the other AX3600 thread, I think ondemand was the most problematic one? But each setup is different, and if it works for you, then it works hehe.

I got the following sysctl buffers:

Buffers

net.core.rmem_default = 256960
net.core.rmem_max = 513920
net.core.wmem_default = 256960
net.core.wmem_max = 513920
net.core.netdev_max_backlog = 2000
net.core.somaxconn = 2048
net.core.optmem_max = 81920
net.ipv4.tcp_mem = 131072 262144 524288
net.ipv4.tcp_rmem = 8760 256960 4088000
net.ipv4.tcp_wmem = 8760 256960 4088000
net.ipv4.tcp_keepalive_time = 1800
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.ip_local_port_range = 1025 65000
net.ipv4.tcp_max_syn_backlog = 2048

WireGuard

net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.all.rp_filter = 2

Low latency

net.ipv4.tcp_fastopen = 3
net.ipv4.tcp_low_latency = 1
net.ipv4.tcp_mtu_probing = 1

Might need to adjust them, I see you got higher values for some.
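
To see where the two sets of values in this thread actually differ, both formats (the "sysctl -w key=value" commands and the "key = value" lines) can be normalized to one form and compared. A minimal sketch; normalize_sysctl and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# normalize_sysctl (hypothetical helper): read sysctl settings on stdin in
# either "sysctl -w key=value" or "key = value" form and print sorted
# "key=value" lines. Comment lines and lines without "=" are dropped.
normalize_sysctl() {
    grep -v '^[[:space:]]*#' |
    grep '=' |
    sed -e 's/^[[:space:]]*sysctl -w //' \
        -e "s/'//g" \
        -e 's/[[:space:]]*=[[:space:]]*/=/' |
    LC_ALL=C sort
}

# Example, with each list saved to a file first:
#   normalize_sysctl < mine.txt > a.txt
#   normalize_sysctl < theirs.txt > b.txt
#   diff a.txt b.txt
```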

I use this on my Gentoo Linux VM kernel, among other patches:

https://blog.cloudflare.com/optimizing-tcp-for-high-throughput-and-low-latency/
Might be worth looking if it's not in OpenWRT, adding it if not and benchmarking?


Thanks for your hard work!!
Can you please share your build .config? I want to strictly follow your build, it suits my needs. Thanks

thank you very much @bitthief very good stuff and what a nightmare of a merge/fix you had to go through!..good luck with the job hunting... regarding qdisc I remember one has to use nss-ifb .. did you manage to create two ifb devices noting you use mwan3 ?


fyi --- regarding the nssinfo utility ... in order to work one has to insmod/modprobe qca-nss-netlink
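
A sketch of that check, assuming the module appears in /proc/modules with hyphens turned into underscores (as loaded modules normally do) and that modprobe is available on the image. The module name is the one given in the post; module_loaded is a made-up helper name.

```shell
#!/bin/sh
# module_loaded NAME [FILE] (hypothetical helper): succeed if NAME is listed
# in FILE (default /proc/modules). Hyphens are mapped to underscores first.
module_loaded() {
    name=$(printf '%s' "$1" | sed 's/-/_/g')
    grep -q "^${name} " "${2:-/proc/modules}"
}

# On the router, before running nssinfo:
#   module_loaded qca-nss-netlink || modprobe qca-nss-netlink
```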
