IPQ806x NSS Drivers

reka · January 7, 2021, 5:30pm

The NSS cores have their own supply and apparently no OPPs so I don't think the reboots are related to them operating at too low a voltage. Unless of course their power regulator is also not working as intended.

Ansuel · January 7, 2021, 5:50pm

If the cores are well split... a quick fix would be to try to make the NSS cores run at the normal frequency, which should be 600 MHz.
In theory the bootloader should set the NSS regulator to 1 V by default, which is the voltage for the 600 MHz frequency.
1.15 V is for 800 MHz.

Quick change. Can anyone test this?
But the problem could also be related to the cache, which doesn't scale well with the regulator.

keithspg · January 7, 2021, 7:00pm

@ACwifidude Is this required of OEM firmware? It has no issues with 1G, does it? It feels a bit extreme to me.

Ansuel · January 7, 2021, 7:02pm

having max voltage doesn't harm anything and would make the problem clear... the other option is to slow down the router and make it run at the most basic clock (that would compensate for the lack of voltage)

facboy · January 7, 2021, 10:21pm

i did load this on my R7800, not happy. wireless doesn't work, log is showing the firmware is crashing.

quarky · January 8, 2021, 8:39am

I'm not too sure what nDPI does in relation to how connections between the router's clients and the server are managed. If it is TCP (and maybe UDP) based, it should work even when the NSS firmware is used, assuming the ipq6018 NSS firmware behaves the same as the ipq806x NSS firmware.

The NSS firmware (for the ipq806x at least) takes over TCP/UDP connections/streams once a connection has been established. If iptables drops the connection (from your iptables command) before it is even established, the NSS firmware will be 'blind' to it and so will not manage the 'connection'.

With my R7800 using NSS offloading, I am able to use iptables mangling for policy-based routing to route certain devices' traffic over VPN for both TCP and UDP.

I have not tried using external modules like nDPI though. But I think nDPI is probably not compatible with NSS. If my understanding is correct, deep packet inspection needs to scan quite a few packets of an established connection before it can tell what type of traffic is flowing in it. Once netfilter has established a connection, NSS will see it and start managing it. Once NSS takes over, the Linux stack will be blind to it. That means nDPI will not see any packets from any connections/streams that NSS took over.

So if nDPI depends on netfilter to send it packets for inspection before processing it in netfilter, this will not work when NSS takes over.

HTH.

lantis1008 · January 8, 2021, 11:42am

This is my understanding as well and a good explanation.
There are some nDPI signatures that will still work (as they only require the initial connection + reply to verify), but the vast majority of signatures will not work.

You might be able to write rules in a way that stops them going through NSS until they have been properly classified?
NDPI_PROTOCOL_UNKNOWN is flagged until the signature is verified as best i understand it.

chirayu-patel · January 8, 2021, 1:24pm

Thanks a lot for the explanation.. I understand now

Good for performance but bad for doing content filtering ..

Anyways, i will also raise it to the nDPI team .. if they have some idea..

facboy · January 8, 2021, 10:42pm

i removed the sta block from the offload path (ie left it where it was in ath10k-5.8 originally), wireless works now. only thing is the Luci UI seems to be broken, in the 'Wireless Overview' at the top everything seems to be blank.

below it can see the attached clients and their signal strength and bitrate, so not entirely sure what is going on there. i think i never ran NSS when the wireless offload was enabled on ath10k-5.4, was Luci always broken?

EDIT: i should add, not entirely sure how to tell if the offload is really working. FWIW doing a basic speedtest.net with my 150M DL consumed about 40-45% SIRQ on one cpu, which doesn't seem great.

Ansuel · January 8, 2021, 10:53pm

the broken luci has been fixed. not related

ACwifidude · January 8, 2021, 11:58pm

I’ll try it out this weekend. I have a gigabit line, so it should give ~560 Mbps on a speedtest (similar to before).

quarky · January 9, 2021, 12:16am

From what I know, quite unlikely. Once traffic flows thru netfilter, NSS ECM will be notified. ECM will then signal NSS to manage the connection.

One way is to delay ECM from notifying NSS to start taking over, say until after 32-64 packets have flowed through. I saw such a patch in a later QSDK version, but it should be possible to backport it to QSDK 10. If the nDPI team invalidates the netfilter connections that match the drop rule, it should work nicely with NSS.
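To make the idea concrete, here is a toy Python model of it. This is not ECM code: `simulate_flow`, `offload_after` and `dpi_needs` are made-up names, and the model assumes that once the DPI verdict drops a packet, the connection is also invalidated and so never gets handed to NSS.

```python
def simulate_flow(total_packets, offload_after, dpi_needs, dpi_blocks=True):
    """Toy model of one flow.

    offload_after: packets that must pass the Linux stack before the
                   (hypothetical) ECM hands the flow to NSS.
    dpi_needs:     packets the DPI engine must inspect before it can
                   classify the flow.
    """
    seen_by_linux = 0   # packets that traversed netfilter (visible to DPI)
    delivered = 0       # packets that reached the other side
    offloaded = False
    classified = False

    for _ in range(total_packets):
        if offloaded:
            # Fast path: NSS forwards the packet, Linux and DPI never see it.
            delivered += 1
            continue

        seen_by_linux += 1
        if not classified and seen_by_linux >= dpi_needs:
            classified = True

        if classified and dpi_blocks:
            # Dropped by the filter rule. Assumption: the connection is
            # invalidated too, so it is never offloaded.
            continue

        delivered += 1
        if seen_by_linux >= offload_after:
            offloaded = True

    return {
        "seen_by_linux": seen_by_linux,
        "offloaded": offloaded,
        "classified": classified,
        "delivered": delivered,
    }


# Offload kicks in before DPI has seen enough packets: the flow escapes.
early = simulate_flow(total_packets=100, offload_after=2, dpi_needs=8)
# Offload delayed past the DPI window: the flow is classified and blocked.
delayed = simulate_flow(total_packets=100, offload_after=32, dpi_needs=8)
print(early)
print(delayed)
```

In this model the delay only helps when `offload_after` is larger than the number of packets the DPI engine needs, which is why a window like 32-64 packets is the interesting range.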

darksky · January 9, 2021, 11:30am

I think NSS is bound to one CPU:

# cat /proc/interrupts
           CPU0       CPU1       
 16:    8802827    7288951     GIC-0  18 Edge      gp_timer
 18:         53          0     GIC-0  51 Edge      qcom_rpm_ack
 19:          0          0     GIC-0  53 Edge      qcom_rpm_err
 20:          0          0     GIC-0  54 Edge      qcom_rpm_wakeup
 26:          0          0     GIC-0 241 Level     ahci[29000000.sata]
 27:          0          0     GIC-0 210 Edge      tsens_interrupt
 30:     166414       3233     GIC-0 202 Level     adm_dma
 33:  239868258          0     GIC-0 245 Level     nss
 34:          0    2598115     GIC-0 264 Level     nss_queue1
 35:  225207331          0     GIC-0 246 Level     nss
 36:          0          0     GIC-0 265 Level     nss_queue1
 37:          0          0     GIC-0 130 Level     bam_dma
 38:          0          0     GIC-0 128 Level     bam_dma
 40:          0          0   PCI-MSI   0 Edge      aerdrv
 42:          0          0   PCI-MSI 134217728 Edge      aerdrv
 43:         13          0     GIC-0 184 Level     msm_serial0
 44:          2          0   msmgpio   6 Edge      keys
 45:          2          0   msmgpio  54 Edge      keys
 46:          2          0   msmgpio  65 Edge      keys
 47:     219753          0     GIC-0 142 Level     xhci-hcd:usb1
 48:          0          0     GIC-0 237 Level     xhci-hcd:usb3
 49:   38415457          0   PCI-MSI 524288 Edge      ath10k_pci
 50:         27          0   PCI-MSI 134742016 Edge      ath10k_pci
IPI0:          0          0  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:     166190   18535507  Rescheduling interrupts
IPI3:       9507        363  Function call interrupts
IPI4:          0          0  CPU stop interrupts
IPI5:        603        532  IRQ work interrupts
IPI6:          0          0  completion interrupts
Err:          0
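
A rough way to read the counters without eyeballing the table is a small Python sketch like the one below. The function name `nss_irq_counts` and the trimmed `SAMPLE` text are only illustrative; on the router you would pass in the contents of /proc/interrupts instead.

```python
SAMPLE = """\
           CPU0       CPU1
 33:  239868258          0     GIC-0 245 Level     nss
 34:          0    2598115     GIC-0 264 Level     nss_queue1
 35:  225207331          0     GIC-0 246 Level     nss
 36:          0          0     GIC-0 265 Level     nss_queue1
 49:   38415457          0   PCI-MSI 524288 Edge      ath10k_pci
Err:          0
"""


def nss_irq_counts(text, prefix="nss"):
    """Sum per-CPU interrupt counts for IRQ lines whose name starts with prefix."""
    lines = [line for line in text.splitlines() if line.strip()]
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        _, _, rest = line.partition(":")
        fields = rest.split()
        # Skip rows that do not carry one numeric count per CPU plus a name
        # (for example the "Err:" row).
        if len(fields) <= ncpu or not all(f.isdigit() for f in fields[:ncpu]):
            continue
        name = fields[-1]                 # last column is the handler name
        if not name.startswith(prefix):
            continue
        row = counts.setdefault(name, [0] * ncpu)
        for cpu in range(ncpu):
            row[cpu] += int(fields[cpu])
    return counts


for name, per_cpu in nss_irq_counts(SAMPLE).items():
    print(name, per_cpu)
```

With the numbers above, the two `nss` lines are counted only on CPU0, while `nss_queue1` is counted only on CPU1.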

chirayu-patel · January 10, 2021, 6:14am

That's right.. This diagram explains it nicely

So what I understand is that once the postrouting hook is called and ECM decides that the connection can be accelerated, NSS will surely take over and then there is no going back. The Linux stack will then have no idea about it.

chirayu-patel · January 10, 2021, 6:32am

What I am failing to understand is this: if I drop the traffic in the prerouting chain itself, ideally it should never reach postrouting and hence should never be taken over by NSS, should it?

But as per my observation, after a momentary drop, the traffic all of a sudden starts passing through again.

Also, I got this reply from the owner of the nDPI repo that I am using:

Do not use "NSS" if you need to use complex filtering options.
Until iptables/nftables has the ability to enable/disable NSS processing of the connection, we will not get anything good from NSS.
NSS has its own limited scope - simple filtering at the L2/L3/L4 level. As soon as you need to filter data at the L7 level, it immediately turns from a useful feature of NSS into a problem.

quarky · January 11, 2021, 1:05am

In a nutshell, your understanding is correct. ECM is the one that triggers the NSS firmware to start managing TCP/UDP connections. All of netfilter's actions are sent to ECM, as ECM hooks itself into the netfilter notification chain. Any changes to netfilter's connections are made known to ECM. ECM in turn decides how the NSS firmware's acceleration is managed. So you can say that ECM is the entire brain of the NSS firmware network acceleration.

The problem with nDPI, with respect to NSS, (at least from what I understood) is that the packet drops happen much later into the connection, when nDPI has enough information to determine whether to continue with the connection or to drop it during pre-routing (if iptables is used to configure the drop in the pre-routing chain). By this time, I imagine quite a few packets have flowed through netfilter's post-routing chain, and ECM would have been notified of said connections. I guess what you have encountered is that the drop happens very early on during connection setup, and nDPI is still getting TCP/UDP packets from netfilter after it has determined the traffic type. But by then netfilter already has the connection created. ECM will then notify NSS to take over, and traffic starts flowing again.

One way to check is that with nDPI working, do a 'cat /proc/net/nf_conntrack' and see if the nDPI dropped connection is still showing up. If it is showing, then ECM will know of this connection, and NSS will take over eventually.
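
If you want to script that check, a minimal Python sketch could look like this. The helper names (`parse_conntrack_line`, `is_tracked`) and the addresses in `SAMPLE` are made up for illustration, and the sketch assumes the usual text layout of /proc/net/nf_conntrack, where the third column is the protocol name and the first `src=`/`dst=`/`sport=`/`dport=` fields describe the original direction.

```python
SAMPLE = (
    "ipv4     2 tcp      6 431990 ESTABLISHED src=192.168.1.50 dst=203.0.113.7 "
    "sport=51234 dport=443 src=203.0.113.7 dst=198.51.100.2 sport=443 "
    "dport=51234 [ASSURED] mark=0 use=2\n"
    "ipv4     2 udp      17 25 src=192.168.1.50 dst=192.168.1.1 sport=40000 "
    "dport=53 src=192.168.1.1 dst=192.168.1.50 sport=53 dport=40000 mark=0 use=2\n"
)


def parse_conntrack_line(line):
    """Return (protocol, original-direction tuple) for one conntrack line."""
    tokens = line.split()
    if len(tokens) < 3:
        return None, {}
    proto = tokens[2]
    orig = {}
    for tok in tokens:
        key, sep, val = tok.partition("=")
        # Keep only the first occurrence of each key: the original direction.
        if sep and key in ("src", "dst", "sport", "dport") and key not in orig:
            orig[key] = val
    return proto, orig


def is_tracked(conntrack_text, proto, src, dst, dport):
    """True if a matching connection is still in the conntrack table text."""
    for line in conntrack_text.splitlines():
        p, orig = parse_conntrack_line(line)
        if (p == proto and orig.get("src") == src
                and orig.get("dst") == dst
                and orig.get("dport") == str(dport)):
            return True
    return False


# On the router, read the real table instead of SAMPLE:
#   with open("/proc/net/nf_conntrack") as f: text = f.read()
print(is_tracked(SAMPLE, "tcp", "192.168.1.50", "203.0.113.7", 443))
```

If this reports the connection as still tracked after nDPI has dropped it, that matches the case described above where ECM still knows about it.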

The nDPI folks are correct in their explanation of the NSS scope, which mainly operates at L2/L3/L4. So I think for NSS to work with nDPI, we need to delay ECM's trigger to NSS, and when nDPI decides to drop a connection, that connection has to be flushed from netfilter's connection tracking database. If such a scenario can be achieved, NSS can be used with nDPI. I imagine nDPI would have to create a kernel module to flush the netfilter connection, instead of solely working at the L7 layer.
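
For experimenting with the flush part from user space (not the kernel module idea itself), one option might be the `conntrack` tool from conntrack-tools, assuming it is installed on the router. The Python sketch below only builds the delete command; `conntrack_delete_argv` is a made-up helper name, and whether ECM reacts to the deletion the way we want is an assumption that would need testing.

```python
def conntrack_delete_argv(proto, src, dst, sport, dport):
    """Build a `conntrack -D` command that deletes one tracked connection.

    The entry is matched on the original-direction tuple.
    """
    return [
        "conntrack", "-D",
        "-p", proto,
        "-s", src,
        "-d", dst,
        "--sport", str(sport),
        "--dport", str(dport),
    ]


# Example tuple only; the addresses are placeholders.
argv = conntrack_delete_argv("tcp", "192.168.1.50", "203.0.113.7", 51234, 443)
print(" ".join(argv))

# To actually run it (as root, on the router):
#   import subprocess
#   subprocess.run(argv, check=False)
```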

There could be other solutions, but at the moment I can't think of any. If anyone has alternative solutions, it'll be great to hear about them.

HTH.

chirayu-patel · January 11, 2021, 4:24am

Can you please point me to the patch that you are suggesting here? I am already using the latest QSDK version (spf11.2/spf11.3). I will check whether this patch is already present in this version or not.

Thanks!!

chirayu-patel · January 11, 2021, 4:25am

Thank you for the explanation. Appreciate it!!

quarky · January 11, 2021, 4:45am

This is the patch I was referring to:

https://source.codeaurora.org/quic/qsdk/oss/lklm/qca-nss-ecm/commit/?h=NHSS.QSDK.11.3&id=1de80f85fb49f9d90b20916e848bc3a4754b08fb

chirayu-patel · January 11, 2021, 5:02am

Thanks a lot.. !!