Ipq806x NSS build (Netgear R7800 / TP-Link C2600 / Linksys EA8500)

You might be interested in this :smiley:
There's also another commit that adds a workqueue for tx processing. Throughput was better when I tested it.
git.openwrt.org Git - openwrt/staging/nbd.git/commit

2 Likes

Yup. Was reading up on the code on threaded NAPI. It seems that we only need to set the net_device data structure's threaded bit to enable threaded NAPI for a device, when adding the interface's NAPI poll function.

I also noticed that there is a kernel API, dev_set_threaded(struct net_device *dev, bool threaded), that we should probably use instead of setting the threaded variable in the data structure directly.
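For anyone who wants to poke at this from userspace first: on kernels with threaded NAPI support, the flag that dev_set_threaded() controls is also exposed as /sys/class/net/<iface>/threaded for regular network devices. Below is a minimal sketch of reading and flipping it. The helper names are mine, not from any patch in this thread, and as far as I understand ath10k registers its NAPI context on a dummy netdev, so this knob may not reach it (which is presumably why the patches set the flag inside the driver).

```python
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")  # standard sysfs location for network devices


def threaded_path(iface: str, root: Path = SYSFS_NET) -> Path:
    """Return the path of the per-device 'threaded' attribute."""
    return root / iface / "threaded"


def get_threaded(iface: str, root: Path = SYSFS_NET) -> bool:
    """Read the current setting; '1' means threaded NAPI is enabled."""
    return threaded_path(iface, root).read_text().strip() == "1"


def set_threaded(iface: str, enabled: bool, root: Path = SYSFS_NET) -> None:
    """Write 1 or 0 to the attribute (needs root on a real system)."""
    threaded_path(iface, root).write_text("1" if enabled else "0")
```

On a router you would call something like set_threaded("eth0", True) as root and then confirm with get_threaded("eth0"). The interface name is only an example, and the root argument exists so the helpers can be tried against a scratch directory.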

Anyone game to try for ath10k using flent?

2 Likes

Stray thought: what would be the UI knob or interface to enable or disable this?

Great find!

There are no UI knobs for this, though. The only configuration exposed is via the debugfs file.

ath10k: enable threaded napi on ath10k driver - Patchwork (kernel.org)
Someone had another patch that used the API you mentioned

3 Likes

I do wonder, though, whether enabling threaded NAPI would make the L2 cache clock scaling panics more prominent. Threads can be scheduled on different CPU cores, which then necessitates copying cache lines from L2 into each core's L1, and the cores may be running at different clock rates.

1 Like

Can't answer that myself.
Never hit that issue even with threaded NAPI :sweat_smile:

Probably @dtaht has something to add here as he and @nbd both took part in that discussion.
But why hasn't this been enabled so far, or at least tested? What about the ath10k-ct driver? Does it have threaded NAPI enabled?

@qosmio
I have created PRs for openwrt-ipq806x and nss-packages.
compile support for l2tpv2.
add support for NEC WG2600HP.
lock this cpu in qca-nss-ecm, qca-nss-clients.
Please review them.

tunipip6 (DS-Lite) does not offload yet.

2 Likes

@everyone testing my 5.15 branch. Just got my laptop repaired, going through all the backlog comments. So far it seems the new krait-cc patches have been causing issues for everyone. I ran into 3 crashes over the last 2 weeks as well. Just updated my branch 5.15-qsdk11-new-krait-cc with @ansuel's most recent patches (6 days ago). But it looks like the consensus on that is about the same as before, though some show longer uptime? Going to test it myself as well... let's see :crossed_fingers:

@tishipp Thanks! Just merged your PR.

4 Likes

Thanks to your comments I have been investigating what problem in my network could explain the horrible Flent results and the speed loss in the iperf3 test. The loss is not noticeable when both computers are connected by cable, but it is present when one of them connects via Wi-Fi to the r7800.
The loss is about 10% in speed, with a significant increase in latency.

The problem is not the switch but the almost 40 meters of cable and its connections that exist from the router to the server.

This is why I have to apologize to those who have read and tried to understand (@dtaht, @amteza, @vochong...) the data that I have provided and that I have even tried to refute at some point.

But what is true, in my opinion, is that the intel ax210 client doesn't work as it should with this version of ath10k-ct (I think the same thing happens with the ath10k version) when it comes to uploading data to the router since it somehow saturates the network causing a large increase in latency.

I have had to set up a new server connected directly to the router and you can clearly see the improvement of the data with respect to the graphs previously uploaded by me.

firmware: qosmio k5.15qsdk11 with amteza's "NAPI_POLL_WEIGHT: 8" patch

the first 3 images correspond to AQL 5000/12000
the last 3 to AQL 2000/2000
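For anyone wanting to reproduce the two AQL settings: as far as I understand it, mac80211 exposes the per-queue limits in debugfs as aql_txq_limit under the phy directory, and accepts writes of the form "<ac> <low> <high>" in microseconds of airtime. The sketch below just builds those lines for all four access categories and writes them. The helper names are made up for illustration, and the phy name and debugfs mount point are assumptions to adjust for your box.

```python
from pathlib import Path

NUM_ACS = 4  # mac80211 access categories: 0=VO, 1=VI, 2=BE, 3=BK


def aql_limit_lines(low_us: int, high_us: int) -> list:
    """Build one '<ac> <low> <high>' line per access category."""
    if low_us <= 0 or high_us < low_us:
        raise ValueError("expected 0 < low <= high (microseconds of airtime)")
    return [f"{ac} {low_us} {high_us}" for ac in range(NUM_ACS)]


def apply_aql_limits(phy: str, low_us: int, high_us: int,
                     debugfs: Path = Path("/sys/kernel/debug")) -> None:
    """Write each line to the phy's aql_txq_limit file (needs root)."""
    target = debugfs / "ieee80211" / phy / "aql_txq_limit"
    for line in aql_limit_lines(low_us, high_us):
        # One write per access category, mirroring repeated 'echo' calls.
        target.write_text(line)
```

With this, apply_aql_limits("phy0", 2000, 2000) would correspond to the "AQL 2000/2000" runs and apply_aql_limits("phy0", 5000, 12000) to the "AQL 5000/12000" runs, assuming phy0 is the radio under test.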

EDIT: all flent data





PS: The firmware I've been using for these tests has had no crash or reboot issues during the time I've been using it, and it has run for periods of more than 24 hours without shutting down.

@qosmio

latest new-krait-cc build error :

  CC [M]  /home/username/works/openwrt-r7800/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/backports-5.15.58-1/net/mac80211/agg-rx.o
/home/username/works/openwrt-r7800/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/backports-5.15.58-1/net/mac80211/agg-rx.c: In function 'ieee80211_send_addba_resp':
/home/username/works/openwrt-r7800/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/backports-5.15.58-1/net/mac80211/agg-rx.c:255:17: error: 'struct sta_info' has no member named 'mesh'
  255 |         if (!sta->mesh)
      |                 ^~
make[8]: *** [scripts/Makefile.build:289: /home/username/works/openwrt-r7800/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/backports-5.15.58-1/net/mac80211/agg-rx.o] Error 1
make[7]: *** [scripts/Makefile.build:552: /home/username/works/openwrt-r7800/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/backports-5.15.58-1/net/mac80211] Error 2
make[6]: *** [Makefile:1898: /home/username/works/openwrt-r7800/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/backports-5.15.58-1] Error 2
make[5]: *** [Makefile.build:13: modules] Error 2
make[4]: *** [Makefile.real:93: modules] Error 2
make[3]: *** [Makefile:121: modules] Error 2
make[3]: Leaving directory '/home/username/works/openwrt-r7800/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/backports-5.15.58-1'
make[2]: *** [Makefile:574: /home/username/works/openwrt-r7800/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/backports-5.15.58-1/.built] Error 2
make[2]: Leaving directory '/home/username/works/openwrt-r7800/package/kernel/mac80211'
time: package/kernel/mac80211/regular/compile#42.14#10.87#53.93
    ERROR: package/kernel/mac80211 failed to build (build variant: regular).
make[1]: *** [package/Makefile:116: package/kernel/mac80211/compile] Error 1
make[1]: Leaving directory '/home/username/works/openwrt-r7800'
make: *** [/home/username/works/openwrt-r7800/include/toplevel.mk:231: package/mac80211/compile] Error 2

Any info on how to fix above?

[EDIT]
Fix : Need to enable : CONFIG_PACKAGE_MAC80211_MESH

1 Like

This is fixed in master already:

1 Like

You shouldn't have to enable it. I merged the change @th3voic3 mentioned from master into 5.15-qsdk11 and 5.15-qsdk11-new-krait-cc

1 Like

Sorry for not replying to this thread sooner (I was at the wispalaloosa conference) - thx all for finding new ideas and patches.

I have very limited ability to hack directly on hw at the moment, living on a power constrained boat with no funding. I sold off the gear I used to use for aircaps (there's plenty else wrong in the management frame layers)... I have an mt76 box on board, that's it. I appreciate y'all leaping in to do better tests.

Me, I'd rip out napi entirely from the ath10k, if I could. Threading napi would help but it's just adding more work that you have to do, just later. Doing rx and tx separately would help more methinks.

As per the discussion on the git pull request, I'd asked for better tests of the threaded idea, and got none.

1 Like

The tcp_ndown test shows good latency and throughput, so that part of things is working well. The ax210 upload test is AWFUL, and looking more directly into that is needed. The rrul test is a reflection of that although the ath10k should be able to do about half in this case, and isn't because (IMHO) reads are starving writes on the current structure of the ath10k driver.

It's not so much "saturating the network" on the rrul test as completely overwhelming TCP's congestion control algorithms by having so much buffering in the ax210. It, too, should have essentially less than 20ms of latency, and doesn't, and I have not poked into that driver at all. I did pick up an ax210 card since we've isolated that to be a problem in and of itself; regrettably, I lack a laptop that has the M.2 slot in it to even try it. But I'll start poking through that driver to see what I can see in it.

It's kind of like the sith, when it comes to wifi. It's always two...

2 Likes

If I had any one goal it was to reduce the induced and fixed latency and jitter below what is required for a single videoconferencing/voip/video frame (16ms) to be processed with 4 stations active. (rtt_fair testing is needed for this) I don't mind (presently) taking a small bandwidth hit to do that, and it's very good to know the 2k AQL setting is (presently) needed to get there. There are many other possible means to get there besides hard coding these changes, and I still have hope that we can improve the tx/rx disparity greatly along the way.
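A rough back-of-envelope sketch of why the 2k AQL setting matters for that 16ms target: AQL limits are expressed in microseconds of queued airtime, so if each of 4 stations has one queue filled to the limit, the worst-case airtime sitting in the queues is simply limit x stations. This is a deliberately crude assumption; it ignores rx, retries, firmware queues, and how the low and high limits are actually chosen, and the function name is only illustrative.

```python
FRAME_BUDGET_MS = 16.0  # latency target quoted in the post above


def worst_case_queue_ms(aql_limit_us: int, stations: int) -> float:
    """Airtime queued if every station has one queue filled to the limit."""
    return aql_limit_us * stations / 1000.0


for limit_us in (2000, 5000, 12000):
    queued = worst_case_queue_ms(limit_us, stations=4)
    verdict = "within" if queued <= FRAME_BUDGET_MS else "over"
    print(f"AQL {limit_us} us x 4 stations = {queued:.0f} ms "
          f"({verdict} the {FRAME_BUDGET_MS:.0f} ms budget)")
```

Under these assumptions only the 2000 us limit stays inside the budget (8 ms), while 5000 us and 12000 us come out at 20 ms and 48 ms.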

And thx, also, @vochong and everyone for continuing to whack away at this whilst I was away.

1 Like

Do you have any history on why ath10k is done with rx/tx performed in the same thread? It seems weird that they have done a pull/push model driver.

1 Like

I was primarily involved in the ath9k side. Ben Greear and Michael Kazior (who later went on to Plume) were ath10k-side. nbd was mt76-side.

1 Like

Has anyone looked at linux 'kernel hacking' option 'Force round-robin CPU selection for unbound work items' ?

make kernel_menuconfig, last option is 'kernel hacking' ...
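That menu entry corresponds to CONFIG_DEBUG_WQ_FORCE_RR_CPU, and the same behaviour is controlled by the workqueue.debug_force_rr_cpu parameter, which should show up under /sys/module/workqueue/parameters/. A minimal sketch for checking whether a running build actually has it on, so before/after iperf3 numbers can be attributed correctly; the function name is just illustrative.

```python
from pathlib import Path
from typing import Optional

PARAM = Path("/sys/module/workqueue/parameters/debug_force_rr_cpu")


def force_rr_enabled(param: Path = PARAM) -> Optional[bool]:
    """Return True/False for the parameter, or None if the file is absent."""
    try:
        value = param.read_text().strip()
    except FileNotFoundError:
        return None
    # Boolean module parameters normally read back as 'Y' or 'N'.
    return value in ("Y", "1")
```

Calling force_rr_enabled() on the baseline build and again on the rebuilt image should return False and True respectively if the option took effect.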

I haven't seen anything here about benefits/costs to using this on dual-core, much less our dual-core arm.

I have tried this and (along with an unsorted mess of other changes) see some improvement with iperf3 over my lan.

I'm building @ACwifidude's kernel5.10-nss-qsdk10.0 master as a baseline, and then I'll rebuild it with that single option enabled.

Thoughts?

2 Likes