Adding OpenWrt support for Xiaomi AX3600 (Part 1)

wwortel · August 14, 2021, 9:28am

my concern is that in a system where all interfaces are supposed to be separate with their own ip subnets, it causes problems when different interfaces have identical mac addresses. How can the routing software keep them apart? But I may be wrong as internally perhaps the interfaces are distinguished by other means than their mac address.
Confusion with other devices is unlikely as I just assigned manually different values to the last byte, like is the case with the radio interfaces.

robimarko · August 14, 2021, 9:41am

The switch will handle the forwarding, MAC-s can be the same and have been for almost all of the multi port devices that have an actual switch in OpenWrt.

hgblob · August 14, 2021, 11:41am

About the leak:

a few(~2) or no devices connected to the ath11k radio makes it go crazy with memory usage
a lot of devices(~20) plus the 512MB profile patch keeps it stable at 60%

It's not necessarily a leak because if I keep a couple device connected for an hour the memory usage goes up to 80% but if I connect more devices then the memory usage goes back down to 60%.

I have some weird build(many patches swiped from the chinese fork) that has been rock solid with around 20 devices connected to ath11k but @dchard with 2 devices connected gets the OOM after max 2 days.

I guess there is no memory leak per-se but the driver seems to be allocating too much memory that only gets released when there are many devices connected.

Crect · August 14, 2021, 11:55am

Can anybody confirm that hiding ESSID really has an impact on memory usage as outlined by @psi-c?

wwortel · August 14, 2021, 1:11pm

just tried it here, three radios as AP and all three 'option hidden 1'.
firmware: today's 'restart' with all 'lede' originating patches added to package/kernel/mac80211/patches/ath11k and associated changes that the 512M profile patch requires.
No load from traffic, other than a ssh link via lan3 to see what is going on.
When not 'hidden' the MemFree slowly reduces to 247MB coming from about 280M and then stabilizes.
When 'hidden' this reduction does not take place and MemFree fluctuates close to 279 MB.

psi-c · August 14, 2021, 1:43pm

The image i used yesterday had a few debugging things turned on which themselves were eating RAM. I built a clean image and booted into that yesterday.

Straight after boot

              total        used        free      shared  buff/cache   available
Mem:         375804      138876      199292       30608       37636      178232
Swap:             0           0           0

After 30 minutes
              total        used        free      shared  buff/cache   available
Mem:         375804      144348      193820       30608       37636      172760
Swap:             0           0           0

After 2 hours, free down by 22MB but that's accounted for by buff/cache increasing by the same.

              total        used        free      shared  buff/cache   available
Mem:         375804      144716      172024       30612       59064      161676
Swap:             0           0           0

Now up for 21 hours and still the same as before

              total        used        free      shared  buff/cache   available
Mem:         375804      144532      172252       30612       59020      161880
Swap:             0           0           0

This image is robiarko's AX3600-5.10-restart (forked last week) with my patches for including bridge-mgr (see Roaming Issues Xiaomi AX3600 - #81 by psi-c). No wifi clients, just two devices connected via ethernet.

Not sure if there was any doubt about it being an ath11 problem, but I activated the IOT antenna this afternoon, broadcasting SSID whilst the ath11 was still hidden (previously I had the IOT disabled). Free memory decreased by about 20MB but has stayed stable over the last 11 hours so doesn't seem to be a problem there.

dchard · August 14, 2021, 2:34pm

I tried to put the 512M patch to the packages/kernel/mac80211/patches/ath11k folder then run make packages/kernel/mac80211/refresh V=s, but every time the patch gets deleted from the folder. What am I doing wrong?

hgblob · August 14, 2021, 2:50pm

I'm not very familar with the openwrt build system, but I would have thought the right command is make packages/kernel/mac80211/compile V=s

And in any case if everything else fails just clean and build again everything.

dchard · August 14, 2021, 3:15pm

The patch is still not applying. At least it stays in the ath11k folder this time. There is no error, it just simply not applies.

wwortel · August 14, 2021, 3:38pm

the #998 512M patch in mac80211 is part of a set of patches that add and change .dts files (reserved memory settings) and ath.mk (introduction of a configuration flag). The set can be seen here: https://gitce.net/mirrors/lede/commit/6967bf73f076826e9a6a6891ff204e4f4fdd90cd
The mac80211 patch later got renumbered from 207 to 998 but won't work in splendid isolation.

kirdes · August 14, 2021, 4:46pm

@robimarko / @Ansuel

blogic has commited a new branch to the tip repo including 11.4 nss binaries.

Readme.text has not been updated, but the bins reporting Version: NSS.HK.11.4.0.5-5-R

Unfortunately the QUIC repo still only contains the 11.3 binaries.

Maybe we can now add the 11.4 binaries to the nss-packages repo (due to the public availability of the bins)?

hgblob · August 14, 2021, 7:06pm

@robimarko I discovered the reason for my network packet corruption. Seems without the tx path acceleration my I cannot really access the internet over ethernet.

After a bit of experimentation I noticed that these two patches are enough to enable acceleration of the nss tx path and fix my corruption issue: https://github.com/hgblob/openwrt/commit/2455c085eee1ecec43481cff490f7ec98fad512a

And of course to enable CONFIG_NF_CONNTRACK_CHAIN_EVENTS=y in nss-ecm.

What is your opinion of these patches? I have no idea of their real source, I just got them from the chinese fork with a whole lot of other patches and then started bisecting.

hgblob · August 14, 2021, 7:07pm

Did you try to clean and start from scratch: make clean && make all ?

kamkom · August 14, 2021, 11:01pm

Hi, does all devices connected to gigabit ports have the shared throughput like in ax6000 or ax3600 switch is connected in different way and give us more throughput?

dchard · August 15, 2021, 10:29am

Can you elaborate a bit more? In Robimarko's branch both tx and rx path acceleration works on the wired interfaces with both IPoE and PPPoE modes. I see 0% load when I do speedtest in both the WAN or LAN domain.

hgblob · August 15, 2021, 11:36am

Should have been more clear, I was talking about the ipv4 acceleration. In robi's branch in the nss stats only the rx packets are counted(and I get my corruption thing) with the above patches both paths are counted.

You can check in nss debug:

root@OpenWrt:~# cat /sys/kernel/debug/qca-nss-drv/stats/ipv4

________________________________________________________________________________

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< IPV4 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
________________________________________________________________________________

        ipv4_rx_pkts           = 6612485         common
        ipv4_rx_byts           = 7565932223      common
        ipv4_tx_pkts           = 413882          common
        ipv4_tx_byts           = 517088193       common
        ipv4_rx_queue[0]_drops = 0               drop
        ipv4_rx_queue[1]_drops = 0               drop
        ipv4_rx_queue[2]_drops = 0               drop
        ipv4_rx_queue[3]_drops = 0               drop

This is how it looks for me when both are what I assumed to be processed by the NSS.

dchard · August 15, 2021, 11:47am

This is Robi's branch, only the 512M patch is applied, nothing else:

root@XAX6:~# cat /sys/kernel/debug/qca-nss-drv/stats/ipv4

________________________________________________________________________________

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< IPV4 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
________________________________________________________________________________

        ipv4_rx_pkts           = 2718231         common
        ipv4_rx_byts           = 2919466237      common
        ipv4_tx_pkts           = 2425734         common
        ipv4_tx_byts           = 2664666702      common
        ipv4_rx_queue[0]_drops = 0               drop
        ipv4_rx_queue[1]_drops = 0               drop
        ipv4_rx_queue[2]_drops = 0               drop
        ipv4_rx_queue[3]_drops = 0               drop

root@XAX6:~# cat /sys/kernel/debug/qca-nss-drv/stats/pppoe

#pppoe base node stats start

        pppoe_rx_packets               = 1790011         common
        pppoe_rx_bytes                 = 2270994311      common
        pppoe_tx_packets               = 900723          common
        pppoe_tx_bytes                 = 658991810       common
        pppoe_rx_dropped[0]            = 0               drop
        pppoe_rx_dropped[1]            = 0               drop
        pppoe_rx_dropped[2]            = 0               drop
        pppoe_rx_dropped[3]            = 0               drop
        pppoe_short_pppoe_hdr_length   = 0               exception
        pppoe_short_packet_length      = 0               exception
        pppoe_wrong_version_or_type    = 0               exception
        pppoe_wrong_code               = 0               exception
        pppoe_unsupported_ppp_protocol = 0               exception
        pppoe_disabled_bridge_packet   = 0               exception

Acceleration works both on IP and on PPPoE, both TX and RX.

hgblob · August 15, 2021, 11:50am

Dunno if PPoE impacts it or not, I just have a normal ethernet connection upstream. I get a ECM error in kernel logs and only rx shows as accelerated.

dchard · August 15, 2021, 11:55am

I only had this issue when I forgot to enable the ECM client in the config. I believe we tested this in pure IP as well as PPPoE, and in both cases it did worked.

hgblob · August 15, 2021, 12:01pm

For me the pppoe client was always enabled, it might be some configuration issue, but this is the only way to get wired ethernet to work at all.