Adding OpenWrt support for Xiaomi AX3600 (Part 1)

I did a few WAN-LAN full duplex tests.
In my test scenario (FULL DUPLEX) the results are random, and I still can't see any big measurable difference from those IRQ changes.


WAN-WiredLAN Full duplex test:
iperf3 -c 192.168.0.xxx --bidir -t 30
Your latest build r0-060646b.

-> Test #1:
GRO ON, Packet Steering ON, Software flow offloading ON, DCHARD-IRQ OFF, SNOW-IRQ OFF, WIFI OFF.

RUN#1:

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-30.00  sec  3.20 GBytes   916 Mbits/sec    0             sender
[  5][TX-C]   0.00-30.00  sec  3.20 GBytes   915 Mbits/sec                  receiver
[  7][RX-C]   0.00-30.00  sec   603 MBytes   169 Mbits/sec  311             sender
[  7][RX-C]   0.00-30.00  sec   602 MBytes   168 Mbits/sec                  receiver


-> RUN#2:

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-30.00  sec  3.16 GBytes   905 Mbits/sec  335             sender
[  5][TX-C]   0.00-30.00  sec  3.16 GBytes   904 Mbits/sec                  receiver
[  7][RX-C]   0.00-30.00  sec  1.96 GBytes   562 Mbits/sec  327             sender
[  7][RX-C]   0.00-30.00  sec  1.96 GBytes   561 Mbits/sec                  receiver


I've seen this before: results are inconsistent between runs under the same test conditions. It depends on how the load is distributed.
The two runs posted above are the worst and the best out of more than 10 runs.
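For anyone who wants to compare many runs without copying numbers by hand, here is a minimal sketch that pulls the receiver-side bitrates out of saved iperf3 --bidir text output. The function name and the regex are my own, and it assumes the summary lines look like the ones posted here (rates reported in Mbits/sec only).

```python
import re

# Matches iperf3 --bidir summary lines such as:
# [  7][RX-C]   0.00-30.00  sec   602 MBytes   168 Mbits/sec        receiver
# Assumption: the rate is printed in Mbits/sec; Gbits/sec lines are ignored.
LINE = re.compile(
    r"\[\s*\d+\]\[(TX-C|RX-C)\].*?([\d.]+)\s+Mbits/sec.*\b(sender|receiver)\s*$"
)

def receiver_rates(text):
    """Return {role: Mbit/s} taken from the 'receiver' summary lines."""
    rates = {}
    for line in text.splitlines():
        m = LINE.search(line)
        if m and m.group(3) == "receiver":
            rates[m.group(1)] = float(m.group(2))
    return rates

# Summary lines of RUN#1 from this post.
run1 = """\
[  5][TX-C]   0.00-30.00  sec  3.20 GBytes   916 Mbits/sec    0             sender
[  5][TX-C]   0.00-30.00  sec  3.20 GBytes   915 Mbits/sec                  receiver
[  7][RX-C]   0.00-30.00  sec   603 MBytes   169 Mbits/sec  311             sender
[  7][RX-C]   0.00-30.00  sec   602 MBytes   168 Mbits/sec                  receiver
"""
print(receiver_rates(run1))
```

Feeding it the output of each run gives one TX/RX pair per run, which is easier to compare than full tables.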


-> Test #2:
GRO ON, Packet Steering ON, Software flow offloading ON, DCHARD-IRQ OFF, SNOW-IRQ ON, WIFI OFF.

Same results as Test#1, so I didn't copy-paste them here.
I can't see a difference with SNOW-IRQ ON in this test scenario.


-> Test #3:
GRO ON, Packet Steering ON, Software flow offloading OFF, DCHARD-IRQ OFF, SNOW-IRQ OFF, WIFI OFF.
Full duplex results in different runs:
~400/600
~900/600
~900/600
~300/600
~800/300

-> Test #4:
GRO ON, Packet Steering ON, Software flow offloading OFF, DCHARD-IRQ OFF, SNOW-IRQ ON, WIFI OFF.
Full duplex results in different runs:
~900/750
~920/670
~920/220
~920/670

The results are too random to draw a conclusion.
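To put a number on the spread, here is a rough sketch that summarizes the Test #3 and Test #4 runs listed above. Summing both directions into one "combined" figure is my own simplification, not an established metric, and the values are the approximate ones from this post.

```python
from statistics import median

# Approximate bitrates in Mbit/s per run, copied from the lists above
# (first/second value of each "~a/b" pair).
runs = {
    "Test #3 (SNOW-IRQ OFF)": [(400, 600), (900, 600), (900, 600), (300, 600), (800, 300)],
    "Test #4 (SNOW-IRQ ON)": [(900, 750), (920, 670), (920, 220), (920, 670)],
}

def summarize(pairs):
    """Return (min, median, max) of the combined throughput of each run."""
    totals = [a + b for a, b in pairs]
    return min(totals), median(totals), max(totals)

summary = {name: summarize(pairs) for name, pairs in runs.items()}
for name, (lo, mid, hi) in summary.items():
    print(f"{name}: min={lo} median={mid} max={hi} Mbit/s combined")
```

With only four or five runs per setting and ranges that overlap this much, the medians alone are weak evidence either way.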


Note: I can always get 930/930 max full-duplex wan-lan speed with these settings:
GRO OFF, Packet Steering ON, Software flow offloading ON.


@robimarko

In a WAN-LAN full duplex test with Packet Steering OFF and Software flow offloading OFF, the SNOW-IRQs help.
SNOW-IRQs OFF, I get around 900/130 Mbits in all 5 runs.
SNOW-IRQs ON, I get around 900/600 Mbits, but a few times I get much lower results like ~830/250 Mbits.


GRO ON, Packet Steering OFF, Software flow offloading OFF, DCHARD-IRQ OFF, WIFI OFF.

Test#1 - SNOW-IRQ OFF:

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-30.00  sec  3.13 GBytes   897 Mbits/sec    0             sender
[  5][TX-C]   0.00-30.01  sec  3.13 GBytes   896 Mbits/sec                  receiver
[  7][RX-C]   0.00-30.00  sec   474 MBytes   133 Mbits/sec  499             sender
[  7][RX-C]   0.00-30.01  sec   474 MBytes   132 Mbits/sec                  receiver

I get the same speed, roughly 900/130 Mbits, in different test runs.


Test#2 - SNOW-IRQ ON:

[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-30.00  sec  3.16 GBytes   904 Mbits/sec  334             sender
[  5][TX-C]   0.00-30.00  sec  3.16 GBytes   903 Mbits/sec                  receiver
[  7][RX-C]   0.00-30.00  sec  2.55 GBytes   730 Mbits/sec  276             sender
[  7][RX-C]   0.00-30.00  sec  2.55 GBytes   729 Mbits/sec                  receiver

RUN#2:

- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-30.00  sec  3.13 GBytes   895 Mbits/sec  289             sender
[  5][TX-C]   0.00-30.00  sec  3.12 GBytes   894 Mbits/sec                  receiver
[  7][RX-C]   0.00-30.00  sec  2.16 GBytes   619 Mbits/sec  738             sender
[  7][RX-C]   0.00-30.00  sec  2.16 GBytes   618 Mbits/sec                  receiver

I get around 900/600 Mbits, but a few times I get much lower results like ~830/250 Mbits.


I expect the results to be pretty much the same, as it's packet steering that brings the most improvement. The rxdesc IRQ will hammer whatever core it's set to, and none of the other IRQs are anywhere near this one.

That is why we really need some more clues from QCA to get basic offloads and not just dump it all on the CPU.


Is there any plan for that?

What kind of plan are you expecting?

The only "documentation" is the driver itself, and it's lacking any of the HW details.

Okay, I found this thread from last year[1] which says including bridge-mgr helped.
Why is this then still an issue now? Can someone enlighten me before I spend hours digging?

[1] Roaming Issues Xiaomi AX3600 - #84 by joba-1

By any chance, do you have "802.11r Fast Transition" enabled?

I have an issue with two AX3600s configured as dumb APs.
When "802.11r Fast Transition" is enabled and configured, my Xiaomi 9T Pro phone randomly loses internet access. I need to turn WiFi off and on again on the phone to get it working.
Sometimes FT works for a few days, until one day I have no internet (on my phone).
I don't know if it's my phone or something else.
With FT disabled, everything works 100% of the time here.

That roaming issue has been long gone


Nice, Greg backported one more clock fix in 5.15.63.
I have the 5.15 tree redone, need to clean it up and export to replace the current one and see if there are regressions.


@robimarko
I think there's something wrong or missing in the initramfs image from your repo, and also in the ones we pull and build locally.

Log after booting (bootm) the AX3600 initramfs from your repo:

[    6.172657] ath11k c000000.wifi: ipq8074 hw2.0
[    6.172956] remoteproc remoteproc0: powering up cd00000.q6v5_wcss
[    6.176067] remoteproc remoteproc0: Booting fw image IPQ8074/q6_fw.mdt, size 668
[    6.539508] remoteproc remoteproc0: remote processor cd00000.q6v5_wcss is now up
[    6.541764] ath11k c000000.wifi: qmi ignore invalid mem req type 3
[    6.546558] ath11k c000000.wifi: chip_id 0x0 chip_family 0x0 board_id 0xff soc_id 0xffffffff
[    6.552040] ath11k c000000.wifi: fw_version 0x250a04a5 fw_build_timestamp 2021-12-20 07:09 fw_build_id WLAN.HK.2.5.0.1-01208-QCAHKSWPL_SILICONZ-1
[    6.561820] kmodloader: done loading kernel modules from /etc/modules.d/*
[    6.674199] ath11k c000000.wifi: failed to fetch board data for bus=ahb,qmi-chip-id=0,qmi-board-id=255,variant=Xiaomi-AX3600 from ath11k/IPQ8074/hw2.0/board-2.bin
[    6.674268] ath11k c000000.wifi: failed to fetch board data for bus=ahb,qmi-chip-id=0,qmi-board-id=255 from ath11k/IPQ8074/hw2.0/board-2.bin
[    6.687741] ath11k c000000.wifi: failed to fetch board.bin from IPQ8074/hw2.0
[    6.700404] ath11k c000000.wifi: qmi failed to fetch board file: -12
[    6.707416] ath11k c000000.wifi: failed to load board data file: -12

Nothing new. When building for multiple devices, buildroot does not account for DEVICE_PACKAGES but rather ships only the package set for the whole target.


@robimarko well, I can tell you that the Ethernet issue exists on all 3 installs I have done since July. As indicated in my initial post, the DHCP reply leaves the switch port towards the AX3600, but it never shows up on eth0 or any other interface until up to about 5 minutes have passed. Then it suddenly starts working: tcpdump on the AX3600 shows the reply and the (roamed) wireless client gets an IPv4 address. IPv6 MC works without trouble though.

@robimarko OK, and for the AX6, assuming 128 MiB is 0x8000000:
0x8000000 - 0x2de0000 = 0x5220000

Is it correct?

Currently we have

&rootfs {
	reg = <0x2de0000 0x38220000>;
};

that seems very wrong.

Your calculation seems correct to me
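For reference, the subtraction can be double-checked with a few lines of Python. This is only a sanity check of the arithmetic quoted above; the 128 MiB flash size is the assumption stated in the question, not something verified on hardware.

```python
# Values taken from the posts above.
FLASH_SIZE = 128 * 1024 * 1024   # assumed AX6 NAND size, 0x8000000
ROOTFS_OFFSET = 0x2DE0000        # start of the rootfs partition
CURRENT_SIZE = 0x38220000        # size currently given in the reg property

# Size that makes rootfs end exactly at the end of the flash.
rootfs_size = FLASH_SIZE - ROOTFS_OFFSET
print(hex(rootfs_size))

# How far the current reg value would reach past the end of a 128 MiB flash.
overrun = ROOTFS_OFFSET + CURRENT_SIZE - FLASH_SIZE
print(f"current reg ends {overrun // 2**20} MiB past the end of the flash")
```

Under that 128 MiB assumption, the size currently in the DTS describes a region far larger than the chip itself, which matches the "seems very wrong" impression.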

Can I get more details to somehow reproduce it?

Part of the test setup (ignoring tagged VLANs coming in on eth0):

  • Device mobile A doing WiFi
  • 1st AX3600-01 (this could also be another AP, tested with an older Airport as well, just something to roam from so that the MAC address appears elsewhere on the same VLAN before)
  • 2nd AX3600-02
  • switch infrastructure, DHCP server, IPv6 router, ..

mobile A is assoc to AX3600-1 all good. Packets flow.
mobile A moves and roams to AX3600-2.
mobile A sends a DHCP request, the request hits the DHCP server, the server replies with an OFFER (confirmed by log and tcpdump), and the switch sends the packet out to AX3600-2 (confirmed by a span port and, in one case, because I am crazy, by an old 10 Mbit/s hub). tcpdump on eth0 on AX3600-2 does not show the DHCP OFFER packet coming in.
mobile A also sends IPv6 RS, IPv6 router answers, IPv6 packets are seen on eth0 on AX3600-2, mobile A gets IPv6 Prefix, default router, IPv6 DNS ...

mobile A repeats DHCP requests regularly with backoff time; after <n..300> seconds DHCP OFFER is visible on AX3600-2 in tcpdump and goes out by WiFi to mobile A which gets IPv4 address, default router, and DNS information.

Alternative:
after roaming, mobile A associates to AX3600-2. Rather than twiddling, I run ssdk_sh fdb entry flush 1 and mobile A gets IPv4 address with the next DHCP Request immediately.

The oldest snapshot I tried was Linux ax3600-2 5.15.50 #0 SMP Fri Jul 8 10:46:58 2022 aarch64 GNU/Linux, the newest is Linux ax3600-3 5.15.61 #0 SMP Sat Aug 20 14:49:55 2022 aarch64 GNU/Linux.

I've since found your nss-package repo and your solution in qca/qca-nss-dp/patches/0010-switchdev-fix-FDB-roaming.patch. Now I assume I'll need to learn more Linux again after 20 years of FreeBSD and see how to debug that some more. I see netdev_dbg() calls in there, and it seems I should be able to trigger them (assuming they are not compiled out)?

Thanks for the details, I might have time to debug over the weekend.

netdev_dbg() is compiled out unless you add #define DEBUG at the top of the source file; after that you also need to set the console log level to debug.
Or simply change the netdev_dbg() to netdev_info() and then it will get printed.


@Ansuel I use this partition scheme, I think the whole flash is used.

Sun Aug 21 01:13:44 2022 kern.notice kernel: [    0.000000] Kernel command line: console=ttyMSM0,115200n8 ubi.mtd=rootfs root=mtd:ubi_rootfs rootfstype=squashfs rootwait root=/dev/ubiblock0_1
....

Sun Aug 21 01:13:44 2022 kern.info kernel: [    1.452946] nand: device found, Manufacturer ID: 0xc8, Chip ID: 0xaa
Sun Aug 21 01:13:44 2022 kern.info kernel: [    1.468229] nand: ESMT GD9FS2G8F2A
Sun Aug 21 01:13:44 2022 kern.info kernel: [    1.474653] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.478169] 14 qcomsmem partitions found on MTD device qcom_nand.0
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.485432] Creating 14 MTD partitions on "qcom_nand.0":
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.491677] 0x000000000000-0x000000100000 : "0:sbl1"
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.498418] 0x000000100000-0x000000200000 : "0:mibib"
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.503290] 0x000000200000-0x000000500000 : "0:qsee"
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.509781] 0x000000500000-0x000000580000 : "0:devcfg"
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.512878] 0x000000580000-0x000000600000 : "0:rpm"
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.517831] 0x000000600000-0x000000680000 : "0:cdt"
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.522614] 0x000000680000-0x000000700000 : "0:appsblenv"
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.527450] 0x000000700000-0x000000800000 : "0:appsbl"
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.533424] 0x000000800000-0x000000880000 : "0:art"
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.538035] 0x000000880000-0x000000900000 : "bdata"
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.542851] 0x000000900000-0x000000980000 : "crash"
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.547680] 0x000000980000-0x000000a00000 : "crash_syslog"
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.552560] 0x000000a00000-0x00000fa00000 : "rootfs"
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.740131] mtd: device 12 (rootfs) set to be root filesystem
Sun Aug 21 01:13:44 2022 kern.alert kernel: [    1.740425] mtdsplit: no squashfs found in "rootfs"
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.744894] 0x00000fa00000-0x00000fa80000 : "rsvd0"
Sun Aug 21 01:13:44 2022 kern.info kernel: [    1.755330] cpufreq: cpufreq_online: CPU0: Running at unlisted initial frequency: 800000 KHz, changing to: 1017600 KHz
Sun Aug 21 01:13:44 2022 kern.info kernel: [    1.756516] remoteproc remoteproc0: cd00000.q6v5_wcss is available
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    1.765835] ubi0: attaching mtd12
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    2.339715] random: crng init done
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    2.833258] ubi0: scanning is finished
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    2.840419] ubi0: attached mtd12 (name "rootfs", size 240 MiB)
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    2.840468] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    2.845159] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    2.852036] ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    2.858874] ubi0: good PEBs: 1920, bad PEBs: 0, corrupted PEBs: 0
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    2.865653] ubi0: user volume: 3, internal volumes: 1, max. volumes count: 128
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    2.871902] ubi0: max/mean erase counter: 2/0, WL threshold: 4096, image sequence number: 1661006995
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    2.879018] ubi0: available PEBs: 0, total reserved PEBs: 1920, PEBs reserved for bad PEB handling: 40
Sun Aug 21 01:13:44 2022 kern.notice kernel: [    2.888321] ubi0: background thread "ubi_bgt0d" started, PID 435
Sun Aug 21 01:13:44 2022 kern.info kernel: [    2.889074] block ubiblock0_1: created from ubi0:1(rootfs)
Sun Aug 21 01:13:44 2022 kern.info kernel: [    2.907376] VFS: Mounted root (squashfs filesystem) readonly on device 254:0.
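As a quick cross-check of the numbers in this log, here is a small sketch that recomputes the rootfs size and the UBI block count from the partition offsets. All constants are copied from the log lines above; the variable names are mine.

```python
# Constants copied from the boot log above.
FLASH_SIZE = 256 * 1024 * 1024       # "nand: 256 MiB"
ERASE_SIZE = 128 * 1024              # "erase size: 128 KiB"
ROOTFS_START = 0x00A00000            # start of "rootfs"
ROOTFS_END = 0x0FA00000              # end of "rootfs"
LAST_PART_END = 0x0FA80000           # end of "rsvd0", the last listed partition
RESERVED_BAD_PEBS = 40               # "PEBs reserved for bad PEB handling: 40"

rootfs_bytes = ROOTFS_END - ROOTFS_START
rootfs_mib = rootfs_bytes // 2**20           # should match "size 240 MiB"
pebs = rootfs_bytes // ERASE_SIZE            # should match "good PEBs: 1920"
unpartitioned_mib = (FLASH_SIZE - LAST_PART_END) / 2**20
reserved_pct = 100 * RESERVED_BAD_PEBS / pebs

print(f"rootfs: {rootfs_mib} MiB in {pebs} erase blocks")
print(f"after rsvd0: {unpartitioned_mib} MiB not covered by a partition")
print(f"UBI bad-block reserve: {reserved_pct:.1f}% of rootfs")
```

By these numbers rootfs and the UBI figures agree with each other, and only a few MiB after rsvd0 are left outside the listed partitions.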

Also, wouldn't it be a good idea not to use the whole NAND space?
The main reason is that NAND memory cells are prone to corruption, and NAND controllers automatically avoid writing data to corrupted cells. If you leave some space unpartitioned, the controller has more blocks available to remap corrupted ones, which increases flash life. Leaving 10% of the total flash unpartitioned will increase its life.

We are talking about ONFI 1 parallel NAND here. It is raw NAND, and there is no built-in controller.
The SoC is the controller here, so leaving NAND unused won't bring anything positive.

UBI will handle moving data from worn-out cells to good ones; to do that, you need to give it as much space as possible.

What you are describing applies when NAND comes with a built-in controller, such as an SSD, where the FW does the same marking of bad blocks and moving of data.
