Adding OpenWrt support for Xiaomi AX3600 (Part 1)

Well, hostapd is supposed to bring the WLAN interface down before exiting.

@robimarko I think we can reproduce the issue just by starting a sysupgrade while we are stressing the Wi-Fi, for example with iperf... IMHO there is a problem with terminating the packet processing, and this causes the longer delay... (since ath11k works entirely on async ring handling)

I guess ath11k still has problems with interface removal and can hit corner cases where the delay is very long.

BTW, interesting patch:
https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=pending&id=ff1c40050ac3ca6a55ccc7970e37503e75465977

I built a firmware with the newest upstream nss_packages and found this log about nss-dp:

[ 83.543235] NAPI poll function edma_tx_napi+0x0/0xf0 [qca_nss_dp] returned 108, exceeding its budget of 64.

That explains why it could not get killed.
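The kernel prints that warning when a driver's poll callback returns a value larger than the `budget` it was passed. Per the NAPI contract, a poll callback reports how much work it did, must never report more than `budget`, and returns exactly `budget` when work remains so that it gets polled again. Below is a minimal userspace simulation of that contract; `struct fake_ring` and `fake_poll` are hypothetical names, not qca-nss-dp or kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a DMA completion ring: only a count of
 * descriptors the hardware has finished and the driver has not reaped. */
struct fake_ring {
	size_t pending;
	bool irq_enabled;
};

/* Hypothetical budget-respecting NAPI-style poll:
 * - reap at most `budget` descriptors,
 * - if the ring drains before the budget runs out, "complete"
 *   (re-enable the interrupt) and return the work done,
 * - otherwise keep the interrupt masked and return exactly `budget`
 *   so the core schedules another poll. */
static int fake_poll(struct fake_ring *ring, int budget)
{
	int work = 0;

	while (work < budget && ring->pending > 0) {
		ring->pending--;	/* "process" one descriptor */
		work++;
	}

	if (work < budget)
		ring->irq_enabled = true;	/* like napi_complete_done() + unmask IRQ */

	/* Returning more than budget is what triggers the kernel warning. */
	assert(work <= budget);
	return work;
}
```

With 108 pending completions and a budget of 64, the first call returns 64 and the second returns 44, instead of a single call returning 108.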

Yeah, that patch is interesting; it has been pending for some time, so I will pick it instead of waiting for it to get merged.

We still need to polish that, but the main intention is to check whether it improves anything...

I randomly found someone with the same results I got in my tests, i.e. disabling GRO and using packet steering being faster for NAT (on a different SoC):
https://github.com/openwrt/openwrt/blob/c6f16b63fa7f3a44cd4a71d7e195d6046d612075/target/linux/bcm53xx/patches-5.15/600-net-disable-GRO-by-default.patch
Quote:

In many cases GRO improves network performance; however, it comes at the cost of checksum calculations. In case of a slow CPU and missing hardware csum calculation support, GRO can actually decrease network speed.

On BCM4708, disabling GRO results in the following NAT masquerade speed changes:

  1. 364 Mb/s → 396 Mb/s (packet steering disabled)
  2. 341 Mb/s → 566 Mb/s (packet steering enabled)

Yeah, because GRO bunches up smaller packets so they get processed as one big packet, but that invalidates the checksum and it needs to be recalculated.
This Ethernet adapter definitely has RX checksum offloading, but there are no clues on how to use it. On the other hand, the CPU is way faster than the BCM4708, which is just an old dual-core A9.
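For reference, this is the Internet checksum (RFC 1071) used by IP, TCP and UDP: a 16-bit one's complement sum over the data, with carries folded back in and the result inverted. Because it covers every byte of the payload, a coalesced packet does not share a checksum with any of its original segments, and without hardware offload the CPU has to walk the bytes itself. A minimal sketch in plain C; `inet_csum` is a hypothetical helper name, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: RFC 1071 Internet checksum. Sums the data as
 * big-endian 16-bit words in one's complement arithmetic, folds the
 * carries back into the low 16 bits and inverts the result. */
static uint16_t inet_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];

	if (len & 1)	/* odd trailing byte is padded with a zero byte */
		sum += (uint32_t)data[len - 1] << 8;

	while (sum >> 16)	/* fold carries into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

With the example bytes from RFC 1071 (`00 01 f2 03 f4 f5 f6 f7`) this returns `0x220d`, and appending those two checksum bytes to the buffer makes the checksum of the whole buffer `0`, which is how a receiver validates it.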

In our case, it's faster with GRO off even without checksum offloading.

What I saw in my previous tests:
In test #2, GRO on: ~912/740 Mbit/s.
In tests #1 and #3, GRO off: >930/930 Mbit/s.

We have 4 cores, but most of the time only one is used, at 100%.
That happens even when traffic goes between different interfaces, eth0 to/from eth1, eth2 or eth3.
Sorry, I'm not trying to contradict something I barely understand, but I can't dismiss the test results.

Hm, is this WAN to LAN?
Core 0 will suffer, as by default it's going to be stuck handling IRQs and NAPI.
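One common way to take load off core 0 is to move the Ethernet interrupt to other cores by writing a hex CPU bitmask to `/proc/irq/<N>/smp_affinity`; since NAPI polling normally runs on the CPU that handled the interrupt (unless threaded NAPI is used), the softirq work tends to follow. A small sketch, assuming you look up the real IRQ number in `/proc/interrupts`; `cpu_mask_hex` and `set_irq_affinity` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build the hex bitmask string used by
 * /proc/irq/<N>/smp_affinity (bit k set means CPU k is allowed).
 * Returns 0 on success, -1 if a CPU index is out of range or the
 * output buffer is too small. */
static int cpu_mask_hex(const int *cpus, size_t n, char *out, size_t outlen)
{
	unsigned long mask = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (cpus[i] < 0 || cpus[i] >= (int)(8 * sizeof(mask)))
			return -1;
		mask |= 1UL << cpus[i];
	}
	return snprintf(out, outlen, "%lx", mask) < (int)outlen ? 0 : -1;
}

/* Hypothetical helper: write the mask for one IRQ. Needs root; `irq`
 * must be the real Ethernet interrupt number from /proc/interrupts. */
static int set_irq_affinity(int irq, const char *mask)
{
	char path[64];
	FILE *f;
	int ret;

	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ret = fputs(mask, f) < 0 ? -1 : 0;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

For example, CPUs 1 to 3 give the mask `e`, so `set_irq_affinity(irq, "e")` would keep that interrupt off core 0.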

@robimarko
It's WAN<->LAN full duplex, and yes, it's ksoftirqd that keeps one core at 100%.

With software NAT offload off (tests #4 and #5), with or without GRO, it looks like packet steering kicks in randomly: in some test runs we can see ksoftirqd running on 3 cores and good speed, but only in random runs.

If you find it useful, take a look at the tests above when you have some time: https://forum.openwrt.org/t/adding-openwrt-support-for-xiaomi-ax3600/55049/7824

Packet steering is hit-and-miss currently.
But yeah, GRO could be causing performance drops in some cases, as checksum calculation is not offloaded.
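One plausible contributor to the hit-and-miss behaviour in single-flow tests: RPS hashes each flow and maps that hash onto the CPUs listed in `rps_cpus`, so all packets of one flow land on the same CPU, and whether two flows share a CPU depends only on their hashes. The sketch below mirrors the kernel's mapping step (`reciprocal_scale()` as used in `get_rps_cpu()`), taking the flow hash as an input; `rps_pick_cpu` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as the kernel's reciprocal_scale(): maps a 32-bit
 * value onto [0, n) with a multiply and shift instead of a division. */
static uint32_t reciprocal_scale(uint32_t val, uint32_t n)
{
	return (uint32_t)(((uint64_t)val * n) >> 32);
}

/* Hypothetical helper: pick the target CPU for a packet from its flow
 * hash and the list of CPUs enabled in rps_cpus. */
static int rps_pick_cpu(uint32_t flow_hash, const int *cpus, uint32_t ncpus)
{
	if (ncpus == 0)
		return -1;	/* RPS off: stay on the CPU that took the IRQ */
	return cpus[reciprocal_scale(flow_hash, ncpus)];
}
```

Since the result depends only on the hash, a single iperf stream always steers to one CPU, while several streams spread out only if their hashes happen to fall into different slots.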

I forget whether it was PPPoE or plain NAT, but the speed is inconsistent. I don't know when I'll have time to test again, but I guess after (if) you add more features to the driver 🙂

@Ansuel reworked the driver to be more efficient and added threaded NAPI, so it should be faster as of a couple of hours ago.
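For anyone testing this: on kernels that support it (5.12 and later), threaded NAPI can also be toggled per interface at runtime through the sysfs attribute `/sys/class/net/<dev>/threaded`, in addition to a driver enabling it itself. A minimal sketch; `napi_threaded_path` and `set_napi_threaded` are hypothetical helper names, and the interface name is up to you.

```c
#include <stdio.h>

/* Hypothetical helper: build the sysfs path of the per-netdev
 * threaded NAPI switch. Returns -1 if the buffer is too small. */
static int napi_threaded_path(const char *ifname, char *out, size_t len)
{
	int n = snprintf(out, len, "/sys/class/net/%s/threaded", ifname);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: write "1" (enable) or "0" (disable). Needs root
 * and a kernel that exposes the attribute for this interface. */
static int set_napi_threaded(const char *ifname, int enable)
{
	char path[128];
	FILE *f;
	int ret;

	if (napi_threaded_path(ifname, path, sizeof(path)) != 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ret = fputs(enable ? "1" : "0", f) < 0 ? -1 : 0;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

With threaded NAPI each NAPI instance runs in its own kernel thread, which the scheduler can place on any core instead of the softirq context of the CPU that took the interrupt.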

But yeah, budget control is still broken, so I think there is still something to improve...

Well, the whole driver is shit without offloading the basic stuff.
And the bits from IPQ40xx EDMA and EDMA v2 don't trigger the same stuff on EDMA v1.
I tried enabling TX checksum calculation, but HTTP traffic, for example, had broken checksums.
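For context on what TX checksum offload has to do in Linux: for `CHECKSUM_PARTIAL` skbs the stack pre-seeds the L4 checksum field with the pseudo-header sum, and the hardware (or `skb_checksum_help()` as the software fallback) must checksum everything from `skb->csum_start` to the end and store the result at `csum_start + csum_offset`. Below is a rough userspace analogue on a plain byte buffer, assuming offsets relative to the start of the buffer; `csum_partial_fill` and its helpers are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: unfolded one's complement sum of big-endian
 * 16-bit words (RFC 1071). */
static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	return sum;
}

/* Hypothetical helper: fold carries and invert. */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Hypothetical helper: checksum buf[start..len), where the 16-bit field
 * at start+offset already holds the pseudo-header seed, then store the
 * result big-endian in that field. Returns -1 if offsets don't fit. */
static int csum_partial_fill(uint8_t *buf, size_t len, size_t start, size_t offset)
{
	uint16_t csum;

	if (start > len || offset + 2 > len - start)
		return -1;
	csum = csum_fold16(csum_add_bytes(0, buf + start, len - start));
	buf[start + offset] = (uint8_t)(csum >> 8);
	buf[start + offset + 1] = (uint8_t)csum;
	return 0;
}
```

If a descriptor carried the wrong start or offset, the hardware would sum the wrong bytes or write the result to the wrong place, which would show up as bad checksums in a capture.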

Wait, so some traffic had its checksum calculated?

Yeah, I was using Wireshark with checksum validation enabled, and ICMP packets and such had valid checksums.

But I was not monitoring a simple wget from an HTTP server; that was broken for sure.
I might try enabling it again and sniffing that traffic.

Well, then we are missing some bits... EDMA v2 has checksum offload for different types of traffic, so we are probably not enabling some of them...

Most likely. I used the same bits as EDMA v2, but with the pre-header most of that stuff doesn't make sense.

It's probably mixed... some are placed in ctrl and some in the pre-header.

IMHO, to check this we should look at the traffic generated by NSS, dump the descriptors set by NSS, and dump the registers... I wonder if we'll find something interesting...
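For the register dump, the usual devmem-style approach from userspace is to mmap() `/dev/mem` at the page containing the register and read it through a volatile pointer. A sketch, assuming a kernel that allows `/dev/mem` access to that MMIO range (see `CONFIG_DEVMEM`, `CONFIG_STRICT_DEVMEM` and `CONFIG_IO_STRICT_DEVMEM`) and a register address taken from the device tree or driver source; no real EDMA address is given here, and `split_phys` and `read_reg32` are hypothetical helper names.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: split a physical address into the page-aligned
 * base that mmap() needs and the register's offset within that page. */
static void split_phys(uint64_t phys, uint64_t pagesz, uint64_t *base, uint64_t *off)
{
	*base = phys & ~(pagesz - 1);
	*off = phys - *base;
}

/* Hypothetical helper: read one 32-bit MMIO register via /dev/mem
 * (needs root). Returns 0 on success and stores the value in *val. */
static int read_reg32(uint64_t phys, uint32_t *val)
{
	uint64_t base, off;
	long pagesz = sysconf(_SC_PAGESIZE);
	void *map;
	int fd;

	if (pagesz <= 0)
		return -1;
	split_phys(phys, (uint64_t)pagesz, &base, &off);

	fd = open("/dev/mem", O_RDONLY | O_SYNC);
	if (fd < 0)
		return -1;
	map = mmap(NULL, (size_t)pagesz, PROT_READ, MAP_SHARED, fd, (off_t)base);
	close(fd);	/* the mapping stays valid after close() */
	if (map == MAP_FAILED)
		return -1;

	*val = *(volatile uint32_t *)((volatile uint8_t *)map + off);
	munmap(map, (size_t)pagesz);
	return 0;
}
```

Dumping a descriptor ring works the same way given its physical base address, but rings live in DRAM, and `CONFIG_STRICT_DEVMEM` normally blocks `/dev/mem` access to RAM.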