Without the smp_affinity_settings the transfer was loaded only on core0 and that more or less maxed out.
I was just doing some tests as some users asked.
Sure, I was referring to the case where no core was maxed out.
You are right but as a quick test I usually do it this way.
BTW the exact same wireless transfer maxes out gigabit on stock firmware.
Stock interrupts, for those interested:
root@XiaoQiang:~# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
3: 566786 98541 90428 3341065 GIC 20 Edge arch_timer
40: 1 0 0 0 msmgpio 34 Edge soc:gpio_keys
76: 0 0 0 0 GIC 270 Level bam_dma
77: 396 0 0 0 GIC 340 Level msm_serial0
79: 2 0 0 0 GIC 127 Level 78b5000.spi
80: 0 0 0 0 GIC 357 Edge q6_wdog_interrupt
81: 0 0 0 0 GIC 344 Edge 7803000.sdcc1ice
82: 3 0 0 0 GIC 354 Edge smp2p
83: 0 0 0 0 GIC 276 Edge tzerror
88: 134610 0 0 0 GIC 409 Edge nss_empty_buf_sos
89: 49247 0 0 0 GIC 410 Edge nss_empty_buf_queue
90: 0 0 0 0 GIC 411 Edge nss-tx-unblock
91: 7046628 0 0 0 GIC 412 Edge nss_queue0
92: 0 4603 0 0 GIC 413 Edge nss_queue1
93: 0 0 4503 0 GIC 414 Edge nss_queue2
94: 0 0 0 61867 GIC 415 Edge nss_queue3
95: 0 0 0 0 GIC 416 Edge nss_coredump_complete
96: 0 0 0 0 GIC 417 Edge nss_paged_empty_buf_sos
97: 62821 0 0 0 GIC 422 Edge nss_empty_buf_sos
98: 0 0 0 0 GIC 423 Edge nss_empty_buf_queue
99: 0 0 0 0 GIC 424 Edge nss-tx-unblock
100: 6544252 0 0 0 GIC 425 Edge nss_queue0
101: 0 0 0 0 GIC 426 Edge nss_queue1
102: 0 0 0 0 GIC 427 Edge nss_queue2
103: 0 0 0 0 GIC 428 Edge nss_queue3
104: 0 0 0 0 GIC 429 Edge nss_coredump_complete
105: 0 0 0 0 GIC 430 Edge nss_paged_empty_buf_sos
106: 0 0 0 0 GIC 35 Level watchdog bark
107: 88 0 0 0 GIC 353 Edge qcom,glink-smem-native-xprt-modem
108: 0 0 0 0 GIC 239 Level bam_dma
109: 100429 0 0 0 GIC 178 Level bam_dma
111: 0 0 0 0 GIC 84 Edge qcom-pcie-msi
132: 4 0 0 0 GIC 348 Edge ce0
133: 622897 0 0 0 GIC 347 Edge ce1
134: 135587 0 0 0 GIC 346 Edge ce2
135: 7790 0 0 0 GIC 343 Edge ce3
137: 60 0 0 0 GIC 443 Edge ce5
139: 8323 0 0 0 GIC 72 Edge ce7
140: 20467709 0 0 0 GIC 71 Edge ce8
141: 0 0 0 0 GIC 334 Edge ce9
142: 0 0 0 0 GIC 333 Edge ce10
143: 0 0 0 0 GIC 69 Edge ce11
147: 8 0 0 0 GIC 326 Edge host2rxdma-monitor-ring3
148: 0 0 0 0 GIC 325 Edge host2rxdma-monitor-ring2
149: 8 0 0 0 GIC 324 Edge host2rxdma-monitor-ring1
150: 0 0 0 0 GIC 323 Edge reo2ost-exception
151: 0 0 0 0 GIC 322 Edge wbm2host-rx-release
152: 160 0 0 0 GIC 321 Edge reo2host-status
153: 0 0 0 0 GIC 320 Edge reo2host-destination-ring4
154: 0 0 0 0 GIC 271 Edge reo2host-destination-ring3
160: 1 87577 0 0 GIC 263 Edge ppdu-end-interrupts-mac3
161: 0 0 0 0 GIC 262 Edge ppdu-end-interrupts-mac2
162: 1 0 0 73383 GIC 261 Edge ppdu-end-interrupts-mac1
163: 3 0 0 0 GIC 260 Edge rxdma2host-monitor-status-ring-mac3
164: 0 0 0 0 GIC 256 Edge rxdma2host-monitor-status-ring-mac2
165: 3 0 0 0 GIC 255 Edge rxdma2host-monitor-status-ring-mac1
167: 0 0 0 0 GIC 215 Edge host2rxdma-host-buf-ring-mac2
169: 0 0 0 0 GIC 211 Edge rxdma2host-destination-ring-mac3
170: 0 0 0 0 GIC 210 Edge rxdma2host-destination-ring-mac2
171: 0 0 0 0 GIC 209 Edge rxdma2host-destination-ring-mac1
176: 0 0 0 0 GIC 191 Edge wbm2host-tx-completions-ring3
182: 0 0 0 0 GIC 216 Edge tsens_interrupt
183: 4 0 0 0 GIC 47 Edge cpr3
185: 995 0 37160 0 GIC 107 Level wlan_pci
186: 3 0 0 0 pmic_arb 3211277 Edge spmi-vadc
187: 0 0 1 0 smp2p 1 Edge error_ready_interrupt
188: 0 0 0 0 smp2p 0 Edge err_fatal_interrupt
189: 0 0 0 0 smp2p 3 Edge stop_ack_interrupt
190: 0 0 0 0 GIC 172 Edge xhci-hcd:usb1
IPI0: 23349 61300 76753 59157 Rescheduling interrupts
IPI1: 6 11 11 9 Function call interrupts
IPI2: 0 0 0 0 CPU stop interrupts
IPI3: 0 0 0 0 Timer broadcast interrupts
IPI4: 62 0 0 2 IRQ work interrupts
IPI5: 0 0 0 0 CPU wakeup interrupts
Err: 0
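For anyone who wants to compare dumps like the one above, here is a minimal Python sketch that totals the counters per CPU. The function names (`parse_interrupts`, `cpu_totals`) are made up for this example, and the sample is just three rows copied from the dump plus the `Err:` line; it is not a tool used anywhere in this thread.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into (cpu_names, {label: (counts, description)})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()                      # header row: CPU0 CPU1 ...
    rows = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:len(cpus)]:
            if not field.isdigit():
                break
            counts.append(int(field))
        if len(counts) != len(cpus):             # e.g. the single-value "Err:" line
            continue
        rows[label.strip()] = (counts, " ".join(fields[len(cpus):]))
    return cpus, rows


def cpu_totals(rows):
    """Sum the interrupt counts of all parsed rows per CPU column."""
    totals = None
    for counts, _desc in rows.values():
        totals = counts[:] if totals is None else [a + b for a, b in zip(totals, counts)]
    return totals or []


SAMPLE = """\
CPU0 CPU1 CPU2 CPU3
91: 7046628 0 0 0 GIC 412 Edge nss_queue0
92: 0 4603 0 0 GIC 413 Edge nss_queue1
140: 20467709 0 0 0 GIC 71 Edge ce8
Err: 0
"""

cpus, rows = parse_interrupts(SAMPLE)
totals = cpu_totals(rows)
for name, total in zip(cpus, totals):
    print(f"{name}: {total} ({100 * total / sum(totals):.1f}%)")
```

Fed with the full dump, this makes it easy to see how much of the interrupt load lands on CPU0 compared with the other cores.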
Ahh... We need to wait a bit more then.
With the affinity patches, the loads are spread more evenly. Hopefully I can get my AX capable card within a week and we can do more proper tests to see where the limits are.
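As a rough illustration of what spreading IRQs means in practice: the kernel takes a hexadecimal CPU bitmask in /proc/irq/<n>/smp_affinity, where bit i allows CPU i. The Python sketch below only builds such masks and prints the matching shell lines. The helper names and the round-robin policy are assumptions for this example and not what the affinity patches in this thread actually do; the IRQ numbers are taken from the stock dump above, and some IRQs may refuse to be moved.

```python
def affinity_mask(cpus):
    """Hex bitmask for /proc/irq/<n>/smp_affinity: bit i set means CPU i is allowed."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def spread_commands(irqs, n_cpus=4):
    """Assign IRQs to CPUs round-robin; returns shell lines and writes nothing itself."""
    return [
        f"echo {affinity_mask([i % n_cpus])} > /proc/irq/{irq}/smp_affinity"
        for i, irq in enumerate(irqs)
    ]


# Example: the second set of nss_queue IRQs (100-103) from the dump above.
for line in spread_commands([100, 101, 102, 103]):
    print(line)
```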
On the RX decap offload: are we sure that it actually works? Are the mac80211 hooks present? (Referring to the discussion from last night.)
To actually see whether a router running OpenWRT is CPU-limited, we would need someone to run iperf3 between a wired computer and a WiFi client using a full-fledged (at least) 2x2 802.11ax card. If you can push 1 Gbit/s (the limit of gigabit Ethernet) without maxing out the CPU, then you are fine.
I finally received a QCA6391 card, so I can test the PCI outside of AX9000 as well.
Thanks to @sumo for the donation.
I think that everything needed for RX decap is here as the mac80211 support for it was merged in early 2021, so it was in 5.12 as well.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/net/mac80211?h=v5.15.2&id=80a915ec4427f0083829f7e6518ee9f21521ee1e
Just noticed that these two lines get appended to /etc/sysctl.d/qca-nss-ecm.conf on every reboot, even if they already exist:
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-iptables=1
Had maybe 10 duplicate entries.
I can confirm that. I have way more than 10, and based on the last write timestamp, it happens at every sysupgrade or reboot (likely the former).
MOD: no, the extra two lines are added on every reboot...
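The usual cure for this kind of duplication is to make the append idempotent: only add a line if it is not already there. A minimal Python sketch of that idea follows. It is not the script that writes qca-nss-ecm.conf (which script does that is not identified in this thread), and `ensure_lines` is a hypothetical helper that works on text so it can be tried without touching the router.

```python
WANTED = [
    "net.bridge.bridge-nf-call-ip6tables=1",
    "net.bridge.bridge-nf-call-iptables=1",
]


def ensure_lines(existing_text, wanted=WANTED):
    """Return text in which each wanted line occurs exactly once; other lines are kept."""
    seen = set()
    out = []
    for line in existing_text.splitlines():
        key = line.strip()
        if key in wanted:
            if key in seen:
                continue                 # drop repeated copies of a wanted line
            seen.add(key)
        out.append(line)
    for line in wanted:
        if line not in seen:
            out.append(line)             # add a wanted line that was missing
    return "\n".join(out) + "\n"


# A file that already collected duplicates over several reboots.
dirty = "\n".join(WANTED * 3) + "\n"
print(ensure_lines(dirty), end="")
```

Running it a second time on its own output changes nothing, which is the property the boot-time append is missing.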
Very welcome. Keep up the good work and let me know if I can be of any further help. Thanks!
A couple of hours ago more patches arrived at Kalle's repo, and look what I have found:
ath11k: Fix ETSI regd with weather radar overlap
Looks exactly like our issue. Other patches were also added which might be worth taking a look at.
My little bird already pointed me to that one a week ago, I applied it but I honestly didn't see a difference in the reported rules at all.
Feel free to give it a go, I might have messed something up.
I repeated the same upload tests with the affinity settings; the speed is a lot higher and the CPU load is now spread across all cores.
I could try to test the "RX decap" patch, but what exactly should it improve? Lower CPU usage in the LAN -> WiFi case?
My joy about that patch was a bit premature... It only fixes a driver warning, and not the wrong logic itself, which is likely not even on the driver's side. Nonetheless, this line gives some info away:
If the firmware (or the BDF) is shipped with these rules
The task would be to understand the reg.c logic a bit better to make sure it cannot be fixed there. For example, can someone explain to me why, if a channel overlaps with the radar range, its bandwidth is changed instead of just applying DFS and the 600 sec CAC rule to the whole overlapping band? Most ETSI countries are allowed to use 5470 - 5725 @ 160, yet ath11k splits it into two parts and creates a separate band in the middle where the radar range is. My point is, this might be in reg.c, but my C skills are not good enough to see it in the code. Someone with a bit more experience should take a look...
MOD:
Ok, I think I found where the extra split rules are created in reg.c:
/* Add max additional rules to accommodate weather radar band */
if (reg_info->dfs_region == ATH11K_DFS_REG_ETSI)
num_rules += 2;
and there are some extra elements in ath11k_reg_update_weather_radar_band to create the new rule boundaries.
The question is: why? I am not an expert on regulatory matters, but to my knowledge, if a band or channel overlaps the ETSI radar band, the only limitation is that DFS must use a CAC timeout of 600 seconds. As far as I know, the ETSI radar band has no further power limitation beyond what each country sets for the designated band. So there is really no reason for reg.c to split the 5470 - 5725 @ 160 range into "lower", "radar" and "upper" channels... It should just apply the 600-second CAC to the whole band and that should be it.
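To make the question concrete, here is a rough Python model of the three-way split described above. It is a sketch under assumptions, not a transcription of reg.c: the radar band edges (5590 and 5650 MHz), the 60 s default CAC and the rule that each piece is capped to a standard width that fits inside it are all assumed for illustration and should be checked against the driver source.

```python
RADAR_LOW, RADAR_HIGH = 5590, 5650      # MHz, ASSUMED band edges, verify against reg.c
CAC_DEFAULT_S, CAC_RADAR_S = 60, 600    # seconds; 600 s is the figure quoted in this thread


def adjust_bw(start, end, max_bw):
    """Largest standard width (160/80/40/20 MHz) that fits in [start, end] and max_bw."""
    bw = min(end - start, max_bw)
    for step in (160, 80, 40):
        if bw >= step:
            return step
    return 20


def split_rule(start, end, max_bw):
    """Split one rule into (start, end, bw, cac_seconds) pieces around the radar band."""
    if end <= RADAR_LOW or start >= RADAR_HIGH:
        return [(start, end, max_bw, CAC_DEFAULT_S)]     # no overlap, rule unchanged
    parts = []
    if start < RADAR_LOW:
        parts.append((start, RADAR_LOW, adjust_bw(start, RADAR_LOW, max_bw), CAC_DEFAULT_S))
    lo, hi = max(start, RADAR_LOW), min(end, RADAR_HIGH)
    parts.append((lo, hi, adjust_bw(lo, hi, max_bw), CAC_RADAR_S))
    if end > RADAR_HIGH:
        parts.append((RADAR_HIGH, end, adjust_bw(RADAR_HIGH, end, max_bw), CAC_DEFAULT_S))
    return parts


for part in split_rule(5470, 5725, 160):
    print(part)
```

Under these assumptions, none of the three pieces is wide enough to keep 160 MHz, which would explain the bandwidth change. The alternative proposed above would instead keep a single 5470 - 5725 @ 160 rule and only raise its CAC to 600 seconds.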
Reduce CPU usage on WLAN -> router
btw, @robimarko, do you build the images on the /releases page with defconfig, or did you add some additional options to .config?
You can see what's inside here:
lol. I intentionally looked in .github and failed to see anything like a workflow yaml there.
(Although, it looks like I looked in master instead of -backports.)
Thanks!
I tested "ath11k: backport RX decap offload" but can't measure any difference in CPU usage.
In all firmware versions tested ("restart", "backports" and "castiel652:ath11k-decap"), with frame_mode=2 and without, I am getting about 46% sirq usage at 790 Mbits/sec iperf3 upload from phone to LAN server.
And I still can't understand why, in the "backports" branch, this speed drops to 0 at about 3 meters from the router.
One of the major changes in the backport branch was the mac80211 update to 5.15 which could have something to do with it.
btw, @robimarko, what do you think: would it be worth reworking the uboot config a bit and merging both "rootfs" partitions together before merging IPQ...-backports back into the OpenWRT repo?
In any case, I didn't find any way to make it boot from the other partition without access to the booted system or manual intervention through UART.
So this "duplication" seems pretty useless if switching the boot partition is only possible through UART or the booted system, yet it "eats" 35M, which could be used to increase the space available for flashing a custom rootfs, leaving the remainder for overlay.
// btw, as far as I calculated, the overlay partition currently has a size of ~30M (0x01ec0000 == 32243712, dividing it by 1024^2 gives 30.75), but df -h on the image from /releases says it is only 15.1M O_o
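The arithmetic above is easy to re-check, and the same conversion can be applied to every row of /proc/mtd. A small Python sketch follows; the helper names are made up for this example, and the two-row sample is hypothetical (only the 0x01ec0000 overlay size and the 35M figure come from this thread, the erase size and the "rootfs_1" name are placeholders).

```python
def mib(hex_size):
    """Convert a hexadecimal byte count (as printed in /proc/mtd) to MiB."""
    return int(hex_size, 16) / 1024**2


def parse_mtd(text):
    """Return {partition name: size in bytes} from /proc/mtd-style text."""
    parts = {}
    for line in text.splitlines()[1:]:        # skip the "dev: size erasesize name" header
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        parts[fields[3].strip().strip('"')] = int(fields[1], 16)
    return parts


# HYPOTHETICAL sample: sizes match figures from the thread, everything else is a placeholder.
SAMPLE_MTD = """\
dev:    size   erasesize  name
mtd0: 01ec0000 00020000 "overlay"
mtd1: 02300000 00020000 "rootfs_1"
"""

print(mib("0x01ec0000"))                       # overlay size in MiB
sizes = parse_mtd(SAMPLE_MTD)
print(sum(sizes.values()) / 1024**2)           # total of the listed partitions in MiB
```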
UPD: Btw, I've also summed all the sizes from /proc/mtd and the result is 112.75M, while the AX3600 spec page on the Wiki says there should be 256M.
So it seems too much space has disappeared to blame it on inaccuracies of any kind (and even if there is a typo and the real flash size is 128M, then 15M (128-113) is still too large an amount (~10%) to blame on inaccuracies