My concern is that in a system where all interfaces are supposed to be separate, each with its own IP subnet, identical MAC addresses on different interfaces could cause problems. How can the routing software keep them apart? But I may be wrong; perhaps the interfaces are distinguished internally by something other than their MAC address.
Confusion with other devices is unlikely, as I just manually assigned different values to the last byte, as is the case with the radio interfaces.
The switch will handle the forwarding. The MACs can be the same, and have been for almost all multi-port devices in OpenWrt that have an actual switch.
About the leak:
- a few (~2) or no devices connected to the ath11k radio makes it go crazy with memory usage
- a lot of devices (~20) plus the 512MB profile patch keeps it stable at 60%
It's not necessarily a leak, because if I keep a couple of devices connected for an hour the memory usage goes up to 80%, but if I connect more devices the memory usage goes back down to 60%.
I have a weird build (many patches swiped from the Chinese fork) that has been rock solid with around 20 devices connected to ath11k, but @dchard with 2 devices connected gets the OOM after 2 days at most.
I guess there is no memory leak per se, but the driver seems to allocate too much memory that only gets released when many devices are connected.
Just tried it here: three radios as AP, all three with 'option hidden 1'.
Firmware: today's 'restart' with all patches originating from 'lede' added to package/kernel/mac80211/patches/ath11k, plus the associated changes that the 512M profile patch requires.
No load from traffic, other than an SSH link via lan3 to see what is going on.
When not 'hidden', MemFree slowly drops from about 280 MB to 247 MB and then stabilizes.
When 'hidden', this reduction does not take place and MemFree fluctuates close to 279 MB.
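For anyone who wants to log the same figure over time, here is a minimal Python sketch that pulls MemFree out of /proc/meminfo-style text. The function names are hypothetical and the sample text is made up for illustration, not captured from the router. It also assumes Python is available, either on the router or on a PC that receives the file contents over SSH.

```python
import re


def mem_free_kb(meminfo_text):
    """Return the MemFree value in kB from /proc/meminfo-style text."""
    match = re.search(r"^MemFree:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemFree line found")
    return int(match.group(1))


def read_mem_free_kb(path="/proc/meminfo"):
    """Read MemFree from a file; the default path only exists on Linux."""
    with open(path) as handle:
        return mem_free_kb(handle.read())


# Illustrative sample only (assumed values, not measured on the AX3600).
sample = "MemTotal:  375804 kB\nMemFree:  286000 kB\nMemAvailable: 250000 kB\n"
print(mem_free_kb(sample) // 1024, "MiB free")
```

Calling read_mem_free_kb() at intervals with the SSIDs hidden and then visible would give two series to compare.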
The image I used yesterday had a few debugging things turned on which were themselves eating RAM. I built a clean image and booted into that yesterday.
Straight after boot
              total        used        free      shared  buff/cache   available
Mem:         375804      138876      199292       30608       37636      178232
Swap:             0           0           0
After 30 minutes
              total        used        free      shared  buff/cache   available
Mem:         375804      144348      193820       30608       37636      172760
Swap:             0           0           0
After 2 hours, free is down by about 22 MB, but that's accounted for by buff/cache increasing by about the same amount.
              total        used        free      shared  buff/cache   available
Mem:         375804      144716      172024       30612       59064      161676
Swap:             0           0           0
Now up for 21 hours and still the same as before
              total        used        free      shared  buff/cache   available
Mem:         375804      144532      172252       30612       59020      161880
Swap:             0           0           0
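To check the "free went down, buff/cache went up" reading without doing the subtraction by hand, here is a small Python sketch that diffs two 'Mem:' rows. The helper name is hypothetical, and it assumes the six-column layout of the `free` output shown here. The two rows are the 30-minute and 2-hour snapshots.

```python
def parse_free_mem_line(line):
    """Parse the 'Mem:' row of `free` output into a dict of kB values."""
    fields = ("total", "used", "free", "shared", "buff_cache", "available")
    parts = line.split()
    if not parts or parts[0] != "Mem:" or len(parts) != len(fields) + 1:
        raise ValueError("not a six-column Mem: row: %r" % line)
    return dict(zip(fields, map(int, parts[1:])))


# 'Mem:' rows taken from the 30-minute and 2-hour snapshots in this post.
after_30_min = parse_free_mem_line("Mem: 375804 144348 193820 30608 37636 172760")
after_2_hours = parse_free_mem_line("Mem: 375804 144716 172024 30612 59064 161676")

# Per-column change between the two snapshots, in kB.
delta = {key: after_2_hours[key] - after_30_min[key] for key in after_30_min}
for key, value in delta.items():
    print("%-10s %+8d kB" % (key, value))
```

With these two rows, free drops by 21796 kB while buff/cache grows by 21428 kB, and used only moves by 368 kB, which fits the page-cache explanation rather than a driver leak.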
This image is robimarko's AX3600-5.10-restart (forked last week) with my patches for including bridge-mgr (see Roaming Issues Xiaomi AX3600 - #81 by psi-c). No wifi clients, just two devices connected via ethernet.
Not sure if there was any doubt about it being an ath11k problem, but I activated the IOT antenna this afternoon, broadcasting an SSID while the ath11k was still hidden (previously I had the IOT disabled). Free memory decreased by about 20MB but has stayed stable over the last 11 hours, so there doesn't seem to be a problem there.
I tried putting the 512M patch in the package/kernel/mac80211/patches/ath11k folder and then running make package/kernel/mac80211/refresh V=s, but every time the patch gets deleted from the folder. What am I doing wrong?
I'm not very familiar with the OpenWrt build system, but I would have thought the right command is make package/kernel/mac80211/compile V=s
And in any case, if everything else fails, just clean and rebuild everything.
The patch is still not applying. At least it stays in the ath11k folder this time. There is no error, it simply doesn't apply.
The #998 512M patch in mac80211 is part of a set of patches that add and change .dts files (reserved memory settings) and ath.mk (introduction of a configuration flag). The set can be seen here: https://gitce.net/mirrors/lede/commit/6967bf73f076826e9a6a6891ff204e4f4fdd90cd
The mac80211 patch later got renumbered from 207 to 998, but it won't work on its own without the rest of the set.
blogic has committed a new branch to the tip repo including the 11.4 NSS binaries.
Readme.text has not been updated, but the bins report Version: NSS.HK.11.4.0.5-5-R
Unfortunately the QUIC repo still only contains the 11.3 binaries.
Maybe we can now add the 11.4 binaries to the nss-packages repo (due to the public availability of the bins)?
@robimarko I discovered the reason for my network packet corruption. It seems that without TX path acceleration I cannot really access the internet over ethernet.
After a bit of experimentation I noticed that these two patches are enough to enable acceleration of the nss tx path and fix my corruption issue: https://github.com/hgblob/openwrt/commit/2455c085eee1ecec43481cff490f7ec98fad512a
And of course CONFIG_NF_CONNTRACK_CHAIN_EVENTS=y has to be enabled in nss-ecm.
What is your opinion of these patches? I have no idea of their real source; I just got them from the Chinese fork along with a whole lot of other patches and then started bisecting.
Did you try to clean and start from scratch: make clean && make all?
Hi, do all devices connected to the gigabit ports share throughput like on the AX6000, or is the AX3600 switch connected in a different way that gives us more throughput?
Can you elaborate a bit more? In Robimarko's branch both TX and RX path acceleration work on the wired interfaces in both IPoE and PPPoE modes. I see 0% load when I run a speedtest in either the WAN or the LAN domain.
I should have been clearer: I was talking about IPv4 acceleration. In robi's branch only the RX packets are counted in the NSS stats (and I get my corruption thing); with the above patches both paths are counted.
You can check in nss debug:
root@OpenWrt:~# cat /sys/kernel/debug/qca-nss-drv/stats/ipv4
________________________________________________________________________________
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< IPV4 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
________________________________________________________________________________
ipv4_rx_pkts = 6612485 common
ipv4_rx_byts = 7565932223 common
ipv4_tx_pkts = 413882 common
ipv4_tx_byts = 517088193 common
ipv4_rx_queue[0]_drops = 0 drop
ipv4_rx_queue[1]_drops = 0 drop
ipv4_rx_queue[2]_drops = 0 drop
ipv4_rx_queue[3]_drops = 0 drop
This is how it looks for me when both paths are, as I assume, being processed by the NSS.
This is Robi's branch, only the 512M patch is applied, nothing else:
root@XAX6:~# cat /sys/kernel/debug/qca-nss-drv/stats/ipv4
________________________________________________________________________________
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< IPV4 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
________________________________________________________________________________
ipv4_rx_pkts = 2718231 common
ipv4_rx_byts = 2919466237 common
ipv4_tx_pkts = 2425734 common
ipv4_tx_byts = 2664666702 common
ipv4_rx_queue[0]_drops = 0 drop
ipv4_rx_queue[1]_drops = 0 drop
ipv4_rx_queue[2]_drops = 0 drop
ipv4_rx_queue[3]_drops = 0 drop
root@XAX6:~# cat /sys/kernel/debug/qca-nss-drv/stats/pppoe
#pppoe base node stats start
pppoe_rx_packets = 1790011 common
pppoe_rx_bytes = 2270994311 common
pppoe_tx_packets = 900723 common
pppoe_tx_bytes = 658991810 common
pppoe_rx_dropped[0] = 0 drop
pppoe_rx_dropped[1] = 0 drop
pppoe_rx_dropped[2] = 0 drop
pppoe_rx_dropped[3] = 0 drop
pppoe_short_pppoe_hdr_length = 0 exception
pppoe_short_packet_length = 0 exception
pppoe_wrong_version_or_type = 0 exception
pppoe_wrong_code = 0 exception
pppoe_unsupported_ppp_protocol = 0 exception
pppoe_disabled_bridge_packet = 0 exception
Acceleration works both on IP and on PPPoE, both TX and RX.
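To compare the two ipv4 dumps side by side, here is a rough Python sketch that parses the "name = value category" lines and computes the TX/RX packet ratio. The function names are hypothetical and the line format is assumed from the output pasted in this thread. The ratio depends on the traffic mix, so it is only a hint and does not prove that a path is accelerated.

```python
def parse_nss_stats(text):
    """Parse 'name = value category' lines from an NSS stats dump."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only lines shaped like: name, '=', integer value, category.
        if len(parts) == 4 and parts[1] == "=" and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def tx_rx_ratio(stats, tx_key="ipv4_tx_pkts", rx_key="ipv4_rx_pkts"):
    """Return TX packets divided by RX packets for the given counters."""
    return stats[tx_key] / stats[rx_key]


# Packet counters copied from the two ipv4 dumps in this thread.
first_dump = """
ipv4_rx_pkts = 6612485 common
ipv4_tx_pkts = 413882 common
"""
second_dump = """
ipv4_rx_pkts = 2718231 common
ipv4_tx_pkts = 2425734 common
"""

for host, dump in (("OpenWrt", first_dump), ("XAX6", second_dump)):
    print(host, round(tx_rx_ratio(parse_nss_stats(dump)), 3))
```

For the pppoe dump the counters are named differently, so the keys would have to be passed explicitly, e.g. tx_key="pppoe_tx_packets" and rx_key="pppoe_rx_packets".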
Dunno if PPPoE impacts it or not; I just have a normal ethernet connection upstream. I get an ECM error in the kernel logs and only RX shows as accelerated.
I only had this issue when I forgot to enable the ECM client in the config. I believe we tested this in pure IP as well as PPPoE, and it worked in both cases.
For me the pppoe client was always enabled. It might be some configuration issue, but this is the only way I can get wired ethernet to work at all.