Yes, I have seen it. As Robi pointed out earlier, the QDSS trace is likely not even working with the upstream ath11k. Nonetheless, I gave QCA everything they asked for, with the debug log masks they specified. If they want me to dump QDSS, it is up to them to ask.
The ath11k and NSS firmware contain so many confidential industry secrets that you will never get access to them unless you pay a ton of money and sign multiple NDAs, which, if broken, will land you in court and in debt for life.
Don't underestimate the complexity of such things... Wi-Fi AX is magic at this point, considering it implements LTE algorithms on Wi-Fi... Wi-Fi 7 will be even more complex, and I think it will run an entirely different Linux system under the hood, with the mac80211 driver used just as a coordinator... (this was already the case with ath10k, and ath11k already does something similar with an RTOS)
And considering all the regulations and laws on frequency emission, there's another reason why these blobs will never have their source released.
The only way to fix this is to work for Qcom and, coincidentally, also have someone there who cares about the open-source community, just like with every MTK driver...
The problem isn't even the blobs. Let them have their blobs, as long as they're properly supported on mainline and not just on some ancient kernel nobody wants to use, but apparently that's not happening anytime soon.
Consider that they all work mainly from OEM requests... if no OEM complains, every bug gets very low priority... and as Robi says: soon -> not very soon, low -> very, very low. LOL
Hi, I know all these OpenWrt firmware builds are experimental, but I'd point out that even though our AX3600 has 512 MB, which is plenty of RAM, I can barely run AdGuard with more than 4 tiny lists.
I use the firmware with a 172 MB partition size.
On my previous Linksys 7500v2 running OpenWrt, with 256 MB of RAM and a smaller flash chip, I had 21 blocklists loaded with no issues.
Now, even with the AdGuard service stopped completely, my free RAM is only 112 MB..
If we are talking to Qualcomm, security is a good point to raise, as QCA is trailing very badly... What are OEMs providing to people/consumers like us? Security is at the top of the agenda for governments and companies (OEMs) alike... and for us consumers it's a top priority too.
I also don't quite understand why QCA is holding on to their old kernel and just keeping it as it is. Of course it's gonna be a pain in the a** to properly rewrite and upstream their drivers, and they'd probably need to hire a few more developers for that, but from then on, wouldn't it also be much easier for them to maintain everything?
Precisely, this QCA software just reeks of tech debt! And that's apart from what OEMs have shipped to consumers: software that listens to people's traffic and sends it to HQ! ... and this is a US company, by the way.
I guess what I'm trying to say here is that latency will always go up anytime you saturate a link. SQM works by limiting traffic below the saturation point of the link, which is why latency never rises.
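To put a rough number on why saturation drives latency up: once a link is full, a new packet has to wait for every byte already sitting in the bottleneck buffer to drain, so the added delay is about buffered bits divided by link rate. A minimal Python sketch; the 256 KiB buffer size is a made-up illustration, not a measurement from the AX3600 or any ISP modem.

```python
def queueing_delay_ms(queued_bytes: int, link_rate_mbps: float) -> float:
    """Delay (ms) a new packet sees waiting behind `queued_bytes` already buffered."""
    queued_bits = queued_bytes * 8
    link_rate_bps = link_rate_mbps * 1_000_000
    return queued_bits / link_rate_bps * 1000.0


if __name__ == "__main__":
    # Hypothetical: a saturated 50 Mbps uplink with 256 KiB backed up in a buffer...
    print(f"saturated: {queueing_delay_ms(256 * 1024, 50):.1f} ms")
    # ...versus a shaped link where the queue holds roughly one 1500-byte packet.
    print(f"shaped:    {queueing_delay_ms(1500, 50):.2f} ms")
```

This prints about 41.9 ms for the saturated case versus 0.24 ms when the queue stays nearly empty, which is the effect of keeping traffic just below the saturation point.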
Traffic shapers/schedulers like SQM were designed with high-latency, low-bandwidth links like ADSL in mind. On high-bandwidth links they shouldn't be necessary unless you're actually saturating the link regularly. The tests you are showing saturate the link by design, so the result is somewhat meaningless on a high-bandwidth link. In practice, you will never saturate such a link for any significant period of time where it would matter, unless you're doing something pretty unusual on a home network.
Traffic shapers like SQM actually work by ADDING latency to the link. My suggestion would be to test through the router, from an internal endpoint to a reliable destination (say, Google), with a tool like MTR, then do whatever you normally do and see whether latency is significantly affected. I bet it's not.
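The "shapers add latency" point can be seen in a generic token bucket, the idea behind shapers such as tc's tbf: packets may only leave when enough byte "tokens" have accumulated at the configured rate, so anything above that rate is deliberately held back. This is a simplified Python sketch, not SQM's actual code, and it leaves out the AQM part (fq_codel/cake) entirely; the rate and bucket values in the example are arbitrary.

```python
from typing import List, Tuple


def token_bucket_departures(
    packets: List[Tuple[float, int]],
    rate_bps: float,
    bucket_bytes: int,
) -> List[float]:
    """FIFO token-bucket shaper: return the departure time (s) of each packet.

    packets: (arrival_time_s, size_bytes), sorted by arrival time.
    rate_bps: shaped rate in bits per second (set below the real link rate).
    bucket_bytes: burst allowance; the bucket starts full.
    """
    fill_rate = rate_bps / 8.0  # tokens are counted in bytes per second
    tokens = float(bucket_bytes)
    clock = 0.0  # time at which `tokens` was last updated (= last departure)
    departures: List[float] = []
    for arrival, size in packets:
        if size > bucket_bytes:
            raise ValueError("a packet larger than the bucket can never be sent")
        # FIFO: a packet cannot leave before it arrives or before the previous one left.
        t = max(arrival, clock)
        tokens = min(float(bucket_bytes), tokens + (t - clock) * fill_rate)
        if tokens < size:
            # Hold the packet until enough tokens accumulate: this wait is
            # the latency the shaper adds on purpose.
            t += (size - tokens) / fill_rate
            tokens = float(size)
        tokens -= size
        clock = t
        departures.append(t)
    return departures


if __name__ == "__main__":
    # Three 1500-byte packets arriving together, shaped to 8 kbit/s (1000 bytes/s).
    print(token_bucket_departures([(0.0, 1500)] * 3, rate_bps=8000, bucket_bytes=1500))
```

The three packets leave at 0.0, 1.5 and 3.0 seconds: the burst is spread out at the shaped rate, and the queueing that would otherwise happen in the modem happens in the router instead.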
The screenshot you're showing is very likely downstream collapse caused by saturating the 50 Mbps upstream. There are very few home uses that are going to saturate a 50 Mbps uplink, but two that I can think of are large file transfers and torrents. IMHO the better solution there is to limit at the client (exactly why torrent software has this feature), but if you really want to do it at the router, I would make a tc filter specifically for that traffic and only that traffic, then assign a bulk priority to the transfer so that other interactive traffic can steal that bandwidth when it's needed without slowdown. SQM does this automagically, but it's a pretty heavy way to accomplish that goal. You will be inherently limited to whatever the single-stream capacity of a single CPU core is. For most CPUs it's somewhere between 250 Mbps and 500 Mbps.
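The "bulk priority" idea can be modeled as a strict-priority scheduler with two classes: whenever an interactive packet is waiting, it goes out before any queued bulk packet. This is a toy Python model (one packet per time slot; the class names and traffic are hypothetical), not tc syntax; on the router the classification would be done by the tc filter mentioned above.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def strict_priority_schedule(
    arrivals: List[Tuple[int, str, str]], slots: int
) -> List[str]:
    """Send one packet per time slot, always preferring the 'interactive' class.

    arrivals: (slot, traffic_class, packet_name); traffic_class is
    'interactive' or 'bulk'. Returns what was sent in each slot ('-' = idle).
    """
    queues: Dict[str, Deque[str]] = {"interactive": deque(), "bulk": deque()}
    pending = sorted(arrivals, key=lambda a: a[0])  # stable: keeps FIFO order within a slot
    i = 0
    sent: List[str] = []
    for slot in range(slots):
        # Enqueue everything that has arrived by this slot.
        while i < len(pending) and pending[i][0] <= slot:
            _, cls, name = pending[i]
            queues[cls].append(name)
            i += 1
        if queues["interactive"]:
            sent.append(queues["interactive"].popleft())
        elif queues["bulk"]:
            sent.append(queues["bulk"].popleft())
        else:
            sent.append("-")
    return sent


if __name__ == "__main__":
    # Hypothetical: a bulk upload queues 5 packets at slot 0, then one
    # interactive packet (e.g. SSH or a game) arrives at slot 2.
    bulk = [(0, "bulk", f"b{n}") for n in range(5)]
    print(strict_priority_schedule(bulk + [(2, "interactive", "ssh")], slots=6))
```

The interactive packet jumps ahead of the remaining bulk packets (`['b0', 'b1', 'ssh', 'b2', 'b3', 'b4']`). Pure strict priority can starve the bulk class if interactive traffic ever fills the link, so a real configuration may prefer a weighted or rate-guaranteed class instead.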
There is something weird with how the QCA drivers are allocating memory IMHO. I have my own build from the robimarko tree with zram enabled and it has helped tremendously. I have no idea why the qualcomm drivers allocate so much memory and I doubt anyone else can tell us except qualcomm.
Just reporting: the issue where ath11k WDS AP supports just one simultaneous client is still present on today's #1d82c6b.
This is with ath11k frame decap offload disabled. The last working build I have for WDS AP was on 2022_10_02. My next build was on 2022_10_24, and was the first time I saw this problem. Reverting from #1d82c6b back to 2022_10_02 restores multi-client functionality.
Today's #1d82c6b is working great on the two WDS clients that connect to that AP.
It's clearly a bug. When the device uses more memory when "idle" than under "load", it's clear that something is not right.
This image clearly shows the moment I connect my IP camera to the router. Without the camera, the router uses 75% of available memory; after connecting it (it streams video continuously), memory usage drops to about 60%.
Maybe we need to open another bug in the ath11k repo to see if we can get anything, but I wouldn't be able to provide more details than that if asked.
I think we're all on the same page here, but as far as I know, Ansuel, Robimarko, etc... are not QCA. I don't know what we can actually do about it without the driver code.
If someone has a contact at QCA, maybe they can open bugs, but like I said earlier... I don't expect them to give open source priority. It is what it is. In the Linux world... we have always been "stuck" with whatever the commercial vendor gives us. I think Robimarko, Ansuel, and others.. have done a very good job on Linux support, and they deserve our appreciation for that.
I have been running, more or less, Robimarko's build (I only add stuff to mine, like Strongswan) and it's working very well. I've had several days of uptime with no problems. The tag for 11/16 (I think - not looking it up... pretty close) with my additions (strongswan, dawn, etc.) seems pretty rock-solid stable. I think it's the best we can hope for. I found a workaround (probably a fix; more on that below) for strongswan, and I hope this time I will leave it running for more than a couple of weeks. Let's see what it's got, but I suspect it's going to run well.
As far as my kernel oops with strongswan is concerned... I found that if you remove the af_key module from /etc/modprobe.d/30-ipsec, strongswan does not crash. Apparently it's not needed at all on current versions, but it's auto-loaded by the OpenWRT config. There is more, here:
Now, I'm on the fence about what to do about it. The Strongswan dev says it might actually be an endianness bug, but the module isn't required at all on 5.15.xx... I'm not clear whether the current OpenWRT release (22.03, right?) ships the 5.15 kernel, or whether I should submit a bug there to remove the OpenWRT requirement from the module load file. Apparently it's not required on current versions, but I'm not sure if OpenWRT mainline uses 5.15 (for some reason, I'm thinking 5.10), so there may be a conundrum here.
The workaround is to comment out the af_key line in /etc/modprobe.d/30-ipsec, and things run fine. So now I hope to get more than 1-2 weeks of uptime without strongswan crashing the kernel.. I strongly suspect it's going to work. We probably still have memory leaks in the binary, but if nobody from QCA supports us, what can we do about it?
Robimarko, what kind of buy-in do you need to get this committed to OpenWRT mainline? I feel like if I can run this next round for 30 days without a crash, we're practically there. I know there are problems with things like WDS... I don't use it... can't address it. But my feeling is that this thing is ready for a release.. at least a beta.
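The workaround is a one-line edit that is easy to do by hand in vi; the Python sketch below only makes it precise. It assumes, as the posts imply, that the file lists one module name per line (optionally followed by parameters); the sample contents in the example are hypothetical, and the real 30-ipsec may list different modules. The helper names are made up for illustration. Note that af_key stays loaded until the router reboots or the module is removed.

```python
from pathlib import Path


def comment_out_module(text: str, module: str) -> str:
    """Return `text` with every active line that loads `module` commented out.

    Assumes one module per line, optionally followed by parameters.
    Blank lines, already-commented lines and other modules are kept as-is.
    """
    out = []
    for line in text.splitlines(keepends=True):
        fields = line.split()
        if fields and fields[0] == module:
            out.append("# " + line)
        else:
            out.append(line)
    return "".join(out)


def patch_module_file(path: str, module: str = "af_key") -> None:
    """Hypothetical helper: rewrite the module list file in place (needs root)."""
    p = Path(path)
    p.write_text(comment_out_module(p.read_text(), module))


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    sample = "ah4\naf_key\nesp4\n"
    print(comment_out_module(sample, "af_key"), end="")
```

Only a line whose first word is exactly `af_key` is touched, so similarly named modules and lines that are already commented out stay unchanged.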
I'll give some more insight into how software development works.. you're probably right that something is definitely wrong with memory allocation... My educated guess is that the driver overcommits and, when presented with real data, releases the overcommitted memory... However, this is neither here nor there: without the source code or help from QCA, we will never know what the driver is actually trying to do or why. You understand why this is a problem for the devs here...? It's probably not right, but we need help from the vendor to fix it... you get it?
Guys, please stop talking nonsense. The driver code (ath11k) is fully open source and part of the kernel. Anyone is welcome to dig in. The firmware is a binary blob with no way of debugging it; that is QCA territory. Of course there is quite a lot of cross-dependence between the two, but if you think it is the driver, then go ahead and dig in: