Bringing this back, maybe I've stumbled across a clue.
I was reading another thread about running out of sirq (softirq CPU time). They mentioned having difficulty getting above 300-400 Mbit/s with the 5 GHz ac radio, apparently because the CPU was saturated at 99% sirq. See: this thread
I thought I had done better before, but really, my testing has always been limited by my cable connection (300/30 Mbit/s, sometimes 350 ingress), since I never set up a custom local test.
Curious, I checked again, watching sirq and idle % in top. I saw a peak of 250-260 Mbit/s with the 5 GHz radio, but only 70-80% sirq, with 20-25% idle left. I remembered peaks of 270-280 Mbit/s or more. THEN I remembered I had switched to 40 MHz channel width. Switching back to 80 MHz got me back up to the 270-280, AND sirq was hitting 95% and more, with idle time hitting 0%.
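For anyone who wants numbers that are easier to log than eyeballing top, here is a rough sketch of the same measurement taken from /proc/stat. It assumes Python 3 is available on the box (it may not be on a stock router), and the function names are just mine for illustration. It takes two snapshots of the aggregate "cpu" line and reports what share of the elapsed CPU time went to softirq and to idle.

```python
import time


def parse_cpu_line(line):
    """Return the tick counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def cpu_shares(before, after):
    """Percent of CPU time spent in softirq and idle between two snapshots.

    Field order is: user nice system idle iowait irq softirq steal ...
    Only the first eight fields are summed, because guest time is
    already counted inside user time.
    """
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)][:8]
    if len(deltas) < 7:
        raise ValueError("cpu line has no softirq field")
    total = sum(deltas)
    if total <= 0:
        raise ValueError("no CPU time elapsed between snapshots")
    return {
        "idle": 100.0 * deltas[3] / total,
        "sirq": 100.0 * deltas[6] / total,
    }


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


def sample(interval=1.0):
    """Take two snapshots 'interval' seconds apart and return the shares."""
    before = read_cpu_line()
    time.sleep(interval)
    return cpu_shares(before, read_cpu_line())
```

Calling sample() in a loop while a speed test runs should give roughly the same sirq and idle figures that top shows, averaged across all cores.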
I realised I've been having better luck lately: I'm not seeing the 2.4 GHz radio drop out every few days. Around the same time I had also changed to the Apr 3 snapshot, for the AQL/airtime fairness work for ath10k, so newer firmware/drivers are possibly in the mix as well.
I'm now wondering if there is some bug in there that gets triggered when the system runs low on, or out of, sirq or overall resources. This might explain the odd behavior observed in other threads on this, where doing something to the 5 GHz radio, or disabling it, affects the 2.4 GHz one. They're different branches of code and hardware and shouldn't affect each other, but maybe the ath9k branch can hang when things max out, and it's the 5 GHz branch that has the ability to max it out? Or are the later FW/driver releases just better?
How do we pass this to developers in this area as an idea to check?