It's time to open a bug report:
The Ethernet interface on my AX3600 has now crashed for the third time in 3 days. I'm running the current snapshot without any additional NSS patches or similar.
This time, I had:
a) another AP (let's call it AP2) connected to WAN port of AX3600 (all 4 ports bridged as "br-lan", no VLAN or anything, no actual WAN)
b) Client connected to that AP2
c) Client running iperf3 -c against iperf3 -s running on the AX3600 (actually, I wanted to benchmark the wifi of AP2...)
Interestingly, I still got a few bytes of traffic after it stopped working. Following is the log of iperf3 -s via SSH console that was open all the time:
while the lines further down took some time to appear.
Once no more traffic got through, I connected via wifi directly to the AX3600, which still worked. There was nothing in dmesg or logread, just the 2 VPN clients complaining about missing internet. Running rmmod qca_nss_dp immediately rebooted the AX3600.
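For anyone trying to catch the same hang: a minimal sketch of how the point where traffic dies could be located automatically, assuming the client is run with `iperf3 -c <server> -J` and the JSON report actually gets written. The function name `find_stall` and the 1 Mbit/s threshold are my own assumptions, not part of iperf3. A threshold is used instead of exactly zero because a few bytes still trickled through after the hang.

```python
import json


def find_stall(report, min_bps=1_000_000):
    """Return the start time (in seconds) of the trailing run of intervals
    whose throughput is below min_bps, or None if the run ended healthy.

    `report` is the dict parsed from the output of `iperf3 -J`.
    """
    stall_start = None
    for interval in report.get("intervals", []):
        summary = interval["sum"]
        if summary["bits_per_second"] < min_bps:
            # Remember only the first slow interval of the current run.
            if stall_start is None:
                stall_start = summary["start"]
        else:
            # Traffic recovered, so any earlier dip was not the final hang.
            stall_start = None
    return stall_start


def load_report(path):
    """Read a saved iperf3 JSON report from disk."""
    with open(path) as handle:
        return json.load(handle)
```

Usage would be something like `find_stall(load_report("run1.json"))` on a saved report. If the link dies so hard that iperf3 never writes its report, this will not help, and watching the interface counters is probably the better approach.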
Hm, unfortunately I cannot reproduce the issue.
What I was "hiding" in the post above:
The AX3600 also had two OpenVPN client sessions running, but neither carried much traffic.
There were 4 SSIDs on both main radios, the 3rd ath10k radio was disabled.
For the following tests, I removed all SSIDs and created a single new test SSID. I also removed all VPN configurations and all other network interfaces (the WAN and LAN sides for the VPNs). Nothing VLAN-related changed, i.e. I am still not running any VLANs.
The AX3600 got replaced by a Netgear WAX206 in production (that one is doing great, wifi performance slightly worse than the AX3600, but that's fine), so the AX3600 is on my desk now.
I pushed iperf3 traffic to and through the device via Ethernet and via wifi, as well as "switched" (i.e. from one Ethernet port to an iperf3 server running on another device connected via Ethernet), and could not reproduce the issue.
A few notes though:
Pushing a full 1 Gbit/s of Ethernet traffic through the switch loads one core to 90% softirq (governor "schedutil", staying at the default of about 1 GHz) or 65% (governor "performance" at 1.38 GHz)
Additionally pushing 400 Mbit/s via wifi onto iperf3 running on the system loads the first core fully and 2 more cores to 60% (mostly NAPI)
At the same time, the 1 Gbit/s Ethernet destination iperf3 server running on a "slow" FRITZ!Box 7530 has less than 10% CPU utilization in total (on the performance governor at 700 MHz)
-> The default Ethernet/bridge/switch implementation on the IPQ8071A seems to use a lot of CPU resources
-> The wireless implementation with ath11k also seems to use a lot of CPU resources in comparison to ath10k/MTK devices
Anyway, performance-wise I don't have a problem. But I need help reproducing and fixing the Ethernet hangups, as that's the showstopper.
I did not update the firmware between all the tests, i.e. everything is as it was when the crashes described in the opening post happened.
Oops, thanks, I might investigate that. My documentation here was probably better than my memory. But as I understood it, all ports of the AX3600 should be equal, i.e. all of them are just PHYs connected to the IPQ8071A. Anyway, I'll run more tests later. Thanks!
Hm, still no success in reproducing the problem. If no more ideas come up, I guess I'll keep it on the shelf for now and maybe find a use case for it again later, or some environment to reproduce the issue.
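In case someone wants to compare the per-core softirq numbers above on their own hardware, here is a rough sketch of how they could be measured from two snapshots of `/proc/stat` instead of being read off top. The helper names are made up for illustration. It assumes the usual Linux field order on the `cpuN` lines (user, nice, system, idle, iowait, irq, softirq, steal, ...) and a kernel recent enough to report all eight fields.

```python
import time

SOFTIRQ_FIELD = 6  # position of softirq among the per-CPU counters
TIME_FIELDS = 8    # user..steal; guest time is already included in user


def parse_cpu_lines(stat_text):
    """Map 'cpu0', 'cpu1', ... to their list of jiffy counters."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Skip the aggregate 'cpu' line and everything that is not per-CPU.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            counters[fields[0]] = [int(value) for value in fields[1:]]
    return counters


def softirq_percent(before, after):
    """Share of time each core spent in softirq between two snapshots."""
    result = {}
    for cpu, new in after.items():
        deltas = [n - o for n, o in zip(new, before[cpu])]
        total = sum(deltas[:TIME_FIELDS])
        result[cpu] = 100.0 * deltas[SOFTIRQ_FIELD] / total if total else 0.0
    return result


def sample(interval=5.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and compare them."""
    with open(path) as handle:
        before = parse_cpu_lines(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_cpu_lines(handle.read())
    return softirq_percent(before, after)
```

Calling `sample()` while iperf3 is running should give one percentage per core. The values depend on the CPU governor and clock, so they are only comparable between runs with the same governor settings.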