I have a build from mid-July 2023 that does not exhibit this issue, though. Both of the commits you listed work fine in my mid-July build.
So it looks like something has changed since then in the WiFi config scripts, in mt76, or in both.
Losing WiFi post-reset is going to cause issues for users.
Edit: @hnyman is right that once the wireless interface is enabled, LuCI configuration is back to normal. After a reset, however, LuCI shows the device as unknown and no configuration can be done. This behaviour differs from what it was previously.
I could see this confusing users when the changes from master percolate into 23.05 or later releases.
I suppose LuCI needs to be enhanced to handle the case where the device is not enabled after a factory reset. As it stands, it gives the impression that the firmware's wireless support is not working.
Yeah, I just listed those as the explanation for why things work once you "enable" the radio. One possibility is that things got semi-broken by the large netifd-related change by @nbd, merged yesterday. It might even have been fixed today with https://github.com/openwrt/openwrt/commit/150e6d28f2659b0614e87cc7822e93f488e11ede, or the cause could be something similar in the new code.
So the initial config created at first boot differs from the config regenerated later, even though the two should be the same.
With r21634-7272203022 (2022-12-30 09:20):
If the config has "path", LuCI shows the chip type correctly right away (but you still have to enable the radio in LuCI).
If the config has "phy", LuCI initially shows just "Generic 802.11bg" for both radios (wrong), and only shows the proper chip, e.g. "MediaTek MT7915E 802.11acaxn", after the radio is enabled.
But the change happened at some point in 2022.
With r20755-582c098c09 (2022-09-26):
the initial config has "path" AND LuCI shows the correct chip from the start (you still need to enable the radios).
"wifi config" also produces "path", so the initial config file is the same as one generated later by "wifi config".
So, the two commits from October 2022 that I linked above are more likely the reason for the changed behaviour than the later changes.
This behaviour looks like a by-product of the two commits you identified, which were committed at the end of 2022.
And the reason my mid-July build did not show this issue after a factory reset is, I think, a side effect of this commit:
Before this commit, when the wifi config script ran, the board.json file had likely not been created yet, so the script ended up writing "path" into /etc/config/wireless. After this commit, which ensures that board.json exists before wifi config is executed, we end up with "phy" in /etc/config/wireless instead.
That's also why you see this behaviour when you remove /etc/config/wireless and re-create it: board.json already exists at that point.
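To make the mechanism above concrete, here is a minimal, hypothetical Python model of the outcome described in this post. It assumes the only deciding factor is whether board.json already exists when the wifi config script runs. The function name, its arguments and the example values are made up for illustration; the real selection happens in OpenWrt's shell scripts and is not reproduced here.

```python
def radio_identifier_option(board_json_exists: bool, phy_name: str, device_path: str) -> str:
    """Hypothetical model: which identifier ends up in /etc/config/wireless for a radio."""
    if board_json_exists:
        # After the commit: board.json exists before "wifi config" runs, so the radio is keyed by phy
        return f"option phy '{phy_name}'"
    # Before the commit: board.json usually did not exist yet, so the radio is keyed by path
    return f"option path '{device_path}'"


if __name__ == "__main__":
    # Placeholder values, not taken from a real device
    print(radio_identifier_option(False, "phy0", "platform/example-wifi"))
    print(radio_identifier_option(True, "phy0", "platform/example-wifi"))
```

Under this model, the mid-July build corresponds to the first call and current snapshots to the second, which matches LuCI only recognising disabled radios when "path" is present.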
The issue is that if a wireless interface is disabled and the router is rebooted, LuCI again fails to identify the wireless device and therefore does not present the correct options for configuring it. This will confuse many users who rely on LuCI, myself included.
At this point, I think one of two things should happen:
1. Change the behaviour of the wifi config scripts to match what LuCI expects for disabled wireless interfaces; or
2. Change LuCI to support "phy" in the config file, instead of relying only on "path", when configuring disabled wireless interfaces.
I suspect 1. would be easier to do than 2.?
I have no idea why the WiFi config script's behaviour was changed to configure "phy" instead of "path" when board.json is present, though. Perhaps the developers forgot to carry this change over to LuCI.
I've concluded that the two issues are not related. I've seen 100% CPU usage without the "event format" patch, and some time ago the router also became inaccessible after running for a while with the patch applied. That may have been caused by a timeout, but I wasn't running the timeout detection script then.
My internet connection is through an ADSL modem.
After some more reading, I'm starting to think that my ADSL modem (192.168.1.1) and my RT3200 having the same IP address is the cause of my past and present internet issues.
I tried to change the subnet of my old TP-Link router to 192.168.2.1/24 but unfortunately lost access to it.
Edit:
Via LuCI > Network > Interfaces > LAN > Edit > General Settings > IPv4 address, I was able to change the subnet to 192.168.2.1 on the RT3200.
Things seem a lot better now. It looks like this may have solved the issues I was having.
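The clash described above can be checked with Python's standard ipaddress module. This is a small sketch assuming both the modem and the RT3200 use a /24 prefix (the post only states /24 for the TP-Link attempt); it shows why moving the RT3200 to 192.168.2.1 removes the overlap with the modem's 192.168.1.1.

```python
import ipaddress


def subnets_conflict(addr_a: str, addr_b: str) -> bool:
    """Return True if two interface addresses (CIDR notation) sit in overlapping subnets."""
    net_a = ipaddress.ip_interface(addr_a).network
    net_b = ipaddress.ip_interface(addr_b).network
    return net_a.overlaps(net_b)


if __name__ == "__main__":
    # Assumes /24 prefixes on both devices
    print(subnets_conflict("192.168.1.1/24", "192.168.1.1/24"))  # True: modem and RT3200 clash
    print(subnets_conflict("192.168.1.1/24", "192.168.2.1/24"))  # False: after moving the RT3200
```

Any two LAN-side addresses inside the same /24 would also report a conflict, so moving to a different third octet is the simplest fix.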
, do you have WED turned on via /etc/modules.conf?
It could be that enabling WED is causing the timeouts.
Interestingly, even though I removed the WED module entry from /etc/modules.conf and rebooted the router, I still see entries when I cat /sys/kernel/debug/ppe0/bind, which suggests that WED is still enabled?
Edit: The entries described above appear to be from wired clients, as I have hardware flow offloading enabled. They are not from WED.
In any case, if you see occasional or even frequent timeouts, try removing the WED entry from /etc/modules.conf and see if that helps.
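As an illustration of that edit, here is a minimal Python sketch that drops the WED line from the text of /etc/modules.conf. It assumes the WED entry is the line containing "wed_enable" (commonly written as "options mt7915e wed_enable=Y" on the RT3200); the other option line in the example is a made-up placeholder.

```python
def strip_wed_entries(conf_text: str) -> str:
    """Return conf_text with every line mentioning wed_enable removed (assumed WED option)."""
    kept = [line for line in conf_text.splitlines() if "wed_enable" not in line]
    # Preserve a trailing newline only if any lines remain
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    # "example_mod foo=1" is a hypothetical placeholder line
    sample = "options example_mod foo=1\noptions mt7915e wed_enable=Y\n"
    print(repr(strip_wed_entries(sample)))
```

On the router itself, editing the file with vi and rebooting is more practical; the sketch only shows which line goes away.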
Yes, the entries you see in /sys/kernel/debug/ppe0/bind just show that hardware NAT acceleration is active. To check whether WED is working, you need to find the MAC addresses or IPs of wireless clients in that list. Note, however, that with the latest mt76 builds WED only works if the commit from https://github.com/openwrt/mt76/pull/806 is applied.
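The check described above can be sketched in Python. The exact line format of /sys/kernel/debug/ppe0/bind is not given here, so this sketch assumes only that client MAC addresses appear somewhere in the dump in colon-separated form; it handles MACs, not IPs, and the sample text and addresses below are hypothetical placeholders.

```python
import re

# Colon-separated MAC address, e.g. 02:00:00:00:00:01
MAC_RE = re.compile(r"(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}")


def wireless_macs_in_bind(bind_text: str, wireless_macs: set[str]) -> set[str]:
    """Return the known wireless client MACs that appear anywhere in a ppe0/bind dump."""
    found = {mac.lower() for mac in MAC_RE.findall(bind_text)}
    return {mac.lower() for mac in wireless_macs} & found


if __name__ == "__main__":
    # Hypothetical dump text; on the router you would read /sys/kernel/debug/ppe0/bind instead
    sample = "entry 1 ... 02:00:00:00:00:01 ...\nentry 2 ... 02:00:00:00:00:0A ...\n"
    print(wireless_macs_in_bind(sample, {"02:00:00:00:00:01", "02:00:00:00:00:02"}))
```

An empty result while wireless clients are passing traffic would suggest that only wired flows are being offloaded.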
I usually compile and run a new build from the main branch every 3-4 days. Regarding the timeout events, I haven't detected any recently (the last one was several weeks ago). I did encounter one or two total crashes or freezes (router inaccessible from the LAN), but couldn't find anything relevant in the logs; afterwards the router started normally, not in recovery mode. Maybe the new 6.1 testing kernel is the problem? The 100% CPU usage (for no apparent reason) issue also seems to be getting rarer.
I hadn't had WED working for a while. I'm running a 23.05 snapshot, and it's likely this commit restored it.
The device page suggests that Hardware Flow Offloading (in the Firewall section) is not needed to use WED. That doesn't seem to be the case for me: I tried with offloading fully off, then with only Software Flow Offloading, both with no effect. Turning on Hardware Flow Offloading significantly reduced CPU usage.
I can now get just over 800 Mbps at ~30% CPU usage; without WED working, I got ~600 Mbps at 100% usage.
I'm not clear on how OpenWrt's different repositories fit together...
Is that better fix in the mt76 repository, and would it be pulled in if I install a current snapshot image?
If so, I'd give WED another try.
In the wireless settings, all modules/devices are shown as "Generic Unknown" after startup.
Also, scanning for WiFi networks doesn’t work until an AP is activated.
Expected behavior (tested on another OpenWrt device with a different CPU): the WiFi device names are shown correctly, and scanning for networks works without enabling an AP first.
The issue is not present in v23.05.0-rc3, only in the latest snapshots.
Thanks for mentioning this. I just flashed 23.05-RC3 on my RT3200 with WED enabled using bridger (dumb AP mode), and I was seeing strange timeouts in some of my iperf3 test runs.
I haven't seen that message in my error logs, but I'd need to rerun some tests since I may have rebooted since I last saw them.
Other than the odd timeouts, the performance of 23.05-RC3 has been excellent. Crazy speeds using AX + 160 MHz, and latencies over WiFi are excellent too (with WED enabled).