Debugging hostapd, radios getting periodically disabled

Hello
I've been struggling with this for months. I've seen people around the internet with the same issue on different OpenWrt devices, but no definite solution or information on how to debug it.
Basically, both radio0 and radio1 end up in the disabled state after some days of using the router, without anyone touching the administrator panel.
I'm looking for a way to debug this and to set up a temporary cron job as a workaround, but I'm not sure what exactly needs to be restarted: the firmware, hostapd, or something else.
The setup is:

Model	TP-Link Archer C20i
Architecture	MediaTek MT7620A ver:2 eco:6
Target Platform	ramips/mt7620
Firmware Version	OpenWrt 22.03.2 r19803-9a599fee93 / LuCI openwrt-22.03 branch git-22.288.45147-96ec0cd

However, the bug is specific neither to this particular OpenWrt version (I had it on earlier versions too) nor to this particular hardware (I've seen people around the internet with the same issue). Some people said it might be a RAM exhaustion issue.

root@Archer:~# cat /etc/config/wireless 

config wifi-device 'radio0'
   option type 'mac80211'
   option hwmode '11a'
   option path 'pci0000:00/0000:00:00.0/0000:01:00.0'
   option cell_density '3'
   option htmode 'VHT80'
   option distance '40'
   option country 'PL'
   option channel '36'
   option disabled '1'

config wifi-device 'radio1'
   option type 'mac80211'
   option hwmode '11g'
   option path 'platform/10180000.wmac'
   option htmode 'HT40'
   option channel '7'
   option country 'PL'
   option txpower '20'
   option distance '5'
   option cell_density '3'
   option disabled '1'

config wifi-iface 'default_radio1'
   option device 'radio1'
   option network 'lan'
   option key '**SNIP**'
   option encryption 'sae'
   option mode 'ap'
   option ssid '**SNIP**'
   option wpa_disable_eapol_key_retries '1'
   option ieee80211r '1'
   option nasid '**SNIP**'
   option mobility_domain 'SNIP'
   option ft_over_ds '1'
   option ft_psk_generate_local '1'
   option disassoc_low_ack '0'

config wifi-iface 'wifinet2'
   option device 'radio0'
   option mode 'ap'
   option ssid '**SNIP**'
   option encryption 'sae'
   option network 'lan'
   option key 'SNIP'
   option wpa_disable_eapol_key_retries '1'
   option disassoc_low_ack '0'

config wifi-iface 'wifinet3'
   option device 'radio1'
   option mode 'ap'
   option encryption 'psk2'
   option network 'lan'
   option key '**SNIP**'
   option ssid 'Legacy_printer'


The disabled state of radio0 and radio1 visible above and in the UI was not set manually.
I'm not able to bring the radios up with simple commands:

root@Archer:~# wifi reload
'radio0' is disabled
'radio1' is disabled
'radio0' is disabled
'radio1' is disabled
root@Archer:~# wifi up
'radio0' is disabled
'radio1' is disabled
'radio0' is disabled
'radio1' is disabled
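
Since `wifi up` only reports the radios as disabled, a cron workaround would probably have to clear the `disabled` flag before reloading. Below is a minimal sketch, assuming the `option disabled '1'` entries in /etc/config/wireless are the only thing keeping the radios down (nothing in this thread confirms that). `disabled_sections` and `reenable_wifi` are hypothetical helper names; `uci`, `logger` and `wifi` are the standard OpenWrt commands. It treats the symptom only and does not explain who sets the flag.

```shell
#!/bin/sh
# Hypothetical helper: read `uci show wireless` output on stdin and print the
# names of named sections whose disabled option is '1'.
disabled_sections() {
    sed -n "s/^wireless\.\([A-Za-z0-9_]*\)\.disabled='1'\$/\1/p"
}

# Hypothetical helper: clear the flag on every disabled section and reload.
# Assumption: no section is disabled on purpose, otherwise it gets enabled too.
# Only meant to be called on the router itself.
reenable_wifi() {
    sections=$(uci show wireless | disabled_sections)
    [ -n "$sections" ] || return 0          # nothing disabled, nothing to do
    for s in $sections; do
        logger -t wifi-watchdog "re-enabling wireless.$s"
        uci set "wireless.$s.disabled=0"
    done
    uci commit wireless
    wifi reload
}
```

Saved under a hypothetical path such as /root/wifi-watchdog.sh, it could be called from /etc/crontabs/root with a line like `*/5 * * * * . /root/wifi-watchdog.sh && reenable_wifi`. The `logger` line leaves a timestamp in logread each time the workaround fires, which also shows how often the problem occurs.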

The hostapd state is as follows:

root@Archer:~# ps | grep hostapd
 1285 root      2660 S    {hostapd} /sbin/ujail -t 5 -n hostapd -U network -G network -C /etc/capabilities/wpad.json -c -- /usr/sbin/hostapd -s -g /var/run/hostapd/global
 1290 network   4428 S    /usr/sbin/hostapd -s -g /var/run/hostapd/global
18443 root      1308 S    grep hostapd

I'm also seeing this permission denied error in logread. I'm not sure if it's related, but it could make some script misbehave by terminating prematurely:

Wed Dec  7 13:10:01 2022 daemon.notice hostapd: Remove interface 'wlan1'
Wed Dec  7 13:10:01 2022 daemon.notice hostapd: wlan1: interface state ENABLED->DISABLED
Wed Dec  7 13:10:01 2022 daemon.notice hostapd: wlan1-1: AP-STA-DISCONNECTED 00:80:92:b7:b9:12
Wed Dec  7 13:10:01 2022 daemon.notice hostapd: wlan1-1: AP-DISABLED
Wed Dec  7 13:10:01 2022 daemon.notice hostapd: wlan1-1: CTRL-EVENT-TERMINATING
Wed Dec  7 13:10:01 2022 daemon.err hostapd: rmdir[ctrl_interface=/var/run/hostapd]: Permission denied
Wed Dec  7 13:10:01 2022 daemon.notice netifd: Network device 'wlan1-1' link is down
Wed Dec  7 13:10:01 2022 kern.info kernel: [220994.874660] br-lan: port 4(wlan1-1) entered disabled state
Wed Dec  7 13:10:01 2022 kern.info kernel: [220994.895530] device wlan1-1 left promiscuous mode
Wed Dec  7 13:10:01 2022 kern.info kernel: [220994.900362] br-lan: port 4(wlan1-1) entered disabled state
Wed Dec  7 13:10:01 2022 daemon.notice hostapd: wlan1: AP-DISABLED
Wed Dec  7 13:10:01 2022 daemon.notice hostapd: wlan1: CTRL-EVENT-TERMINATING
Wed Dec  7 13:10:01 2022 daemon.err hostapd: rmdir[ctrl_interface=/var/run/hostapd]: Permission denied

Only reconfiguring the network gets radio* running again.
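
To see when the radios were taken down, the log can be filtered for the state transition shown above. This is a small sketch that assumes the log lines keep the format of the excerpt in this post; `hostapd_disable_events` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and keep only the hostapd
# lines that report an interface going from ENABLED to DISABLED.
hostapd_disable_events() {
    grep 'hostapd: .*interface state ENABLED->DISABLED'
}

# On the router (not run here):
#   logread | hostapd_disable_events
```

The timestamps it prints can then be compared with whatever else ran on the router at that moment.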

I ran into a similar problem on an Archer C20 v5 and never understood what it was. Given the lack of RAM, I solved it by installing the zram-swap utility.
You can also try installing luci-app-wifischedule; it turns Wi-Fi off and on at a given time.

That device only has 64 MiB of RAM. I wouldn't try to use zram swap; I'd try to use some real swap, if it has a USB port.

Also, you might try replacing hostapd/wpa-supplicant with wpad-basic. It has both the authenticator and the supplicant in one, and the basic version cuts out some stuff most people don't use, like enterprise authentication modes. It can do almost everything, including WDS modes.

Do you mean replace wpad-basic-wolfssl with wpad-basic?
I'll try, thanks for the advice.
And zram-swap works fine with 64 megabytes of RAM.

No. wpad-basic vs wpad-basic-wolfssl shouldn't make much of a difference in memory footprint. From the thread title it seemed that the OP was using hostapd/wpa-supplicant, and those will take up more memory than wpad-basic/wpad-basic-wolfssl.

In your case, I think some (real) swap is your best bet.

I've already replaced wpad-basic-wolfssl with wpad-basic, and so far everything is running fine. The old package really had too many functions I didn't need, for example WPA3, which I don't use.
Previously there were some glitches with the Wi-Fi connection; I'll see how it behaves with this package. Thanks for the idea anyway :slight_smile:

I'm using both WPA3 and a mesh network, and the point of this thread was not to find a workaround with its own limitations but to debug what exactly happens. I believe that even if it's memory exhaustion, there should be a proper Linux way to handle it instead of failing silently.
Also, if the issue shows up not immediately but after some hours of use, then it suggests a memory leak that needs to be fixed.
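
One way to test the memory leak hypothesis is to log memory figures periodically and check whether they drift before the radios go down. This is a minimal sketch, assuming the standard Linux /proc files are available on the router; `proc_kb` and `log_wifi_memory` are hypothetical helper names and `memwatch` is an arbitrary log tag.

```shell
#!/bin/sh
# Hypothetical helper: print one value (in kB) from a /proc-style file made of
# "Key:   value kB" lines.
#   $1 = field name, e.g. MemAvailable or VmRSS
#   $2 = file to read, e.g. /proc/meminfo or /proc/<pid>/status
proc_kb() {
    awk -v key="$1:" '$1 == key { print $2; exit }' "$2"
}

# Hypothetical helper: write system-wide available memory and the resident
# size of every process named hostapd to syslog. Only meant to run on the router.
log_wifi_memory() {
    logger -t memwatch "MemAvailable=$(proc_kb MemAvailable /proc/meminfo) kB"
    for pid in $(pidof hostapd); do
        logger -t memwatch "hostapd pid=$pid VmRSS=$(proc_kb VmRSS "/proc/$pid/status") kB"
    done
}
```

Called from cron every few minutes, this leaves a series of `memwatch` lines in logread. A steadily growing VmRSS for hostapd, or a steadily shrinking MemAvailable, would support the leak theory; flat numbers right up to the failure would point elsewhere.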