Wireless radio0 failure

I'm using an rpi3b running relayd as a failover gateway with pfSense. It had been working fine for over a month using lede-17.01.4-brcm2708-bcm2710-rpi-3-ext4-sdcard.img.

Suddenly it went offline, and I can see that the wireless is no longer connecting. I can't even scan for any wireless SSIDs.

When I look in the system log, I see these related fragments (ignore the date/time):

    Mon Feb 26 13:27:27 2018 daemon.info odhcpd[312]: Raising SIGUSR1 due to address change on br-lan
    Mon Feb 26 13:27:27 2018 user.notice firewall: Reloading firewall due to ifup of lan (br-lan)
    Mon Feb 26 13:27:27 2018 daemon.notice netifd: radio0 (426): command failed: No error information (-524)
    Mon Feb 26 13:27:27 2018 daemon.notice netifd: radio0 (426): command failed: Not supported (-95)
    Mon Feb 26 13:27:27 2018 daemon.notice netifd: radio0 (426): command failed: I/O error (-5)
    Mon Feb 26 13:27:27 2018 daemon.info procd: - init complete -
    Mon Feb 26 13:27:27 2018 daemon.notice netifd: radio0 (426): command failed: Too many open files in system (-23)
    Mon Feb 26 13:27:27 2018 kern.info kernel: [    7.631249] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready

On another SD card I downloaded and installed the same software version, and I can scan and set up WLAN fine, so I know the hardware is working.

Then, on yet another SD card, I installed a backup image of my original working setup with the bridge that had been working for a month. It worked fine when I made the backup. To my surprise, it showed the same errors in the log and failed to work.

I'm stumped trying to figure out what's wrong, and furthermore why the backed-up original also failed. Any suggestions?

Does iw dev or iw list suggest that the device is available, but just not responding properly to netifd?

That looks suspicious.
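The iw check suggested above can be run by hand over SSH. A rough sketch, assuming the radio appears as phy0/wlan0 as the log suggests; if iw sees the PHY and can scan while netifd still fails, the fault is more likely in the configuration than in the hardware:

```shell
#!/bin/sh
# Does the kernel still register the radio? Expect "phy#0" followed by "Interface wlan0".
iw dev
# Is the PHY present at all? An empty result points below netifd (driver/firmware).
iw list | grep Wiphy
# Scan by hand, bypassing netifd (assumes the interface is named wlan0).
ip link set wlan0 up
iw dev wlan0 scan | grep SSID
```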

Thanks for the quick responses to a first-time poster!
I haven't identified the problem yet.

  • I imaged a new SD with latest stable and booted
  • Using LuCI, I scanned for and joined a network, then over SSH:
  • opkg update
  • opkg install luci relayd luci-proto-relay nano
  • reboot

After the reboot it connected to the joined network fine, but the system log shows the exact same errors I listed above, even though the network is now connected. Now I'm even more stumped.

How many files are opened by which process?

Maybe you have a more comprehensive command, but this is what I get for the top PIDs by open files, which doesn't look like a problem:

  • PID = 1 with 13 file descriptors
  • PID = 131 with 15 file descriptors
  • PID = 290 with 16 file descriptors
  • PID = 316 with 18 file descriptors
  • PID = 649 with 14 file descriptors

Reference https://stackoverflow.com/questions/21752067/counting-open-files-per-process
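The per-PID counts can also be produced straight from /proc, where each entry in /proc/<pid>/fd is one open descriptor. A minimal sketch; top_fd_counts is a hypothetical helper name, and it should be run as root so every process's fd directory is readable:

```shell
#!/bin/sh
# Hypothetical helper: list the processes holding the most file descriptors.
# Output columns: <open fds> <pid> <process name>, largest first.
top_fd_counts() {
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        # Each entry under /proc/<pid>/fd is one open descriptor.
        count=$(ls "$dir/fd" 2>/dev/null | wc -l)
        # The comm name makes PIDs recognisable (netifd, hostapd, ...).
        name=$(cat "$dir/comm" 2>/dev/null)
        echo "$count $pid $name"
    done | sort -rn | head -n "${1:-5}"
}
```

Usage: "top_fd_counts 10" shows the ten largest holders; with no argument it shows five.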

Keep in mind that the new image from my last attempt works even with the "Too many open files in system (-23)" error. Could it be that those log snippets aren't relevant?

    # opkg install lsof

You might be able to catch the "moment" in /lib/netifd/hostapd.sh or /lib/netifd/wireless/mac80211.sh
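One way to catch that moment is to leave breadcrumbs from inside the handler script. A sketch under stated assumptions: debug_mark and /tmp/wifi-debug.log are hypothetical names, and where exactly to call it inside /lib/netifd/wireless/mac80211.sh is left open because that script differs between releases. It records uptime, a label, and the system-wide file-handle counters from /proc/sys/fs/file-nr, so a -23 failure can be compared with how full the table really was:

```shell
#!/bin/sh
# Hypothetical debug hook to paste into mac80211.sh and call around suspect
# commands, e.g.  debug_mark "before-add"
DEBUG_LOG=${DEBUG_LOG:-/tmp/wifi-debug.log}

debug_mark() {
    # First field of /proc/uptime: seconds since boot.
    read -r uptime _ < /proc/uptime
    # /proc/sys/fs/file-nr: allocated handles, free handles, maximum handles.
    read -r allocated _ max < /proc/sys/fs/file-nr
    echo "$uptime $1 files=$allocated/$max" >> "$DEBUG_LOG"
}
```

Since /tmp is RAM-backed, copy the log off the Pi before rebooting if you want to keep it.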

Is lsof dependent on something else? I can't install as I get:
    opkg install lsof
    Unknown package 'lsof'.
    Collected errors:
     * opkg_install_cmd: Cannot install package lsof.

Have you refreshed the local package repository information with opkg update?

Sorry. I didn't realize I needed to do that update again.
I'm not sure what I'm looking for, but I see:

    netifd 288 root 10r DIR 179,2 4096 360 /lib/netifd/wireless

It's working fine even though the log has those same errors. I'm thinking those errors (which indeed look suspicious) are unrelated and maybe just a phase at boot which gets corrected later.

But if I go back to the original SD card (described above), it fails, so I'm still comparing the two SD images to try to understand what went wrong. If I weren't on location, this failure would be impossible to fix.

(On a side note, the relayd software is incredibly useful, and I'm grateful somebody had the foresight to support that capability.)

The package lists reside in /tmp, so they are lost at every reboot.
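Because of that, any install after a reboot needs "opkg update" first. A minimal sketch of a hypothetical helper, ensure_pkg, that refreshes the index only when it is missing; it assumes the lists directory is /var/opkg-lists (check the lists_dir line in /etc/opkg.conf on your image):

```shell
#!/bin/sh
# Hypothetical helper: install packages, running "opkg update" only when needed.
# Assumption: the index lives in /var/opkg-lists, which is on tmpfs.
OPKG_LISTS_DIR=${OPKG_LISTS_DIR:-/var/opkg-lists}

ensure_pkg() {
    # A missing or empty lists directory means no update has run since boot.
    if [ -z "$(ls -A "$OPKG_LISTS_DIR" 2>/dev/null)" ]; then
        opkg update || return 1
    fi
    opkg install "$@"
}
```

Usage: "ensure_pkg lsof" after a fresh boot runs the update and then the install; a second call skips the update.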

rpi + suddenly + SD card triggers something in my memory: I had occasional problems with my rpi3, which suddenly crashed and was no longer bootable due to SD card corruption.

It turned out that the power supply was insufficient (combined with a bad USB cable), i.e. the voltage dropped too far under load, which then led to SD card corruption. Cutting power without shutting the rpi down has the same effect.

-> Check your SD card for filesystem errors.
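A read-only check from another Linux machine is the safest way to do that, since the root filesystem should not be checked while it is mounted. A sketch, assuming the card shows up in a USB reader and that the ext4 root is the second partition (as on the ext4 sdcard images); /dev/sdX2 is a placeholder:

```shell
#!/bin/sh
# Identify the card reader's device name first; /dev/sdX2 below is a placeholder.
lsblk
# -f forces a full check even if the filesystem is marked clean;
# -n opens it read-only and answers "no" to every repair prompt.
sudo e2fsck -fn /dev/sdX2
```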

Do you know of any power outages at the time the wifi failed the first time?
What power supply do you use?
Any other electrical consumers attached to the rpi?
Do you have a USB monitor to measure the voltage?

Nothing wrong with the power or the SD card. It runs off a UPS, so there was no power outage. There are no other consumers on the Pi besides Wi-Fi/Ethernet.

I'd like to get back to this error:

    daemon.notice netifd: radio0 (426): command failed: Too many open files in system (-23)
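The "command failed: <message> (<code>)" wording appears to be iw's own error output, captured by netifd from the wireless handler (pid 426); the number is the negated errno that came back from the kernel. The texts also match musl's strerror(), which prints "No error information" for codes it does not know, such as the kernel-internal 524. Note that ENFILE (23) refers to the system-wide file table, unlike EMFILE for a per-process limit, and a driver may reuse an errno code for an unrelated condition, so the message alone does not prove the table was full. A small lookup for the four codes in the log (decode_iw_err is a hypothetical name):

```shell
#!/bin/sh
# Map the negative codes printed as "command failed: ... (-N)" to Linux errno names.
# 524 (ENOTSUPP) is kernel-internal and has no userspace message.
decode_iw_err() {
    code=${1#-}   # strip the leading minus sign
    case "$code" in
        5)   echo "EIO: I/O error" ;;
        23)  echo "ENFILE: too many open files in system" ;;
        95)  echo "EOPNOTSUPP: operation not supported" ;;
        524) echo "ENOTSUPP: kernel-internal, no userspace message" ;;
        *)   echo "errno $code: not in this table" ;;
    esac
}

# Decode the codes seen in the log above.
for c in -524 -95 -5 -23; do
    printf '%s -> %s\n' "$c" "$(decode_iw_err "$c")"
done
```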

From my empirical testing with a fresh img of:

Boot 1: no (-23) error and no radio0 //as you would expect
--Change: scan and join wireless on wwan - reboot

Boot 2: (-23) error appears in log as in my first post of this thread
--Change: Remove wireless and reboot

Boot 3: no (-23) error and no radio0 //as you would expect
--Change: scan and join a different wireless on wwan - reboot

Boot 4: (-23) error appears
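Each boot in that comparison can be checked without reading the whole log by tallying the netifd "command failed" lines per error code. A sketch; netifd_errors is a hypothetical helper meant to be fed from logread:

```shell
#!/bin/sh
# Hypothetical filter: count netifd "command failed" lines per error code.
# Usage on the router:  logread | netifd_errors
netifd_errors() {
    grep 'netifd: .*command failed' |
        sed -n 's/.*(\(-[0-9]*\))$/\1/p' |   # keep only the trailing "(-N)" code
        sort | uniq -c
}
```

An empty result after a boot means none of the radio0 errors were logged.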

Summary: I believe this error is unrelated. A Google search for OpenWrt also turned up this error in many logs that were (seemingly) unrelated to the problem the user was having. I think it's just a transient boot-phase message and thus not really an error.

I went back to the SD card with my original image from the top of this thread.
I deleted the interface, added it again, and it all works as expected.
I'm now fairly confident that something in the interface got corrupted, but I have no way of doing a post-failure analysis.
I'd like to send more verbose interface logging to a syslog server, but first I have to figure out how to do that across a different LAN subnet.
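For the remote syslog part, the router's log daemon can forward to a server on another subnet as long as that address is routable from the Pi and the server accepts syslog on the chosen port. A sketch using the log_ip/log_port/log_proto options of OpenWrt's system config; 192.168.10.5 is a hypothetical server address, and log_proto support on 17.01 is an assumption worth verifying:

```shell
#!/bin/sh
# Hypothetical syslog server on another subnet, reached via the default gateway.
uci set system.@system[0].log_ip='192.168.10.5'
uci set system.@system[0].log_port='514'
uci set system.@system[0].log_proto='udp'
uci commit system
# Restart the log daemon so it picks up the new remote target.
/etc/init.d/log restart
```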