Architecture: ramips and mips (this happens on both)
Sometimes when I run `sysupgrade -n s_upgrade.bin`, my router just shuts off. I opened a terminal over serial and found this:
```
Sending KILL to remaining processes ... hostapd hostapd hostapd hostapd hostapd hostapd hostapd hostapd hostapd hostapd
Failed to kill all processes.
[ 125.891451] reboot: Restarting system
sysupgrade aborted with return code: 256
```
Everything I've read from other people with this issue says to either upgrade the firmware (tried that) or kill the process that is causing sysupgrade to give this error, but it's a different process almost every time. Is there anything else I could try to fix this? It's more annoying than anything, since it doesn't actually break anything: you can just turn the router back on and try again, and it usually works.
Any insight would be appreciated, and let me know if more information would help.
Thanks for the reply! At the moment I’m not running an OpenWrt build with any wireless drivers, so hostapd isn’t running on the system. While digging into this, I did come across a mention, on the issue tracker or a mailing list, that WiFi might cause this. To isolate the problem, I’m running a build without any WiFi. I will look more in the morning to see if I can remember which process caused the issue more than once.
Will there be an official patch?
I'm asking because I also had issues with sysupgrade not doing anything, which I blamed on an open swapfile, but maybe I was wrong.
After some more testing today, the two processes I have seen make sysupgrade fail more than once are ntpd and odhcpd. Can I just run `killall -9 ntpd` and `killall -9 odhcpd` before sysupgrade to prevent that?
EDIT: mount_root is another process I have seen cause this problem.
When I run `killall -9 ntpd` and `killall -9 odhcpd`, the processes just restart themselves. I suspect there is some underlying mechanism restarting them that I also need to stop?
I'm also trying to do this programmatically from a C program that then runs sysupgrade, but when I run it, ntpd and odhcpd restart and the C app hangs, so sysupgrade never executes.
Hmm, I couldn't get that to work for ntpd; the process still restarted. I checked, and there is no ntpd script in the init.d directory, so maybe something else is starting it?
However, disabling odhcpd seemed to work.
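For the ntpd respawn: my understanding is that on OpenWrt, busybox ntpd is started by `/etc/init.d/sysntpd` rather than a script named `ntpd`, and that both it and odhcpd run as procd instances with respawn enabled, which would explain why `killall -9` just brings them back. A minimal sketch, assuming those two init script names exist on your build, is to stop the services through their init scripts instead of killing them. `stop_services` and the `INITD` override are hypothetical names used only for this example.

```shell
#!/bin/sh
# Sketch, not an official OpenWrt helper: stop procd-managed services via
# their init scripts so procd does not respawn them the way it does after
# a plain `killall -9`.
# Assumption: ntpd is started by /etc/init.d/sysntpd and odhcpd by
# /etc/init.d/odhcpd. INITD is a hypothetical override (e.g. for testing).
INITD="${INITD:-/etc/init.d}"

stop_services() {
    local svc
    for svc in "$@"; do
        if [ -x "$INITD/$svc" ]; then
            # "stop" asks procd to remove the running instance; "disable"
            # only removes the boot-time symlinks and leaves it running.
            "$INITD/$svc" stop
        else
            echo "skipping $svc: no executable $INITD/$svc" >&2
        fi
    done
}
```

Usage would be `stop_services sysntpd odhcpd` followed by the usual `sysupgrade -n s_upgrade.bin`. This only sidesteps those two services; it does not explain why a process survives SIGKILL in the first place.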
I edited my post above, but I also see this happening with mount_root. Would running `/etc/init.d/umount disable` keep mount_root from restarting (without causing any detrimental effects)?
There is an open bug report from me. Maybe now you can at least confirm serious issues with sysupgrade. I suspect it needs a careful overhaul, as sysupgrade is very dangerous when it doesn't work properly.
Feel free to copy the above and add it to the bug report. I don't see a link to it here.
If wireless were the only cause, a "hack" of running `wifi down` first would likely cover it. However, @cvocvo also describes an issue with mount_root.
My bug report: FS#2024
As this is a different sysupgrade issue, please file your own report and reference mine.
Various other "effects" of using sysupgrade have also been described here on the forum.
The `sleep 1` is really not good. Could you try to figure out what actually causes the 256 and fix that instead, please?
John
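On the 256 itself: assuming procd prints the raw status word from `waitpid()` rather than an already-extracted exit code, 256 decodes to a normal exit with code 1 (256 >> 8 = 1). That matches the `exit 1` right after "Failed to kill all processes." in `kill_remaining()`, so the 256 would be the symptom of `kill_remaining` giving up rather than a separate error. A small decoder, with `decode_wait_status` as a hypothetical helper name, following the Linux status encoding:

```shell
#!/bin/sh
# Sketch: decode a raw wait()-style status such as the 256 printed in
# "sysupgrade aborted with return code: 256".
# Assumption: the number is the unshifted status word from waitpid().
decode_wait_status() {
    local status="$1"
    local sig=$(( status & 127 ))
    if [ "$sig" -eq 0 ]; then
        # Normal exit: the exit code lives in bits 8..15.
        echo "exited with code $(( (status >> 8) & 255 ))"
    elif [ "$sig" -eq 127 ]; then
        # Low bits 0x7f mean the child was stopped; bits 8..15 hold the signal.
        echo "stopped by signal $(( (status >> 8) & 255 ))"
    else
        # Terminated by a signal: the low 7 bits hold the signal number.
        echo "killed by signal $sig"
    fi
}
```

For example, `decode_wait_status 256` prints `exited with code 1`.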
Note that procd is effectively undocumented, and if a one-second delay to let hostapd shut down gracefully is "not good", what negative adjective properly describes this mess?
```sh
kill_remaining() { # [ <signal> [ <loop> ] ]
    local loop_limit=10

    local sig="${1:-TERM}"
    local loop="${2:-0}"
    local run=true
    local stat
    local proc_ppid=$(cut -d' ' -f4 /proc/$$/stat)

    echo -n "Sending $sig to remaining processes ... "

    while $run; do
        run=false
        for stat in /proc/[0-9]*/stat; do
            [ -f "$stat" ] || continue

            local pid name state ppid rest
            read pid name state ppid rest < $stat
            name="${name#(}"; name="${name%)}"

            # Skip PID1, our parent, ourself and our children
            [ $pid -ne 1 -a $pid -ne $proc_ppid -a $pid -ne $$ -a $ppid -ne $$ ] || continue

            local cmdline
            read cmdline < /proc/$pid/cmdline

            # Skip kernel threads
            [ -n "$cmdline" ] || continue

            echo -n "$name "
            kill -$sig $pid 2>/dev/null

            [ $loop -eq 1 ] && run=true
        done

        let loop_limit--
        [ $loop_limit -eq 0 ] && {
            echo
            echo "Failed to kill all processes."
            exit 1
        }
    done
    echo
}

indicate_upgrade

killall -9 telnetd
killall -9 dropbear
killall -9 ash

kill_remaining TERM
sleep 3
kill_remaining KILL 1

sleep 1
```
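Toward John's request to find what actually causes the failure: in the first log, hostapd is printed exactly 10 times, which matches `loop_limit=10` in `kill_remaining()`, so the same process apparently survived every KILL pass. The loop has no delay between passes, so all 10 can run before a process blocked in uninterruptible sleep (state `D` in `/proc/PID/stat`) gets a chance to exit. That is a hypothesis, not something the log proves. A hypothetical diagnostic that could be called just before the "Failed to kill all processes." exit would print each survivor's state; `report_survivors` and `PROC_ROOT` are illustrative names, not part of OpenWrt.

```shell
#!/bin/sh
# Hypothetical diagnostic, not part of OpenWrt's stage2 script: print pid,
# name and state of every userspace process still present, using the same
# /proc parsing as kill_remaining(). A process in state "D" (uninterruptible
# sleep) cannot act on SIGKILL until the kernel call it is blocked in returns.
# PROC_ROOT is a hypothetical override (e.g. for testing).
PROC_ROOT="${PROC_ROOT:-/proc}"

report_survivors() {
    local stat pid name state ppid rest cmdline
    for stat in "$PROC_ROOT"/[0-9]*/stat; do
        [ -f "$stat" ] || continue
        read pid name state ppid rest < "$stat"
        name="${name#(}"; name="${name%)}"
        # A subset of kill_remaining()'s skip rules: PID 1 and ourselves.
        [ "$pid" -ne 1 ] && [ "$pid" -ne $$ ] || continue
        cmdline=""
        read cmdline 2>/dev/null < "$PROC_ROOT/$pid/cmdline"
        # Skip kernel threads (empty cmdline), as kill_remaining() does.
        [ -n "$cmdline" ] || continue
        echo "$pid $name $state"
    done
}
```

Like the original loop, this assumes process names contain no spaces.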
A nominal delay seems preferable to a crashing system. In our case we have some devices at remote locations, so a crash that requires a manual reboot and a retry-until-it-works upgrade loop isn't a great option.
I added a comment linking your patch and post to the submitted bug.