Snapshot Build - Reboot Loop because of script in files/

Dear friends, happy 2022! I have an issue I don't really know how to fix.

I'm a long-time builder of OpenWrt firmware and have been building from git since 2014, so I'm no stranger to the process. I normally set up additional steps under the files/ folder so that, upon the first successful boot, things run and configure themselves automatically via UCI.

I made a mistake that has me stumped, though: I forgot to remove a command from the very bottom of the script. These are its last lines:

# Restart firewall
echo -en "\nRestarting firewall... "
/etc/init.d/firewall restart >/dev/null 2>&1
echo "Done"

reboot && exit

This reboot && exit is causing my build to reboot every single time I start up. I get ping and can ssh into the router for a scarce 5 or 6 seconds before the command gets to the reboot, and now I can't stop it. Normally, I'd just be able to tftp a factory image and be done with it, but in this particular router I can't, SO:

I tried sending commands over ssh to stop the script (called /etc/startup.sh) from loading, or to kill it, but nothing has worked. Things I've tried so far:

a. Renaming /sbin/reboot to something else
b. Killing busybox with killall -9 busybox
c. Running a script that extracts the PIDs of any scripts running under ash and kills them with kill -9. I just can't seem to stop it in time before the reboot, and it bootloops like crazy (but again, it's not bricked: I do get a good 4 or 5 seconds in the terminal before it reboots).

Given this 5 second window, I've even run remote scripts with ssh -s root@target < suchandsoscript.sh to automate these steps.

So, anyway, my million dollar question is:

What creative trick that I haven't thought of yet could I run over ssh, in a 4-5 second window, to stop the script from continuing? Or what could I rename so that the reboot is blocked, or the system otherwise stops looping, so I can go in and run a new sysupgrade -n -v with the proper build, minus my terrible mistake?

Thanks for chiming in!

Slightly difficult to provide advice without knowing exactly how you hooked up your script (some get executed before the overlay is mounted, some after, some have other implications).

Any chance to nuke fw3 and make it spin in an endless loop?

E.g. by overwriting /sbin/fw3 with:

#!/bin/sh
while :; do :; done

That won't work on the first attempt, but may work after the reboot that is already under way (if the modified overlay gets mounted early enough).

The downside: if that only partially succeeds, you no longer have any grace period left at all, so make sure that at least resetting the overlay is still an option.

Maybe read up on exactly how procd kills processes and detaches the shell, pivot-rooting into a RAM environment...

EDIT: /etc/firewall.user sounds like a safer candidate for the endless loop than trying to clobber /sbin/fw3 itself.

files/ is about your build system, not about the live router. That does not tell much about the script's role...

So, how does your script get launched?

Is it at the end of /etc/rc.local?
Uci-defaults script?
Service startup script with priority XX ?

Depending on the answer and script mechanics, I might aim for

  • disabling the service symlink in /etc/rc.d
  • Editing the culprit script's reboot line with sed (and simply change reboot to garbage)
  • Removing/renaming the script file.

Some of those might work if the script gets launched so late that overlay is in play already.

Edit:
Another thought is failsafe mode: enter failsafe mode and sysupgrade from there? (Or disable/edit the script there.)

Thanks so much. The script indeed is run as an rc.local event.

If it is so late as rc.local, which is actually run really late, it should be relatively easy, if you have several seconds of access...

rc.local is run by the init script "done", which runs with priority 95. The normal rc.local contains nothing important (only minor stuff like setting LEDs to normal run mode and removing the old settings archive after sysupgrade). So you could pretty safely remove the whole /etc/rc.local as well.
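
Since the access window is only a few seconds per boot, it may help to let the workstation keep retrying a one-line fix until one attempt lands. Below is a minimal sketch of such a retry loop. The function name retry_until, the RETRY_DELAY variable, the router address and the ssh options in the usage comment are all assumptions made for illustration, not something taken from this thread.

```shell
#!/bin/sh
# retry_until MAX CMD...: run CMD up to MAX times until it exits 0.
# RETRY_DELAY (seconds between attempts, default 1) is a name made up
# for this sketch.
retry_until() {
    max=$1
    shift
    attempt=0
    while [ "$attempt" -lt "$max" ]; do
        # Stop as soon as one attempt succeeds.
        "$@" && return 0
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-1}"
    done
    return 1
}

# Hypothetical usage from the workstation (address and options are assumptions):
# retry_until 300 ssh -o ConnectTimeout=2 root@192.168.1.1 \
#     'mv /etc/rc.local /etc/rc.local.disabled && sync'
```

The boot during which the command finally gets through may still reboot if the script is already running, but the following boot should then skip rc.local.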

Alternatives:

  • enter failsafe mode, mount rootfs and edit the /etc/rc.local and remove the reboot line
  • disable "done" service either with
    "/etc/init.d/done disable" or
    "rm -f /etc/rc.d/S95done"

For reference:

sed find/replace alternative:

root@router1:~# cat /etc/rc.local
# Put your custom commands here that should be executed once
# the system init finished. By default this file does nothing.

test test reboot && exit

exit 0

root@router1:~# sed -i -e "s/reboot/garbage/" /etc/rc.local

root@router1:~# cat /etc/rc.local
# Put your custom commands here that should be executed once
# the system init finished. By default this file does nothing.

test test garbage && exit

exit 0
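
A variation on the sed approach above, as a sketch only: instead of turning reboot into a nonexistent command, comment out every line that begins with reboot, so the rest of the file is left readable. The function name comment_out_reboot is made up for this example, and whether the offending line lives in /etc/rc.local or in /etc/startup.sh depends on your setup.

```shell
#!/bin/sh
# comment_out_reboot FILE: prefix every line that begins with "reboot"
# (after optional leading whitespace) with "# ". Other lines are untouched.
comment_out_reboot() {
    sed -i -e '/^[[:space:]]*reboot/s/^/# /' "$1"
}

# Hypothetical usage on the router (pick whichever file holds the line):
# comment_out_reboot /etc/startup.sh && sync
```

A line such as echo "will reboot soon" is not touched, because the pattern is anchored to the start of the line.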

done script symlink removal:

root@router1:~# ls -l /etc/rc.d/ | grep done
lrwxrwxrwx    1 root     root            14 Jan  6 10:40 S95done -> ../init.d/done

root@router1:~# /etc/init.d/done disable

root@router1:~# ls -l /etc/rc.d/ | grep done

the done script:

root@router1:~# cat /etc/init.d/done
#!/bin/sh /etc/rc.common
# Copyright (C) 2006 OpenWrt.org

START=95
boot() {
        mount_root done
        rm -f /sysupgrade.tgz && sync

        # process user commands
        [ -f /etc/rc.local ] && {
                sh /etc/rc.local
        }

        # set leds to normal state
        . /etc/diag.sh
        set_state done
}
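
The [ -f /etc/rc.local ] guard in boot() above is why removing or renaming the file is enough to skip it. Here is a small sketch that reproduces only that guard on a scratch directory, so it can be tried on any machine. The function name run_user_commands and the path parameter are assumptions for the illustration; the real script hardcodes /etc/rc.local.

```shell
#!/bin/sh
# run_user_commands FILE: same guard as boot() in /etc/init.d/done,
# with the path turned into a parameter for this sketch.
run_user_commands() {
    [ -f "$1" ] && {
        sh "$1"
    }
    return 0
}

scratch=$(mktemp -d)
echo 'echo "rc.local ran"' > "$scratch/rc.local"
run_user_commands "$scratch/rc.local"   # file exists, so it is executed
mv "$scratch/rc.local" "$scratch/rc.local.disabled"
run_user_commands "$scratch/rc.local"   # file is gone, so nothing runs
rm -rf "$scratch"
```

The message is printed only once, by the first call.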