Procd init script: how to execute action after service crashed?

takimata · October 23, 2024, 11:12pm

Hi, I would appreciate a little help from someone who knows their way around procd.

I have here a small init script launching a shell script parsing the output of an RTL-SDR dongle. It is pretty much as bare bones as it can get:

#!/bin/sh /etc/rc.common

START=78
STOP=12
USE_PROCD=1

start_service() {
        logger -t rtl433_json starting
        procd_open_instance rtl433_json
        procd_set_param command /usr/sbin/rtl433_json_splitter
        # respawn <threshold s> <timeout s> <retry count>
        procd_set_param respawn 300 5 5
        procd_close_instance
}

My problem is that every now and then, the USB dongle stops working and can't be persuaded to cooperate again. The shell script then stops running and, on subsequent starts, exits again immediately. All of that is correctly identified and handled by the respawn parameter: after five attempts, procd gives up and logs the service as irredeemably crashed.

What I would like to do now is to react to that correctly identified crash. In my case, I would simply take the nuclear option and reboot the router.

Is there a way to do that, ideally without also catching a regular service shutdown? Rather than writing a small cronjob that checks the service status every few minutes, I am hoping that procd caters for this scenario and calls a subroutine with an error code or something to that effect.

brada4 · October 24, 2024, 5:03am

Please connect to your OpenWrt device using ssh and copy the output of the following commands and post it here using the "Preformatted text </> " button:

Remember to redact passwords, MAC addresses and any public IP addresses you may have:

ubus call system board
cat /etc/config/wireless

brada4 · October 24, 2024, 5:16am

Underpowered USB port, maybe?

Cthulhu88 · October 24, 2024, 5:33am

You can listen to instance.fail. This will be triggered when procd gives up on starting your service: https://github.com/openwrt/procd/blob/master/service/instance.c#L858

evs · October 24, 2024, 5:33am

I've only ever done respawn....

My first thought would be another service that watches this one.
Edit: Cthulhu88 looks like they actually know what to listen for and just beat me to it, haha.

Or write another script that is a wrapper to the currently crashing script and make that be what the service runs?

i.e.

service start -> wrapper script -> starts /bin/sh "/var/myscript.sh", waits for it to exit and times how long it ran
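A minimal sketch of that wrapper idea in POSIX sh (BusyBox ash on OpenWrt). Everything here is an assumption for illustration: the script name /usr/sbin/rtl433_watchdog, the limits MAX_FAILS, MIN_RUNTIME and RETRY_DELAY, and the function on_give_up are hypothetical; only /usr/sbin/rtl433_json_splitter comes from the original init script. A run shorter than MIN_RUNTIME counts as a failure, similar to procd's respawn threshold, and a SIGTERM from a regular stop ends the wrapper without counting as a crash.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. /usr/sbin/rtl433_watchdog, used as:
#   procd_set_param command /usr/sbin/rtl433_watchdog /usr/sbin/rtl433_json_splitter
# All names and limits below are assumptions, not procd features.

MAX_FAILS=5      # consecutive short runs before giving up
MIN_RUNTIME=300  # seconds; a shorter run counts as a failure
RETRY_DELAY=5    # seconds to wait before restarting the command

on_give_up() {
    logger -t rtl433_watchdog "command keeps failing, rebooting"
    reboot
}

run_with_watchdog() {
    fails=0
    child=
    # A regular stop (procd sends SIGTERM) is not a crash:
    # stop the child and exit quietly instead of counting it.
    trap 'kill "$child" 2>/dev/null; exit 0' TERM INT
    while [ "$fails" -lt "$MAX_FAILS" ]; do
        started=$(date +%s)
        "$@" &
        child=$!
        wait "$child"
        status=$?
        runtime=$(( $(date +%s) - started ))
        if [ "$runtime" -lt "$MIN_RUNTIME" ]; then
            fails=$((fails + 1))
            logger -t rtl433_watchdog "exit status $status after ${runtime}s ($fails/$MAX_FAILS)"
            sleep "$RETRY_DELAY"
        else
            fails=0  # a long, healthy run resets the counter
        fi
    done
    on_give_up
}

# Only start when a command to supervise was passed on the command line.
if [ $# -gt 0 ]; then
    run_with_watchdog "$@"
fi
```

Doing the counting in the wrapper is what lets it tell a crash loop apart from a deliberate shutdown, at the cost of duplicating what procd's respawn already does.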

takimata · October 24, 2024, 5:39am

That sounds very much like what I'm looking for. How do I catch that in a script (within the init script maybe)? I can't seem to find any documentation or examples on how to react to service events.

reinerotto · October 24, 2024, 5:43am

As a simple workaround, you can run a small script permanently and pipe the log messages into it for parsing.
In case of a crash, trigger a reboot.
You can pre-filter the messages passed to your script using the appropriate logread options.
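A rough sketch of that log-watching approach, assuming OpenWrt's `logread -f` (follow mode). The matched text is taken from procd's crash-loop message that takimata quotes in this thread; the function names and the INSTANCE variable are hypothetical. The script itself would still need to be started and kept alive somehow, for example as its own procd instance.

```shell
#!/bin/sh
# Sketch of the log-watching workaround (assumes OpenWrt's logread, -f = follow).
# The match text follows procd's crash-loop log line for this instance.

INSTANCE="rtl433_json::rtl433_json"  # service::instance as printed by procd

is_crash_loop_line() {
    case "$1" in
        *"procd: Instance $INSTANCE "*"crash loop"*) return 0 ;;
        *) return 1 ;;
    esac
}

on_crash_loop() {
    logger -t rtl433_logwatch "$INSTANCE is in a crash loop, rebooting"
    reboot
}

watch_log() {
    # Read the log line by line and act on the first crash-loop message.
    logread -f | while IFS= read -r line; do
        if is_crash_loop_line "$line"; then
            on_crash_loop
            break
        fi
    done
}

watch_log
```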

takimata · October 24, 2024, 5:49am

Thanks, and yes, I know. Even simpler, one can run service rtl433_json running in a cronjob and check $?. That workaround is in place; I'm looking for the non-workaround.
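A sketch of that cronjob check, assuming the standard `enabled` and `running` commands of a procd init script. It calls the init script directly rather than the `service` helper, on the assumption that the helper may not be available to cron's non-interactive shell; INIT_DIR, SERVICE and should_reboot are illustrative names. The `enabled` check skips a deliberately disabled service, but a manual stop of an enabled service would still look like a crash.

```shell
#!/bin/sh
# Sketch of the cron workaround: reboot only if the service is enabled
# but procd no longer reports it as running.
# INIT_DIR and SERVICE are assumptions for illustration.

INIT_DIR=/etc/init.d
SERVICE=rtl433_json

should_reboot() {
    # Ignore services that are deliberately disabled.
    "$INIT_DIR/$SERVICE" enabled || return 1
    # "running" exits non-zero once procd has given up on the instance.
    if "$INIT_DIR/$SERVICE" running; then
        return 1
    fi
    return 0
}

if should_reboot; then
    logger -t rtl433_check "$SERVICE is enabled but not running, rebooting"
    reboot
fi
```

It could be scheduled in /etc/crontabs/root, e.g. `*/5 * * * * /root/check_rtl433.sh` (the script path is hypothetical).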

Cthulhu88 · October 24, 2024, 6:03am

Try this:

procd_add_crash_trigger() {
    _procd_open_trigger

    json_add_array
    _procd_add_array_data "instance.fail"

    json_add_array
    _procd_add_array_data "if"

    json_add_array
    _procd_add_array_data "eq" "service" "$1"
    shift
    json_close_array

    json_add_array
    _procd_add_array_data "run_script" "$@"
    json_close_array

    json_close_array
    json_close_array

    _procd_close_trigger
}

Then add it to your service_triggers(), example:

procd_add_crash_trigger service_name reboot

EDIT: Added missing procd_open_trigger and procd_close_trigger

takimata · October 24, 2024, 6:41am

That seems very promising, thank you! However, in a first test it does not work. procd recognizes the crash loop:

Thu Oct 24 15:21:24 2024 daemon.info procd: Instance rtl433_json::rtl433_json s in a crash loop 6 crashes, 1 seconds since last crash

But it does not trigger the reboot (or any other script I put into the trigger's second parameter). Yes, I used the edited version, and I replaced "service_name" with the name of my service/instance, "rtl433_json".

For kicks and giggles, I also tried

	procd_add_raw_trigger "instance.fail" 500 reboot

with the same non-effect. That was a bit of a shot in the dark, though.

Cthulhu88 · October 24, 2024, 6:43am

Did you see my edit above? What's the output of service service_name info?

takimata · October 24, 2024, 6:47am

Yes, I saw your edit, and according to service info, the trigger is picked up:

{
        "rtl433_json": {
                "instances": {
                        "rtl433_json": {
                                "running": false,
                                "command": [
                                        "/usr/sbin/rtl433_json_splitter"
                                ],
                                "term_timeout": 5,
                                "exit_code": 0,
                                "respawn": {
                                        "threshold": 300,
                                        "timeout": 5,
                                        "retry": 5
                                }
                        }
                },
                "triggers": [
                        [
                                "instance.fail",
                                [
                                        "if",
                                        [
                                                "eq",
                                                "service",
                                                "rtl433_json"
                                        ],
                                        [
                                                "run_script",
                                                "reboot"
                                        ]
                                ]
                        ]
                ]
        }
}

Do we actually need the if clause? Would it trigger on any other service's crash otherwise?

Cthulhu88 · October 24, 2024, 6:54am

Yes. It's a generic ubus event you're listening to, so without the if clause it would fire on any service's crash.

Everything in the procd source code indicates that this should result in the correct call to trigger_event from ubus.
Try replacing reboot with an actual shell script that has execute permission, and/or give the absolute path to reboot or the script.

takimata · October 24, 2024, 6:57am

Already tried that, both /sbin/reboot and /root/test.sh (+x'ed) have the same non-effect.

Also, even if we need the if clause, with the aforementioned

	procd_add_raw_trigger "instance.fail" 500 reboot

which results in this trigger:

                "triggers": [
                        [
                                "instance.fail",
                                [
                                        [
                                                "run_script",
                                                "/sbin/reboot"
                                        ]
                                ],
                                500
                        ]
                ]

at least it would trigger on any instance.fail, right? Which it still doesn't.

Going by git blame, the trigger was added 11 years ago, so I'm fairly certain it is not a release-23.05-vs-snapshot issue either. Weird.

Cthulhu88 · October 24, 2024, 7:00am

Do you see the event being triggered, if you manually listen to it via ubus listen?

takimata · October 24, 2024, 7:02am

Nope. ubus listen stays entirely quiet, from starting the service to the crash loop, to stopping the service.

Edit: ubus monitor has it light up, though:

<- c38ef86d #e09bba91         notify: {"objid":-526665071,"method":"instance.fail","data":{"service":"rtl433_json","instance":"rtl433_json"},"no_reply":true}

Cthulhu88 · October 24, 2024, 7:12am

Apparently, ubus notifications != ubus events. Triggers will only work with events and service_event sends a notification instead.
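Since the failure only shows up as a notification, one stopgap is to parse `ubus monitor` output directly. This is a hedged sketch: the match is based solely on the monitor line takimata posted, `ubus monitor` is a debugging tool whose output format (and buffering when piped) is not a stable interface, and is_instance_fail and watch_notifications are hypothetical names.

```shell
#!/bin/sh
# Sketch: react to procd's instance.fail notification as seen by `ubus monitor`.
# The matched text follows the monitor line quoted in this thread; the monitor
# output format is a debugging aid, not a stable interface.

SERVICE=rtl433_json

is_instance_fail() {
    # Quoted parts of the pattern are matched literally.
    case "$1" in
        *'"method":"instance.fail"'*'"service":"'"$SERVICE"'"'*) return 0 ;;
        *) return 1 ;;
    esac
}

watch_notifications() {
    ubus monitor | while IFS= read -r line; do
        if is_instance_fail "$line"; then
            logger -t rtl433_ubuswatch "instance.fail for $SERVICE, rebooting"
            reboot
            break
        fi
    done
}

watch_notifications
```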

Cthulhu88 · October 24, 2024, 7:18am

For notifications, you want to use "watch", instead of "triggers".

Do you see the notification if you run ubus subscribe service instance.fail?

EDIT: The above won't work as it expects a json with object id and target.

takimata · October 24, 2024, 7:37am

procd_set_param watch instance.fail then? I've only ever seen the watch param used with network.interface, and it's not obvious by which mechanism it then actually causes anything to happen.

Cthulhu88 · October 24, 2024, 7:43am

I think you need to do the following.

start_service():

procd_set_param watch service

service_triggers():

procd_add_crash_trigger ...
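Put together, the suggestion amounts to roughly the following init script. It is an untested sketch: the thread does not confirm that `procd_set_param watch service` makes procd deliver instance.fail to the trigger, and /sbin/reboot is just the action takimata used in the tests above. procd_add_crash_trigger is Cthulhu88's helper from earlier in the thread, unchanged apart from comments and blank lines.

```shell
#!/bin/sh /etc/rc.common
# Untested sketch combining the suggestions in this thread. Whether procd
# actually runs the trigger with "watch service" set was not confirmed.

START=78
STOP=12
USE_PROCD=1

# Cthulhu88's helper: on instance.fail, run "$@" if the
# notifying service equals $1.
procd_add_crash_trigger() {
    _procd_open_trigger
    json_add_array
    _procd_add_array_data "instance.fail"
    json_add_array
    _procd_add_array_data "if"
    json_add_array
    _procd_add_array_data "eq" "service" "$1"
    shift
    json_close_array
    json_add_array
    _procd_add_array_data "run_script" "$@"
    json_close_array
    json_close_array
    json_close_array
    _procd_close_trigger
}

start_service() {
    logger -t rtl433_json starting
    procd_open_instance rtl433_json
    procd_set_param command /usr/sbin/rtl433_json_splitter
    procd_set_param respawn 300 5 5
    # Intended to make procd watch notifications from the "service"
    # ubus object (per the suggestion above; unverified).
    procd_set_param watch service
    procd_close_instance
}

service_triggers() {
    procd_add_crash_trigger rtl433_json /sbin/reboot
}
```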