Procd respawn, how is it useful?

ValdikSS · January 15, 2023, 4:39am

I spent about 3 hours trying to understand why my procd service, which I configured to respawn with a low threshold, keeps restarting. The service runs well beyond the configured threshold and, even more confusingly, does not obey the configured retry count but gets respawned indefinitely.

It seems that the procd respawn option means exactly that: it respawns the process whenever it exits. Not when it terminates abnormally fast, not when its exit code is non-zero, but quite the contrary: it stops respawning on abnormal activity (repeated fast exits) and respawns forever on normal termination.

If the service exits before the respawn_threshold timeout, it is restarted after respawn_timeout seconds, at most respawn_retry times.
If the service exits after the respawn_threshold timeout, it is restarted indefinitely (the respawn counter is reset) after respawn_timeout seconds, regardless of the respawn_retry value.
The process exit code does not matter.
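
For reference, a minimal sketch of where these three values are set in a procd init script. The daemon path /usr/bin/mydaemon and the START priority are hypothetical; the argument order (threshold, timeout, retry) and the values 3600, 5, 5 follow what the OpenWrt procd init script documentation shows, as far as I can tell.

```shell
#!/bin/sh /etc/rc.common
# Sketch of an init script for a hypothetical daemon, /usr/bin/mydaemon.

START=95
USE_PROCD=1

start_service() {
    procd_open_instance
    procd_set_param command /usr/bin/mydaemon
    # respawn <threshold> <timeout> <retry>
    #   threshold: an exit sooner than this many seconds counts as a fast exit
    #   timeout:   seconds to wait before each restart
    #   retry:     number of consecutive fast exits tolerated before giving up
    procd_set_param respawn 3600 5 5
    procd_close_instance
}
```

With these values, a process that keeps exiting within an hour of its start is abandoned after a handful of attempts, while one that exits after more than an hour is restarted every time, whatever its exit code.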

This concept is very confusing to me. I thought the respawn option was meant to respawn only a failing service, as in other service managers/supervisors.
The question is: in what cases is it useful?

hnyman · January 15, 2023, 8:51am

I think that it is for services that should be always on, but may occasionally crash due to external factors. If they crash rarely, they get restarted. But if they constantly crash, restarting is considered futile and restarts are abandoned.

E.g. the statistics collection daemon collectd may crash at startup with certain wrong configs. Then procd abandons the start attempts after 5 failures. But it may also crash later due to some condition like an unexpected error from a device/sensor driver. If that happens just once or rarely, the service is restarted.

Similarly, USB drivers reacting to some devices may cause trouble for various processes. Or bad sectors on a disk, or ...
Not all of the hundreds of apps are perfect for handling all failure situations, so procd offers a casual way of trying to keep services up.

ValdikSS · January 15, 2023, 9:02am

You're essentially describing restarts on failure. However, procd does not differentiate between crashes (exiting with a non-zero exit code) and successful service termination. It literally just restarts the service based only on the fact that it has terminated and on how many seconds it ran before terminating.

For example, I wanted to write a script which works with network data. I want it to be robust, especially upon boot, where the script may start before the network is fully configured. This is not a long-running script and not a daemon. My script terminates with a non-zero exit code upon failure and with 0 on success.
Despite my expectations, procd's respawn is not suitable for this case. It would just restart my script after it has done all its job successfully.
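
The behavior wanted here (retry on failure, stop on success) can be sketched in the script itself rather than through procd respawn. This is only a sketch under assumptions: retry_until_success is a hypothetical helper name, not something procd provides.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it exits with 0.
# Usage: retry_until_success <max_tries> <delay_seconds> <command> [args...]
retry_until_success() {
    max_tries="$1"
    delay="$2"
    shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # Success: stop retrying, so the job is not run again.
        if "$@"; then
            return 0
        fi
        try=$((try + 1))
        # Sleep only if another attempt will follow.
        if [ "$try" -le "$max_tries" ]; then
            sleep "$delay"
        fi
    done
    # Every attempt failed.
    return 1
}
```

For example, retry_until_success 10 3 /usr/bin/fetch-network-data (a made-up script path) would try up to 10 times, 3 seconds apart, and stop at the first exit code 0.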

hnyman · January 15, 2023, 9:45am

Respawn is for daemons.

Sounds like you are not really looking for restarting the service, but waiting with the start until the network is up.
There are a few services like that. Search the forum e.g. for "ubus wait"

Or your script might first sleep for a few seconds.
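
A minimal sketch of the waiting approach, assuming the ubus command-line tool with its wait_for command and -t timeout option. The helper name wait_for_interface, the default interface wan, and the 30-second default are illustrative assumptions. Note that wait_for only waits for the object to appear on ubus, which is not necessarily the same as the interface being fully up.

```shell
#!/bin/sh
# Hypothetical helper: block until netifd has registered the given interface
# object on ubus, or until the timeout (in seconds) expires.
# Usage: wait_for_interface [iface] [timeout]
wait_for_interface() {
    iface="${1:-wan}"
    timeout="${2:-30}"
    # The exit status of ubus is passed through to the caller.
    ubus -t "$timeout" wait_for "network.interface.$iface"
}
```

The script could then start with something like wait_for_interface wan before doing its work. The simpler alternative is a fixed sleep 5 at the top of the script, at the cost of guessing the delay.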