Topic: procd watchdog

I'm trying to develop an application that uses a watchdog. From what I've discovered, procd holds the default Linux watchdog device (/dev/watchdog). I found a post (link below) that describes how to turn off whatever is tickling/petting the watchdog, but it doesn't describe how to then tickle the watchdog on your own, so when I disable it, the system resets within x seconds. I have tried disabling the ubus/procd watchdog and writing to /dev/watchdog manually, but I get a "file busy" error, as if procd still holds the file and has merely stopped tickling it.

http://trac.gateworks.com/wiki/watchdog

How do I tickle the watchdog on my own? I'd prefer to do this in C/C++. If anyone has any suggestions/examples, I'd love to hear them.

2 (edited by R1D2 2016-01-18 11:54:25)

Re: procd watchdog

jgoldin1 wrote:

How do I tickle the watchdog on my own? I'd prefer to do this in C/C++. If anyone has any suggestions/examples, I'd love to hear them.

Yes, procd keeps the watchdog file open even if the watchdog is stopped using the ubus command. I would patch procd and disable the whole watchdog handling there to be able to use it in an application (where it makes much more sense to handle a watchdog than in procd if you ask me).
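
To confirm that the "file busy" really comes from another process still holding the device, the open call can report EBUSY explicitly. This is only a minimal sketch; wd_try_open is a hypothetical helper name of mine, not something taken from procd or busybox, and on most drivers a successful open already arms the watchdog.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
** Hypothetical helper: try to open the watchdog device.
** Returns the file descriptor on success or -errno on failure.
** Note: on most drivers a successful open arms the watchdog.
*/
int wd_try_open(const char *dev) {
        int fd = open(dev, O_WRONLY);

        if (fd < 0) {
                int err = errno;        /* save errno before calling stdio */

                if (err == EBUSY)
                        (void) fprintf(stderr, "%s is busy, another process still holds it\n", dev);
                else
                        (void) fprintf(stderr, "Could not open %s (%s)\n", dev, strerror(err));
                return -err;
        }
        return fd;
}
```

If this returns -EBUSY while procd is running, patching procd (or making it release the device) is needed before an application can take over.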

As for handling the watchdog yourself, see https://www.kernel.org/doc/Documentatio … g-api.txt. Be aware that not all drivers implement every ioctl command described there. For example, the watchdog on certain platform/OS combinations uses a hard-coded or upper-limited timeout, so WDIOC_SETTIMEOUT may return -1, while WDIOC_GETTIMEOUT may reveal the timeout value actually used (this is the case on the WRT160 in AA, but I have not yet checked whether the driver in trunk behaves that way too). So you probably have to adjust the trigger interval to avoid possible reboot loops.
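
One way to find out up front what a driver claims to support is WDIOC_GETSUPPORT from the same API document. The following is a rough sketch, assuming a Linux system with <linux/watchdog.h>; wd_can_set_timeout is a hypothetical helper name. A driver may set the flag and still limit the value, so it is worth checking WDIOC_GETTIMEOUT afterwards anyway.

```c
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
** Hypothetical helper: ask the driver whether it claims to support
** WDIOC_SETTIMEOUT. Returns 1 if yes, 0 if no, and -1 if the
** WDIOC_GETSUPPORT ioctl itself failed (errno is set by ioctl).
*/
int wd_can_set_timeout(int fd) {
        struct watchdog_info info;

        if (ioctl(fd, WDIOC_GETSUPPORT, &info) < 0)
                return -1;
        /* identity is a fixed 32 byte field, so limit the output length */
        (void) fprintf(stderr, "watchdog: %.32s, options 0x%x\n",
                       (const char *) info.identity, (unsigned int) info.options);
        return (info.options & WDIOF_SETTIMEOUT) ? 1 : 0;
}
```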

As for examples of how to use it, see http://busybox.sourcearchive.com/docume … ource.html or the procd code. However, neither example accounts for hard-coded timeouts or timeouts with a (small) upper limit, so you may want to extend them, e.g.:

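#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <linux/watchdog.h>

static int wdog_fd = -1;                /* watchdog file descriptor */
u_int wdinterval, wdtimeout;            /* defined somewhere in your config section */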

/*
** Open watchdog device and initialize the watchdog.
** Return file descriptor on success, -1 on error.
*/  
int wd_init(char *dev, u_int timeout) {
        u_int tmout = timeout;          /* wanted timeout (may be changed by WDIOC_SETTIMEOUT!) */
        int rc;

        if ((wdog_fd=open(dev, O_RDWR)) < 0) {
                (void) fprintf(stderr, "Could not open watchdog device %s (%s)\n", dev, strerror(errno));
                return -1;
        }
#if defined(WDIOC_SETOPTIONS) && defined(WDIOS_ENABLECARD)
        /*
        ** We probably need to enable the hardware watchdog using WDIOS_ENABLECARD
        ** if the driver implements WDIOC_SETOPTIONS. Possible error indications
        ** are ignored and errno needs to be cleared thereafter.
        */
        rc = WDIOS_ENABLECARD;
        (void) ioctl(wdog_fd, WDIOC_SETOPTIONS, (void *) &rc);
        errno = 0;
#endif
        /*
        ** On some platforms (e.g. the WRT160NL) the WDIOC_SETTIMEOUT ioctl
        ** returns an error indication, since the watchdog uses a hard-coded
        ** (or limited) timeout. Therefore we need to check for this situation
        ** and correct the interval/timeout set in the config file to avoid
        ** possible reboot loops if the interval exceeds the fixed timeout.
        */
        if ((rc=ioctl(wdog_fd, WDIOC_SETTIMEOUT, &tmout)) < 0) {
                (void) fprintf(stderr, "Could not set watchdog timeout - returned timeout is %u (%s)\n", tmout, strerror(errno));

                errno = 0;              /* avoid reporting a stale error below */
                rc = ioctl(wdog_fd, WDIOC_GETTIMEOUT, &tmout);
                (void) fprintf(stderr, "Tried to get watchdog timeout - rc is %d, returned value is %u (%s)\n", rc, tmout, strerror(errno));
        }

        if (tmout != timeout) {
                wdtimeout = tmout;
                if ((wdinterval = tmout/3) == 0)
                        wdinterval = 1;
                (void) fprintf(stderr, "Returned timeout value is %u, reset interval to %u\n", tmout, wdinterval);
        }
        return wdog_fd;                 /* return watchdog fd */
}

Note that WDIOC_SETTIMEOUT may change the timeout stored in tmout, but on the WRT160NL it does not do so, while WDIOC_GETTIMEOUT returns the actual timeout, which is 15 seconds on the WRT160NL running AA.
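
For the tickling itself, something along these lines should do. It is a sketch rather than tested production code: wd_pet and wd_loop are hypothetical names, and the healthy() callback stands for whatever check your application uses to decide that it is still alive. The kernel watchdog API treats both WDIOC_KEEPALIVE and a plain write to the device as a ping.

```c
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/*
** Hypothetical helper: pet the watchdog once.
** Tries WDIOC_KEEPALIVE first and falls back to writing a single byte,
** which the kernel watchdog API also treats as a ping.
** Returns 0 on success, -1 on error.
*/
int wd_pet(int fd) {
        if (ioctl(fd, WDIOC_KEEPALIVE, 0) == 0)
                return 0;
        /* do not write 'V' here, that is the magic close character */
        return (write(fd, "\0", 1) == 1) ? 0 : -1;
}

/*
** Hypothetical main loop: pet every `interval` seconds for as long as
** the application-supplied health check returns non-zero. Once it
** returns 0 (or petting fails) we stop and the watchdog resets the box.
*/
void wd_loop(int fd, unsigned int interval, int (*healthy)(void)) {
        while (healthy()) {
                if (wd_pet(fd) < 0)
                        break;
                (void) sleep(interval);
        }
}
```

Used together with the code above, that would be roughly wd_init("/dev/watchdog", wdtimeout) followed by wd_loop(wdog_fd, wdinterval, my_check), where my_check is your own function. Keeping the interval at a third of the effective timeout leaves room for a delayed wakeup.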

Re: procd watchdog

I completely agree that handling the watchdog should be done outside procd. What was the reason for making it part of procd?

I'm working on several hardware devices and products that, because of this, have an additional hardware watchdog implemented with a small ATtiny MCU that gets tickled by one OpenWrt GPIO.

This has made a huge impact on the reliability of the products we are deploying, but if you aren't creating your own hardware, you are stuck with only the kernel's built-in watchdog, and it would make much more sense to have control over it rather than have procd bogart it :)

Re: procd watchdog

Seems that with the patch for magicclose things are looking better now:
https://git.openwrt.org/?p=project/proc … 83073fabb1
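
For anyone unfamiliar with magic close: as I understand the kernel watchdog API, a driver that supports it disarms the watchdog when the character 'V' is written right before the device is closed (unless nowayout is set). I am not restating what the patch itself does here. This is just a sketch of the generic mechanism from the application side, with wd_release being a hypothetical name:

```c
#include <unistd.h>

/*
** Hypothetical helper: release the watchdog without triggering a reboot.
** Writes the magic character 'V' right before close(), which disarms the
** watchdog if the driver supports magic close and nowayout is not set.
** Returns 0 on success, -1 on error.
*/
int wd_release(int fd) {
        int rc = 0;

        if (write(fd, "V", 1) != 1)
                rc = -1;
        if (close(fd) < 0)
                rc = -1;
        return rc;
}
```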

Re: procd watchdog

I wrote a detailed blog post on how to use the hardware watchdog and how to manually take control of it:
http://kernelreloaded.com/manually-cont … -watchdog/

Re: procd watchdog

Excellent docs! Now I can clean up the wild hack I did to get around the watchdog in procd.
Thanks a lot.