Question

I am writing a system monitor for Linux and want to include some watchdog functionality. In the kernel, you can configure the watchdog to keep going even if /dev/watchdog is closed. In other words, if my daemon exits normally and closes /dev/watchdog, the system would still re-boot 59 seconds later. That may or may not be desirable behavior for the user.

I need to make my daemon aware of this setting because it will influence how I handle SIGINT. If the setting is on, my daemon would need to (preferably) start an orderly shutdown on exit or (at least) warn the user that the system is going to reboot shortly.

Does anyone know of a method to obtain this setting from user space? I don't see anything in sysconf() to get the value. Likewise, I need to be able to tell if the software watchdog is enabled to begin with.

Edit:

Linux provides a very simple watchdog interface. A process can open /dev/watchdog , once the device is opened, the kernel will begin a 60 second count down to reboot unless some data is written to that file, in which case the clock re-sets.

Depending on how the kernel is configured, closing that file may or may not stop the countdown. From the documentation:

The watchdog can be stopped without causing a reboot if the device /dev/watchdog is closed correctly, unless your kernel is compiled with the CONFIG_WATCHDOG_NOWAYOUT option enabled.

I need to be able to tell if CONFIG_WATCHDOG_NOWAYOUT was set from within a user space daemon, so that I can handle the shutdown of said daemon differently. In other words, if that setting is high, a simple:

# /etc/init.d/mydaemon stop

... would reboot the system in 59 seconds, because nothing is writing to /dev/watchdog any longer. So, if its set high, my handler for SIGINT needs to do additional things (i.e. warn the user at the least).

I can not find a way of obtaining this setting from user space :( Any help is appreciated.

Was it helpful?

Solution

AHA! After digging through the kernel's linux/watchdog.h and drivers/watchdog/softdog.c, I was able to determine the capabilities of the softdog ioctl() interface. Looking at the capabilities that it announces in struct watchdog_info:

static struct watchdog_info ident = {
                .options =              WDIOF_SETTIMEOUT |
                                        WDIOF_KEEPALIVEPING |
                                        WDIOF_MAGICCLOSE,
                .firmware_version =     0,
                .identity =             "Software Watchdog",
        };

It does support a magic close that (seems to) override CONFIG_WATCHDOG_NOWAYOUT. So, when terminating normally, I have to write a single char 'V' to /dev/watchdog then close it, and the timer will stop counting.

A simple ioctl() on a file descriptor to /dev/watchdog asking WDIOC_GETSUPPORT allows one to determine if this flag is set. Pseudo code:

int fd;
struct watchdog_info info;

fd = open("/dev/watchdog", O_WRONLY);
if (fd == -1) {
   perror("open");
   // abort, timer did not start - no additional concerns
}

if (ioctl(fd, WDIOC_GETSUPPORT, &info)) {
    perror("ioctl");
    // abort, but you probably started the timer! See below.
}

if (WDIOF_MAGICCLOSE & info.options) {
   printf("Watchdog supports magic close char\n");
   // You have started the timer here! Handle that appropriately.
}

When working with hardware watchdogs, you might want to open with O_NONBLOCK so ioctl() not open() blocks (hence detecting a busy card).

If WDIOF_MAGICCLOSE is not supported, one should just assume that the soft watchdog is configured with NOWAYOUT. Remember, just opening the device successfully starts the countdown. If all you're doing is probing to see if it supports magic close and it does, then magic close it. Otherwise, be sure to deal with the fact that you now have a running watchdog.

Unfortunately, there's no real way to know for sure without actually starting it, at least not that I could find.

OTHER TIPS

a watchdog guards against hard-locking the system, either because of a software crash, or hardware failure.

what you need is a daemon monitoring daemon (dmd). check 'monit'

I think the watchdog device drivers are really intended for use on embedded platforms (or at least well controlled ones) where the developers will have control of which kernel is in use.

This could be considered to be an oversight, but I think it is not.

One other thing you could try, if the watchdog was built as a loadable module, unloading it will presumably abort the shutdown?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top