Question

Can anybody tell me how to handle the software watchdog in Linux?

I have a program, SampleApplication, which runs continuously, and I need to restart it if it hangs or closes unexpectedly.

I was Googling this and found that Linux has a watchdog at /dev/watchdog, but I don’t know how to use it. Could someone help me with an example?

My question is: where do I specify my application name and the delay interval before it is restarted?


Solution

Most of the Unix/Linux init programs will manage daemons for you and restart them. Look into placing your service in /etc/inittab. Or you might be using Upstart or systemd.

All of these programs run as PID 1 and it is their job to monitor and restart system processes.

From your BusyBox tag I would assume you are running an embedded system. On those, System V style init scripts, with all of their shell scripts, are really overkill. You should probably rip all that out and replace it with entries in /etc/inittab, or with Upstart or systemd jobs.
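
As a rough illustration of the systemd route, the sketch below writes a minimal unit file that asks systemd to restart the program whenever it exits. The unit name sampleapp.service, the path /usr/local/bin/SampleApplication and the 5-second delay are assumptions made up for this example. Restart= and RestartSec= are standard systemd.service options, and RestartSec= is where the restart delay from the question would go.

```shell
#!/bin/sh
# Sketch: generate a systemd unit that restarts the program when it exits.
# The unit name and the ExecStart path are placeholders, not from the question.

write_unit() {
    # $1: directory to write the unit into (normally /etc/systemd/system)
    cat > "$1/sampleapp.service" <<'EOF'
[Unit]
Description=SampleApplication (restarted automatically by systemd)

[Service]
ExecStart=/usr/local/bin/SampleApplication
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}
```

On a real system this would be called as root with "write_unit /etc/systemd/system", followed by "systemctl daemon-reload" and "systemctl enable --now sampleapp.service". Note that Restart= only reacts to the process exiting; noticing a hang needs some kind of heartbeat.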

OTHER TIPS

Since the moderators ignore post improvements now, I'll have to post this separately.

Another answer here says: "The Linux software watchdog will reboot the machine, not just restart your process."

Well, this is simply not true. It is very possible to restart one or more processes after the watchdog signals that the system is hanging. You can even abort the reboot or do a soft reboot: you can configure "test" and "repair" scripts or binaries which do whatever you want them to do. The BusyBox version of watchdog is stripped down to a near-unusable level. I guess the world will never know why the BusyBox developers decided to abandon primary functionality. For now, it would be best to avoid BusyBox altogether: the speed improvements are nearly nonexistent, and the size decrease does not compensate for the huge loss of functionality. /bin/bash is rather small. Recompile everything with the "-Os" flag if size matters and you're good to go: an out-of-the-box watchdog which allows for just about everything one could want.
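
To make the test/repair idea concrete, here is a minimal sketch of a test script for the full watchdog daemon (not the BusyBox one). It assumes the application writes its PID to /var/run/sampleapp.pid, which is a hypothetical path, and the function name app_is_alive is made up for the example. The daemon treats a non-zero exit status from the test program as a failure. The script would be referenced from /etc/watchdog.conf with the test-binary option, and a repair-binary can be set alongside it to restart the application instead of rebooting.

```shell
#!/bin/sh
# Sketch of a test script for the watchdog daemon.
# Exit status 0 means "healthy"; anything else reports a failure.
# PIDFILE is an assumed location; adjust it to wherever the app writes its PID.

PIDFILE="${PIDFILE:-/var/run/sampleapp.pid}"

app_is_alive() {
    # $1: pidfile to check
    [ -r "$1" ] || return 1        # no readable pidfile: treat as not running
    pid=$(cat "$1")
    [ -n "$pid" ] || return 1      # empty pidfile
    kill -0 "$pid" 2>/dev/null     # signal 0 only tests that the PID exists
}

# When installed as the test binary, the script would end with:
#   app_is_alive "$PIDFILE"
```

This only detects a process that has gone away. A hung process still has a live PID, so catching hangs needs an application-level check on top of this.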

Oh, and PLEASE do NOT create your own watchdog. That will most likely leave you with unhandled errors and make your life miserable one day.

How about using cron? Set up a small cron job that runs every minute. Check if your application is up (using ps) and if not, restart it.

Make a tiny script like this:

#!/bin/bash
# Start myapp only if no process with that name is currently running.
if ! pidof myapp > /dev/null
then
  /path/to/myapp &
fi

You test if "myapp" is in the process list. "!" reverses the test. If it's not there, it runs "myapp". "&" is just so it starts in the background.

Add this to cron. Depending on your system and preferences, there are several ways to do it. The classic one is to use crontab. There is plenty of documentation on how to write a crontab line, but you probably want something like this:

* * * * * /path/to/the/script.sh > /dev/null

This will run your test every minute of every hour of every… You get the idea.

Use /etc/inittab. You can use it to start the application in specific run levels, and if it is killed it will be restarted automatically:

n:2345:respawn:/path/to/app

This will make it respawn in run levels 2, 3, 4 and 5. You probably only need 3 and 5, but this will work fine and is built into SysV init.
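
Since the question is tagged BusyBox, it may help to know that BusyBox init reads the same file but ignores the run level field, and uses the first field for a controlling TTY rather than an ID. Below is a hedged sketch of adding a respawn entry without duplicating it on repeated runs. The function name add_respawn is made up for illustration, and /path/to/app is the same placeholder as above.

```shell
#!/bin/sh
# Sketch: append a respawn entry to an inittab file if it is not already there.

add_respawn() {
    # $1: inittab file, $2: full inittab line to add
    grep -qxF "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# SysV init style (id "n", run levels 2-5):
#   add_respawn /etc/inittab 'n:2345:respawn:/path/to/app'
# BusyBox init style (empty id and run level fields):
#   add_respawn /etc/inittab '::respawn:/path/to/app'
```

With SysV init, "telinit q" makes init re-read the file. For BusyBox init, the safe assumption is that the new entry takes effect on the next boot.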

Documentation for the watchdog is here: http://linux.die.net/man/8/watchdog

But it sounds like this is not what you want. The Linux software watchdog will reboot the machine, not just restart your process.
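
For reference, the kernel interface behind /dev/watchdog is simple: opening the device arms the timer, any write resets it, and the machine resets if no write arrives in time. The sketch below shows that mechanism from a shell. The function name pet and its arguments are made up for illustration. Writing the character V just before closing is the "magic close" that disarms the timer on drivers that support it (it has no effect if the driver was built with the nowayout option).

```shell
#!/bin/sh
# Sketch: keep a watchdog device fed for a fixed number of intervals, then disarm it.

pet() {
    # $1: device path, $2: seconds between writes, $3: number of writes
    device="$1"; interval="$2"; count="$3"
    exec 3>"$device"          # opening the device starts the countdown
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '.' >&3        # any write resets the timer
        sleep "$interval"
        i=$((i + 1))
    done
    printf 'V' >&3            # magic close: ask the driver to disarm
    exec 3>&-                 # close the device
}
```

Something like "pet /dev/watchdog 10 6" would keep the timer fed for about a minute and then disarm it. Be careful when trying this on a real machine: if the loop stalls or is killed, the board resets. The device can normally be opened by only one process at a time, so this will fail while a watchdog daemon is running.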

You can easily make your own watchdog. For example, you could have your program periodically write some temp file, and launch a script that checks the file once in a while and restarts your process if it hasn't updated for some time.
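
A minimal sketch of the checker side of that idea, assuming the application touches a heartbeat file regularly. The function names and the example paths are hypothetical, "stat -c %Y" is the GNU/BusyBox syntax for reading the modification time, and pkill may be missing on very small systems.

```shell
#!/bin/sh
# Sketch of a home-made heartbeat checker.
# The application is assumed to update the heartbeat file while it is healthy.

is_stale() {
    # $1: heartbeat file, $2: maximum allowed age in seconds
    [ -e "$1" ] || return 0                 # a missing file counts as stale
    now=$(date +%s)
    mtime=$(stat -c %Y "$1") || return 0    # unreadable mtime counts as stale
    [ $((now - mtime)) -gt "$2" ]
}

check_and_restart() {
    # $1: heartbeat file, $2: max age, $3: process name, $4: command to start it
    if is_stale "$1" "$2"; then
        pkill -x "$3" || true    # stop a hung instance, if there is one
        "$4" &                   # start a fresh one in the background
    fi
}
```

Run from cron once a minute, a call could look like "check_and_restart /tmp/myapp.heartbeat 120 myapp /path/to/myapp". Unlike a plain process check, this also catches a process that is still running but has stopped doing work.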

If you are using systemd, there are two watchdogs: one for the hardware (configured in systemd's system.conf, or handled by a watchdog daemon) and one for daemons run as services. If systemd is an option for you, have a look at the following: http://0pointer.de/blog/projects/watchdog.html
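
For the per-service watchdog, the service has to send a keep-alive to systemd regularly. Here is a hedged sketch of doing that from a shell wrapper with systemd-notify. It assumes the unit sets WatchdogSec= (systemd then exports WATCHDOG_USEC to the service) together with Restart=on-failure, and probably NotifyAccess=all, since systemd-notify runs as a separate process. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: heartbeat side of systemd's service watchdog for a shell wrapper.
# systemd sets WATCHDOG_USEC (in microseconds) when WatchdogSec= is configured.

ping_interval() {
    # Ping at half the timeout, as the sd_watchdog_enabled documentation
    # recommends, but never more often than once per second.
    usec="${WATCHDOG_USEC:-0}"
    secs=$((usec / 2000000))
    [ "$secs" -ge 1 ] || secs=1
    echo "$secs"
}

heartbeat_loop() {
    interval=$(ping_interval)
    while :; do
        systemd-notify WATCHDOG=1    # tell systemd the service is still alive
        sleep "$interval"
    done
}
```

A real service should only send the keep-alive while it is actually making progress. An unconditional loop like heartbeat_loop merely proves that the wrapper itself is alive.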

You can use the Monit utility to monitor and restart your services. Install it with the command "apt-get install monit".
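
Once it is installed, Monit needs a check describing the process. The sketch below writes one. The service name sampleapp, the pidfile path and the init script path are all placeholders, and the config directory depends on the distribution (on Debian-style systems /etc/monit/conf.d is usually included by the main monitrc).

```shell
#!/bin/sh
# Sketch: write a Monit check that restarts the program when its PID disappears.
# Service name and paths are placeholders, not taken from the question.

write_monit_check() {
    # $1: directory Monit includes config from (e.g. /etc/monit/conf.d)
    cat > "$1/sampleapp" <<'EOF'
check process sampleapp with pidfile /var/run/sampleapp.pid
    start program = "/etc/init.d/sampleapp start"
    stop program  = "/etc/init.d/sampleapp stop"
EOF
}
```

After writing the file, "monit -t" checks the syntax and "monit reload" makes the running daemon pick it up.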

If anyone has arrived at this page looking for an operating system watchdog (which is not directly what the OP wanted), this is what you need:

sudo apt-get install watchdog
service watchdog status
service watchdog start 

To check that it's working execute:

tail -f /var/log/syslog | grep watchdog

You should see something like:

Jul 25 22:03:35 nuc watchdog[14229]: still alive after 733 interval(s)
Jul 25 22:03:36 nuc watchdog[14229]: still alive after 734 interval(s)
Jul 25 22:03:36 nuc watchdog[14229]: still alive after 735 interval(s)
Jul 25 22:03:37 nuc watchdog[14229]: still alive after 736 interval(s)
Jul 25 22:03:37 nuc watchdog[14229]: still alive after 737 interval(s)

I hope I'm answering the question correctly. All the other answers seem to be very different.

You can try wdog, a utility written in C++ that links against the Kahless_9 framework. The source code can be downloaded from https://github.com/zepher999/wdog and then adapted to suit your own needs. There are still some TODO items outstanding, but as it stands this should cater for your current requirements.

The utility requires a CSV file as input, listing all processes to be watched along with their arguments. On startup, the utility starts all of the processes listed in the CSV file and monitors them for exit or termination, whereupon it restarts the process.

Currently wdog allows monitored processes to be stopped or killed, and the utility can be started in hot or cold mode. Hot mode lets the utility use cached records to monitor already started processes, while cold mode discards such cached values and attempts to start all processes.

The utility also has the ability to launch an instance of itself to monitor itself, thereby having a watchdog for the watchdog.

Licensed under: CC-BY-SA with attribution