Question

I'm having a weird problem with some Python processes that are run by a watchdog process.

The watchdog process is written in Python and is the parent. It has a function called start_child(name) which uses subprocess.Popen to start the child process. The Popen object is recorded so that the watchdog can monitor the process using poll() and eventually end it with terminate() when needed. If the child dies unexpectedly, the watchdog calls start_child(name) again and records the new Popen object.

There are 7 child processes, all of which are also Python. If I run any of the children manually, I can send SIGTERM or SIGINT using kill and get the results I expect (the process ends).

However, when run from the watchdog process, the child will only end after the FIRST signal. When the watchdog restarts the child, the new child process no longer responds to SIGTERM or SIGINT. I have no idea what is causing this.

watchdog.py

import subprocess
import threading
import time

class watchdog:
    # <snip> various init stuff

    def start(self):
        self.running = True

        kids = ['app1', 'app2', 'app3', 'app4', 'app5', 'app6', 'app7']
        self.processes = {}

        for kid in kids:
            self.start_child(kid)

        self.thread = threading.Thread(target=self._monitor)
        self.thread.start()

        while self.running:
            time.sleep(10)

    def start_child(self, name):
        try:
            proc = subprocess.Popen(name)
            self.processes[name] = proc
        except:
            print "oh no"
        else:
            print "started child ok"

    def _monitor(self):
        while self.running:
            time.sleep(1)
            if self.running:
                for kid, proc in self.processes.iteritems():
                    if proc.poll() is not None: # process ended
                        self.start_child(kid)

So what happens is watchdog.start() launches all 7 processes, and if I send any process SIGTERM, it ends, and the monitor thread starts it again. However, if I then send the new process SIGTERM, it ignores it.

I should be able to keep sending kill -15 to the restarted processes over and over again. Why do they ignore it after being restarted?

Solution

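As explained here (http://blogs.gentoo.org/agaffney/2005/03/18/python_sucks), when Python creates a new thread, it blocks all signals for that thread, and for any processes that thread spawns, because a child process inherits the signal mask of the thread that created it. This matches the symptoms: the original children are started from the main thread in start(), while the restarted ones are started from the _monitor thread, so they begin life with SIGTERM and SIGINT blocked.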
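
The mask inheritance itself can be checked directly. The following is a minimal sketch, assuming Python 3.3+ on a Unix-like system (signal.pthread_sigmask is not available in the Python 2 used in the question). The helper name child_sees_sigterm_blocked and the CHILD_CODE string are made up for illustration.

```python
import signal
import subprocess
import sys

# Code run by the child: query its own signal mask and report on SIGTERM.
CHILD_CODE = (
    "import signal\n"
    "blocked = signal.pthread_sigmask(signal.SIG_BLOCK, [])\n"
    "print(signal.SIGTERM in blocked)\n"
)


def child_sees_sigterm_blocked(block_in_parent):
    """Spawn a child and return True if SIGTERM is blocked in that child."""
    # Passing an empty list to SIG_BLOCK changes nothing and returns the current mask.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, [])
    try:
        how = signal.SIG_BLOCK if block_in_parent else signal.SIG_UNBLOCK
        signal.pthread_sigmask(how, [signal.SIGTERM])
        out = subprocess.check_output([sys.executable, "-c", CHILD_CODE])
    finally:
        # Put the calling thread's mask back the way it was.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return out.decode().strip() == "True"
```

Calling the helper once with True and once with False should show the child's mask following whatever the spawning thread had at the time of the Popen call, which is the same effect the restarted children run into.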

I fixed this using sigprocmask, called through ctypes. This may or may not be the "correct" way to do it, but it does work.

In the child process, during __init__:

import ctypes

libc = ctypes.cdll.LoadLibrary("libc.so")
mask = '\x00' * 17  # 16-byte empty signal mask + null terminator
libc.sigprocmask(3, mask, None)  # 3 is SIG_SETMASK on FreeBSD; the value is platform-specific (Linux uses 2)
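
On Python 3.3 and later, the same mask reset can probably be done without ctypes or platform-specific constants, using signal.pthread_sigmask. This is only a sketch under that assumption, not the code used in the fix above, and the function name unblock_all_signals is hypothetical.

```python
import signal


def unblock_all_signals():
    """Clear the calling thread's signal mask and return the previous mask."""
    # SIG_SETMASK with an empty list replaces the mask with "nothing blocked".
    return signal.pthread_sigmask(signal.SIG_SETMASK, [])
```

As with the ctypes version, this would be called early in the child process, before it relies on receiving SIGTERM or SIGINT.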

OTHER TIPS

Wouldn't it be better to restore the default signal handlers within Python rather than via ctypes? In your child process, use the signal module:

import signal
for sig in range(1, signal.NSIG):
    try:
        signal.signal(sig, signal.SIG_DFL)
    except (RuntimeError, OSError):
        pass

RuntimeError (OSError on Python 3) is raised when trying to set a handler for signals such as SIGKILL, which can't be caught.

Licensed under: CC-BY-SA with attribution