Question

I have a Python script that starts a daemon process. I was able to do this by using the code found at: https://gist.github.com/marazmiki/3618191.

The code starts the daemon process exactly as expected. However, intermittently, when the daemon process is stopped, the running job becomes a zombie.

The stop function of the code is:

    def stop(self):
        """
            Stop the daemon
        """
        # Get the pid from the pidfile
        try:
            pf = file(self.pidfile, 'r')
            pid = int(pf.read().strip())
            pf.close()
        except:
            pid = None

        if not pid:
            message = "pidfile %s does not exist. Daemon not running?\n"
            sys.stderr.write(message % self.pidfile)
            return # not an error in a restart

        # Try killing the daemon process
        try:
            while 1:
                os.kill(pid, SIGTERM)
                time.sleep(1.0)
        except OSError, err:
            err = str(err)
            if err.find("No such process") > 0:
                if os.path.exists(self.pidfile):
                    os.remove(self.pidfile)
            else:
                print str(err)
                sys.exit(1)

When this stop() method is run, the process (pid) appears to hang. When I press Control+C, the traceback shows a KeyboardInterrupt on the line time.sleep(1.0), which leads me to believe that the line:

    os.kill(pid, SIGTERM)

is the offending code.

Does anyone have any idea why this could be happening? Why would this os.kill() force a process to become a zombie?

I am running this on Ubuntu Linux (if it matters).

UPDATE: I'm including my start() method per @paulus's answer.

    def start(self):
        """
            Start the daemon
        """
        pid = None
        # Check for a pidfile to see if the daemon already runs
        try:
            pf = file(self.pidfile, 'r')
            pid = int(pf.read().strip())
            pf.close()
        except:
            pid = None

        if pid:
            message = "pidfile %s already exist. Daemon already running?\n"
            sys.stderr.write(message % self.pidfile)
            sys.exit(1)

        # Start the daemon
        self.daemonize()
        self.run()

UPDATE 2: And here is the daemonize() method:

    def daemonize(self):
        """
            do the UNIX double-fork magic, see Stevens' "Advanced
            Programming in the UNIX Environment" for details (ISBN 0201563177)
            http://www.erlenstar.demon.co.uk/unix/faq_2.html#SEC16
        """
        try:
            pid = os.fork()
            if pid > 0:
                # exit first parent
                sys.exit(0)
        except OSError, e:
            sys.stderr.write("fork #1 failed: %d (%s)\n" % (e.errno, e.strerror))
            sys.exit(1)

        # decouple from parent environment
        os.chdir("/")
        os.setsid()
        os.umask(0)

        # do second fork
        try:
            pid = os.fork()
            if pid > 0:
                # exit from second parent
                sys.exit(0)
        except OSError, e:
            sys.stderr.write("fork #2 failed: %d (%s)\n" % (e.errno, e.strerror))
            sys.exit(1)

        # redirect standard file descriptors
        sys.stdout.flush()
        sys.stderr.flush()

        sys.stdout = file(self.stdout, 'a+', 0)
        si = file(self.stdin, 'r')
        so = file(self.stdout, 'a+')
        se = file(self.stderr, 'a+', 0)
        os.dup2(si.fileno(), sys.stdin.fileno())
        os.dup2(so.fileno(), sys.stdout.fileno())
        os.dup2(se.fileno(), sys.stderr.fileno())

        # write pidfile
        atexit.register(self.delpid)
        pid = str(os.getpid())
        file(self.pidfile, 'w+').write("%s\n" % pid)

Solution

You're looking in the wrong direction. The flaw is not in the stop routine but in the start routine (if you're using the code from the gist). The double fork is the correct method, but the process that performs the first fork should wait for its child instead of simply exiting.

The correct sequence of commands (and the reasons to do the double fork) can be found here: http://lubutu.com/code/spawning-in-unix (see the "Double fork" section).

The "sometimes" you mention happens when the first parent dies before receiving SIGCHLD, so the signal never reaches init.

As far as I remember, init should periodically read exit codes from its children in addition to handling signals, but the Upstart version relies only on the latter (hence the problem; see the comment on a similar bug: https://bugs.launchpad.net/upstart/+bug/406397/comments/2).
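As a side note on why stop() appears to hang rather than fail: a zombie still occupies a slot in the process table, so os.kill() on its pid keeps succeeding and the while loop never reaches the "No such process" branch. Here is a small sketch that shows this on Linux. It is written for Python 3 (the question's code is Python 2), and the helper names can_signal and signal_zombie are made up for the illustration.

```python
import os
import signal


def can_signal(pid, sig):
    """Return True if os.kill(pid, sig) succeeds, False on 'No such process'."""
    try:
        os.kill(pid, sig)
        return True
    except ProcessLookupError:
        return False


def signal_zombie():
    """Signal a child while it is a zombie, then again after reaping it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately

    # Block until the child has exited, but leave it unreaped (WNOWAIT),
    # so at this point it is guaranteed to be a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    before = can_signal(pid, signal.SIGTERM)

    # Reap the child; its process table entry goes away.
    os.waitpid(pid, 0)
    # Signal 0 only checks for existence, which is safer once the pid
    # could in principle be reused by an unrelated process.
    after = can_signal(pid, 0)
    return before, after


if __name__ == "__main__":
    print(signal_zombie())
```

So as long as nobody reaps the daemon, the loop in stop() will keep sending SIGTERM and sleeping.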

So the solution is to rewrite the first fork to actually wait for the child.

Update: Okay, you want some code. Here it is: pastebin.com/W6LdjMEz. I've updated the daemonize, fork and start methods.
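For readers who just want the shape of the idea, here is a minimal sketch of a double fork in which the original process waits for its child, along the lines of the "Double fork" section linked above. This is not the pastebin code. It is written for Python 3, and spawn_detached and target are hypothetical names chosen for the example, not part of the gist's Daemon class.

```python
import os


def spawn_detached(target):
    """Run target() in a grandchild process and reap the intermediate child.

    Returns the raw wait status of the intermediate child (0 on a clean exit).
    """
    pid = os.fork()
    if pid > 0:
        # Original process: wait for the intermediate child so it is
        # reaped here instead of being left behind as a zombie.
        _, status = os.waitpid(pid, 0)
        return status

    # Intermediate child from here on.
    try:
        os.setsid()  # detach from the controlling terminal
        if os.fork() == 0:
            # Grandchild: this is where the daemon's work happens.
            target()
    finally:
        # Both the intermediate child and the grandchild leave through
        # os._exit(), so neither returns into the caller's code nor
        # runs the caller's atexit handlers.
        os._exit(0)
```

The intermediate child exits right after the second fork, so the waitpid() call returns almost immediately and the grandchild is re-parented to init. In the gist's class, the equivalent change would be in the parent branch of the first fork; where exactly the original process should exit after waiting is a design choice this sketch leaves to the caller.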

Licensed under: CC-BY-SA with attribution