Question

I have some processes showing up as <defunct> in top (and ps). I've boiled things down from the real scripts and programs.

In my crontab:

* * * * * /tmp/launcher.sh /tmp/tester.sh

The contents of launcher.sh (which is of course marked executable):

#!/bin/bash
# the real script does a little argument processing here
"$@"

The contents of tester.sh (which is of course marked executable):

#!/bin/bash
sleep 27 & # the real script launches a compiled C program in the background

ps shows the following:

user       24257 24256  0 18:32 ?        00:00:00 [launcher.sh] <defunct>
user       24259     1  0 18:32 ?        00:00:00 sleep 27

Note that tester.sh does not appear--it has exited after launching the background job.

Why does launcher.sh stick around, marked <defunct>? It only seems to do this when launched by cron--not when I run it myself.

Additional note: launcher.sh is a common script in the system this runs on, which is not easily modified. The other things (crontab, tester.sh, even the program that I run instead of sleep) can be modified much more easily.

Solution

Because they haven't been the subject of a wait(2) system call.

Since something may still wait for these processes in the future, the kernel can't get rid of them completely: if it did, it could not carry out the wait system call, because the exit status and any evidence of the process's existence would be gone.

When you start one from the shell, your shell is trapping SIGCHLD and doing various wait operations anyway, so nothing stays defunct for long.

But cron isn't in a wait state; it is sleeping, so the defunct child may stick around for a while until cron wakes up.
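The effect can be reproduced without cron. The following is only a minimal sketch, assuming Linux with /proc mounted; the durations and variable names are illustrative and not taken from the original scripts. A shell starts a short-lived child and then replaces itself (via exec) with sleep, which never calls wait, so the exited child stays a zombie for as long as that parent is alive:

```shell
#!/bin/bash
# The inner shell backgrounds a short-lived child, then exec's sleep.
# sleep never calls wait(2), so the exited child cannot be reaped yet.
bash -c 'sleep 0.2 & exec sleep 2' &
parent=$!

sleep 1   # give the short-lived child time to exit

# Find the process whose parent is $parent and read its state from /proc
# (assumes Linux; "Z" means zombie, i.e. <defunct> in ps).
child_state=""
for status in /proc/[0-9]*/status; do
    if grep -q "^PPid:[[:space:]]*${parent}\$" "$status" 2>/dev/null; then
        child_state=$(awk '/^State:/ {print $2}' "$status" 2>/dev/null)
    fi
done
echo "state of the exited child before its parent waits: $child_state"

# Once the parent exits, the zombie is reparented to init, which normally reaps it.
wait "$parent"
```
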


Update: Responding to a comment... Hmm. I did manage to duplicate the issue:

 PPID   PID  PGID  SESS COMMAND
    1  3562  3562  3562 cron
 3562  1629  3562  3562  \_ cron
 1629  1636  1636  1636      \_ sh <defunct>
    1  1639  1636  1636 sleep

So, what happened was, I think:

  • cron forks and cron child starts shell
  • shell (1636) starts sid and pgid 1636 and starts sleep
  • shell exits, SIGCHLD sent to cron 3562
  • signal is ignored or mishandled
  • shell turns zombie. Note that sleep is reparented to init, so when the sleep exits, init will get the signal and clean up. I'm still trying to figure out when the zombie gets reaped. Probably, with no active children, cron 1629 figures out it can exit; at that point the zombie is reparented to init and gets reaped. So now we wonder about the missing SIGCHLD that cron should have processed.
    • It isn't necessarily vixie cron's fault. As you can see here, libdaemon installs a SIGCHLD handler during daemon_fork(), and this could interfere with signal delivery on a quick exit by intermediate 1629

      Now, I don't even know if vixie cron on my Ubuntu system is even built with libdaemon, but at least I have a new theory. :-)

OTHER TIPS

I suspect that cron is waiting for all subprocesses in the session to terminate. See wait(2) with respect to negative pid arguments. You can see the SESS with:

ps faxo stat,euid,ruid,tty,tpgid,sess,pgrp,ppid,pid,pcpu,comm

Here's what I see (edited):

STAT  EUID  RUID TT       TPGID  SESS  PGRP  PPID   PID %CPU COMMAND
Ss       0     0 ?           -1  3197  3197     1  3197  0.0 cron
S        0     0 ?           -1  3197  3197  3197 18825  0.0  \_ cron
Zs    1000  1000 ?           -1 18832 18832 18825 18832  0.0      \_ sh <defunct>
S     1000  1000 ?           -1 18832 18832     1 18836  0.0 sleep

Notice that the sh and the sleep are in the same SESS.

Use the command setsid(1). Here's tester.sh:

#!/bin/bash
setsid sleep 27 # the real script launches a compiled C program in the background

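Notice that you may not need &: setsid forks and returns immediately when it is invoked as a process group leader. If it is not a group leader (which can be the case inside a script), it runs the command in the foreground, so append & if the script must return right away.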
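To check what setsid actually changes, the session ID of a process can be read from /proc. This is a rough sketch, assuming Linux and the util-linux setsid command; the helper name session_of and the durations are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: field 6 of /proc/PID/stat is the session ID
# (assumes the command name contains no spaces, which holds for sleep).
session_of() { awk '{print $6}' "/proc/$1/stat"; }

sleep 3 &              # stays in the script's session
plain=$!
setsid sleep 3 &       # becomes the leader of a new session
detached=$!

sleep 0.5              # let both children get started

plain_sid=$(session_of "$plain")
detached_sid=$(session_of "$detached")
echo "plain sleep:  pid=$plain session=$plain_sid"
echo "setsid sleep: pid=$detached session=$detached_sid"

kill "$plain" "$detached" 2>/dev/null
```

The process started through setsid should report its own PID as its session ID, while the plain background job reports the session it inherited from the script.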

In my opinion, it's caused by the CROND process (spawned by crond for every task) waiting for input on its stdin, which is piped to the stdout/stderr of the command in the crontab. This is done so that cron can mail the resulting output to the user.

So CROND waits for EOF until the user command and all its spawned child processes have closed the pipe. Once that happens, CROND continues with the wait statement, and then the defunct user command disappears.

So I think you have to explicitly disconnect every spawned subprocess in your script from the pipe (e.g. by redirecting its output to a file or /dev/null).

So the following line should work in the crontab:

* * * * * ( /tmp/launcher.sh /tmp/tester.sh >/dev/null 2>&1 & )
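The pipe behaviour described in this answer can be observed without cron. In this sketch, cat stands in for the process reading the job's output, and the bash -c commands are hypothetical stand-ins for tester.sh; the timings are illustrative:

```shell
#!/bin/bash
# Case 1: the background job inherits the pipe as its stdout, so the
# reader (cat) sees no EOF until the job exits, about 2 seconds later.
start=$SECONDS
bash -c 'sleep 2 &' | cat
held=$((SECONDS - start))

# Case 2: the background job's output goes to /dev/null, so the pipe
# closes as soon as the launching shell exits.
start=$SECONDS
bash -c 'sleep 2 >/dev/null 2>&1 &' | cat
released=$((SECONDS - start))

echo "reader blocked for ${held}s with an inherited pipe, ${released}s when redirected"
```

In both cases the launching shell exits immediately; only the open write end of the pipe keeps the reader waiting.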

I’d recommend that you solve the problem by simply not having two separate processes: Have launcher.sh do this on its last line:

exec "$@"

This will eliminate the superfluous process.
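As a small illustration of why exec removes the extra process (a sketch, not part of the original scripts): a shell that ends with exec is replaced by the command it runs, so both report the same PID.

```shell
#!/bin/bash
# The outer bash prints its PID, then exec's sh, which prints its own PID.
# Because exec replaces the process instead of forking, the two match.
pids=$(bash -c 'echo "$$"; exec sh -c "echo \$\$"')
before=$(printf '%s\n' "$pids" | sed -n 1p)
after=$(printf '%s\n' "$pids" | sed -n 2p)
echo "PID before exec: $before, PID after exec: $after"
```
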

I found this question while I was looking for a solution with a similar issue. Unfortunately answers in this question didn't solve my problem.

Killing a defunct process directly is not possible, since it has already exited; you need to find and kill its parent process instead. I ended up clearing the defunct processes in the following way:

ps -ef | grep '<defunct>' | grep -v grep | awk '{print "kill -9 ",$3}' | sh

In the grep '<defunct>' stage you can narrow down the search to the specific defunct process you are after.

I have run into the same problem many times, and I finally found a solution: just specify /bin/bash before the bash script, as shown below.

* * * * * /bin/bash /tmp/launcher.sh /tmp/tester.sh
Licensed under: CC-BY-SA with attribution