Question

Various Perl scripts, invoked via Server Side Includes on a website, are calling a Perl module with many functions. EDIT: The scripts are using use lib to reference the libraries from a folder. During busy periods the scripts (not the libraries) become zombies and overload the server.

The server lists:

319 ?        Z      0:00 [scriptname1.pl] <defunct>    
320 ?        Z      0:00 [scriptname2.pl] <defunct>    
321 ?        Z      0:00 [scriptname3.pl] <defunct>

I have hundreds of instances of each.

EDIT: We are not using fork, system or exec, apart from the SSI directive

<!--#exec cgi="/cgi-bin/scriptname.pl"-->

As far as I know, in this case httpd itself will be the parent of the process. MaxRequestsPerChild is set to 0, which should keep the parent from dying before the child processes are finished.
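For reference, the relevant httpd.conf lines look something like this (the directive names are real; the MaxClients value is purely illustrative):

# Apache 1.3 httpd.conf (illustrative values)
# Never retire a child after a fixed number of requests:
MaxRequestsPerChild 0
# Cap on concurrent child processes:
MaxClients 150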

So far we have found that temporarily suspending some of the scripts helps the server cope with the defunct processes and keeps it from falling over; however, zombie processes are without a doubt still being created. gbacon seems to be closest to the truth with his theory that the server is simply unable to cope with the load.

What could lead to httpd abandoning these processes? Is there a best practice for preventing this from happening?

Thanks

Answer: The point goes to Rob. As he says, CGI scripts that generate SSI's will not have those SSI's handled. The evaluation of SSI's happens before the running of CGI's in the Apache 1.3 request cycle. This was fixed with Apache 2.0 and later so that CGI's can generate SSI commands.

Since we were running on Apache 1.3, every page view turned the SSI-launched CGI's into defunct processes. Although the server was trying to clear them, it was far too busy with running tasks to succeed. As a result, the server fell over and became unresponsive. As a short-term solution we reviewed all SSI's and moved some of the processing to the client side to free up server resources and give the server time to clean up. Later we upgraded to Apache 2.2.


Solution

I just saw your comment that you are running Apache 1.3 and that may be associated with your problem.

SSI's can run CGI's. But CGI scripts that generate SSI's will not have those SSI's handled. The evaluation of SSI's happens before the running of CGI's in the Apache 1.3 request cycle. This was fixed with Apache 2.0 and later so that CGI's can generate SSI commands.
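To see what that looks like in practice, here is a hypothetical CGI (not your actual code) that emits an SSI directive; under Apache 1.3 the directive is never post-processed and reaches the browser as a literal HTML comment:

#!/usr/bin/perl
use strict;
use warnings;

# Plain CGI response followed by an SSI directive in the body.
print "Content-type: text/html\n\n";

# Under Apache 1.3 the line below is NOT evaluated as an SSI;
# it arrives at the client as a literal comment.
print qq{<!--#include virtual="/cgi-bin/footer.pl" -->\n};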

As I'd suggested above, try running your scripts on their own and have a look at the output. Are they generating SSI's?

Edit: Have you tried launching a trivial Perl CGI script to simply printout a Hello World type HTTP response?
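Something along these lines is enough (a minimal sketch):

#!/usr/bin/perl
use strict;
use warnings;

# Smallest possible CGI response: a header, a blank line, a body.
print "Content-type: text/plain\n\n";
print "Hello World\n";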

Then if this works, add a trivial SSI directive such as

<!--#printenv -->

and see what happens.

Edit 2: Just realised what is probably happening. Zombies occur when a child process exits but its exit status isn't reaped. These processes hang around, slowly using up entries in the process table. (A process whose parent has died, by contrast, is an orphaned process.)

Are you forking off processes within your Perl script? If so, have you added a waitpid() call to the parent?
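For reference, the usual fork-and-reap pattern looks like this (a sketch with a placeholder child body, not taken from your code):

use strict;
use warnings;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: do the work, then exit.
    # ... child code here ...
    exit 0;
}

# Parent: reap the child so it cannot linger as a zombie.
waitpid($pid, 0);
my $exit_code = $? >> 8;    # child's exit status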

Have you also got the correct exit within the script?

CORE::exit(0);

OTHER TIPS

More Band-Aid than best practice, but sometimes you can get away with simple

$SIG{CHLD} = "IGNORE";

According to the perlipc documentation:

On most Unix platforms, the CHLD (sometimes also known as CLD) signal has special behavior with respect to a value of 'IGNORE'. Setting $SIG{CHLD} to 'IGNORE' on such a platform has the effect of not creating zombie processes when the parent process fails to wait() on its child processes (i.e., child processes are automatically reaped). Calling wait() with $SIG{CHLD} set to 'IGNORE' usually returns -1 on such platforms.

If you care about the exit statuses of child processes, you need to collect them (commonly referred to as "reaping") by calling wait or waitpid. Despite the creepy name, a zombie is merely a child process that has exited but whose status has not yet been reaped.
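If you do want the statuses, one common pattern (a sketch, not part of the original answer) is a CHLD handler that reaps every exited child with a non-blocking waitpid:

use strict;
use warnings;
use POSIX ':sys_wait_h';    # exports WNOHANG

# On each SIGCHLD, reap every exited child without blocking.
$SIG{CHLD} = sub {
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        my $status = $? >> 8;    # exit status of the reaped child
        # record or log $status as needed
    }
};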

If your Perl programs themselves are the child processes becoming zombies, that means their parents (the ones that are forking-and-forgetting your code) need to clean up after themselves. A process cannot stop itself from becoming a zombie.

As you have all the bits yourself, I'd suggest running the individual scripts one at a time from the command line to see if you can spot the ones that are hanging.

Does a ps listing show an inordinate number of instances of one particular script running?

Are you running the CGI's using mod_perl?

Edit: Just saw your comments regarding SSI's. Don't forget that SSI directives can run Perl scripts themselves. Have a look at what the CGI's are trying to run.
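For instance, an SSI page can pull in further scripts like this (hypothetical paths), so the chain may be longer than it first appears:

<!--#include virtual="/cgi-bin/header.pl" -->
<!--#exec cgi="/cgi-bin/counter.pl" -->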

Are they dependent on yet another server or service?

Licensed under: CC-BY-SA with attribution