I am creating a pipe using popen() and the process is invoking a third party tool which in some rare cases I need to terminate.

::popen(thirdPartyCommand.c_str(), "w");

If I just throw an exception and unwind the stack, my unwind attempts to call pclose() on the third party process whose results I no longer need. However, pclose() never returns; it blocks with the following stack trace on CentOS 4:

#0  0xffffe410 in __kernel_vsyscall ()
#1  0x00807dc3 in __waitpid_nocancel () from /lib/libc.so.6
#2  0x007d0abe in _IO_proc_close@@GLIBC_2.1 () from /lib/libc.so.6
#3  0x007daf38 in _IO_new_file_close_it () from /lib/libc.so.6
#4  0x007cec6e in fclose@@GLIBC_2.1 () from /lib/libc.so.6
#5  0x007d6cfd in pclose@@GLIBC_2.1 () from /lib/libc.so.6

Is there any way to make sure pclose() will return before I call it, so that my process does not hang waiting on it? In this situation it never returns, because I've stopped supplying input to the popen()ed process and wish to throw away its work.

Should I write an end of file somehow to the popen()ed file descriptor before trying to close it?

Note that the third party software is forking itself. At the point where pclose() has hung, there are four processes, one of which is defunct:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
abc       6870  0.0  0.0   8696   972 ?        S    04:39   0:00 sh -c /usr/local/bin/third_party /home/arg1 /home/arg2 2>&1
abc       6871  0.0  0.0  10172  4296 ?        S    04:39   0:00 /usr/local/bin/third_party /home/arg1 /home/arg2
abc       6874 99.8  0.0  10180  1604 ?        R    04:39 141:44 /usr/local/bin/third_party /home/arg1 /home/arg2
abc       6875  0.0  0.0      0     0 ?        Z    04:39   0:00 [third_party] <defunct>

Solution

I see two solutions here:

  • The neat one: you fork(), pipe() and execve() (or anything in the exec family of course...) "manually", then it is going to be up to you to decide if you want to let your children become zombies or not. (i.e. to wait() for them or not)
  • The ugly one: if you're sure only one such child process is running at any given time, you could use sysctl() to check whether a process with this name is running before you call pclose()... yuk.

I strongly advise the neat way here, or you could just ask whoever is responsible to fix that infinite loop in your third party tool haha.

Good luck!

EDIT:

For your first question: I don't know. Doing some research on how to find processes by name using sysctl() should tell you what you need to know; I myself have never pushed it this far.

For your second and third question: popen() is basically a wrapper to fork() + pipe() + dup2() + execl().

fork() duplicates the process, execl() replaces the duplicated process's image with a new one, pipe() handles inter-process communication, and dup2() redirects the child's standard input or output to the pipe. pclose() then wait()s for the duplicated process to die, which is why we're here.

If you want to know more, you should check this answer where I've recently explained how to perform a simple fork with standard IPC. In this case, it's just a bit more complicated as you have to use dup2() to redirect the standard output to your pipe.

You should also take a look at the popen()/pclose() source code, as it is of course open source.

Finally, here's a brief example; I cannot make it clearer than that:

#include <unistd.h>

// "command" is a placeholder for the command line you want to run
void run_and_echo(const char *command)
{
    int  pipefd[2];
    char buf;

    pipe(pipefd);
    if (fork() == 0) // I'm the child
    {
        close(pipefd[0]);    // I'm not going to read from this pipe
        dup2(pipefd[1], 1);  // redirect standard output to the pipe
        close(pipefd[1]);    // it has been duplicated, close it as we don't need it anymore
        execl("/bin/sh", "sh", "-c", command, (char *)NULL); // execute the program you want
        _exit(127);          // only reached if execl() failed
    }
    else // I'm the parent
    {
        close(pipefd[1]);  // I'm not going to write to this pipe
        while (read(pipefd[0], &buf, 1) > 0) // read until EOF
            write(1, &buf, 1);
        close(pipefd[0]);  // cleaning: close the read end
    }
}

And as always, remember to read the man pages and to check all your return values.
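
For the write-mode case from the question, a minimal sketch of the same idea with return values checked might look like the following. It keeps the child's PID, which popen() hides, so the child can be killed and reaped instead of waited on forever. The names spawn_with_stdin_pipe() and kill_and_reap() are made up for this illustration, and putting the child into its own process group is an added assumption so that forked descendants which stay in that group receive the signal too.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper: run "sh -c command" with its standard input connected
// to a pipe. Returns the child's PID, or -1 on failure. On success the write
// end of the pipe is stored in *write_fd.
pid_t spawn_with_stdin_pipe(const char *command, int *write_fd)
{
    assert(command != nullptr && write_fd != nullptr);

    int pipefd[2];
    if (pipe(pipefd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (pid == 0) {                      // child
        setpgid(0, 0);                   // own process group (assumption, see above)
        close(pipefd[1]);                // the child only reads
        if (pipefd[0] != STDIN_FILENO) {
            if (dup2(pipefd[0], STDIN_FILENO) == -1)
                _exit(127);
            close(pipefd[0]);
        }
        execl("/bin/sh", "sh", "-c", command, (char *)nullptr);
        _exit(127);                      // only reached if execl() failed
    }

    setpgid(pid, pid);                   // same call in the parent avoids a race;
                                         // it may fail after the exec, which is harmless
    close(pipefd[0]);                    // the parent only writes
    *write_fd = pipefd[1];
    return pid;
}

// Hypothetical helper: give up on the child. Closes the pipe, kills the
// child's process group and reaps the direct child so it does not stay a
// zombie. Returns the wait status, or -1 on error.
int kill_and_reap(pid_t pid, int write_fd)
{
    close(write_fd);
    kill(-pid, SIGKILL);                 // negative PID: signal the whole group
    int status = -1;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```

In the normal case you would write() to the descriptor, close() it and call waitpid() on the PID; kill_and_reap() is only for the unwind path where the results are no longer wanted.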

Again, good luck!

Other tips

Another solution is to kill all your children. If you know that the only child processes you have are processes that get started when you do popen(), then it's easy enough. Otherwise you may need some more work or use the fork() + execve() combo, in which case you will know the first child's PID.

Whenever you run a child process, its PPID (parent process ID) is your own PID. It is easy enough to read the list of currently running processes and gather those that have their PPID = getpid(). Repeat the loop looking for processes that have their PPID equal to one of your children's PIDs. In the end you build a whole tree of child processes.

Since your child processes may end up creating other child processes, to make it safe, you will want to block those processes by sending a SIGSTOP. That way they will stop creating new children. SIGSTOP cannot be caught, blocked or ignored, so the process cannot prevent it from doing its deed.

The process is therefore:

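#include <signal.h>
#include <unistd.h>
#include <algorithm>
#include <vector>

// process and get_processes() are not shown here, see the note at the end
void kill_all_children()
{
  std::vector<pid_t> me_and_children;

  me_and_children.push_back(getpid());

  bool found_child = false;
  do
  {
    found_child = false;
    std::vector<process> processes(get_processes());
    for(auto p : processes)
    {
      // skip processes we already recorded, otherwise the loop never ends
      if(std::find(me_and_children.begin(),
                   me_and_children.end(),
                   p.pid()) != me_and_children.end())
      {
        continue;
      }
      // i.e. if this process is the child of any one of those processes
      if(std::find(me_and_children.begin(),
                   me_and_children.end(),
                   p.ppid()) != me_and_children.end())
      {
         kill(p.pid(), SIGSTOP);
         me_and_children.push_back(p.pid());
         found_child = true;
      }
    }
  }
  while(found_child);

  for(auto c : me_and_children)
  {
    // ignore ourselves
    if(c == getpid())
    {
      continue;
    }
    kill(c, SIGTERM);
    kill(c, SIGCONT);  // make sure it continues now
  }
}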

This is probably not the best way to close your pipe, though, since you probably need to give the command time to handle your data. So what you want is to execute that code only after a timeout. Your regular code could look something like this:

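#include <signal.h>
#include <stdio.h>
#include <unistd.h>

// signal handlers take the signal number as their parameter
void handle_alarm(int)
{
  kill_all_children();
}

void send_data(...)
{
  signal(SIGALRM, handle_alarm);
  FILE *f = popen("command", "w");
  if(f == NULL)
  {
    return;
  }
  // do some work...
  alarm(60);  // give it a minute
  pclose(f);
  alarm(0);   // remove alarm
}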

About the alarm(60): its location is up to you. It could also be placed before the popen() if you're afraid that the popen() or the work after it could also hang (e.g. I've had problems where the pipe fills up and I never even reach the pclose() because the child process loops forever).

Note that the alarm() may not be the best idea in the world. You may prefer using a thread with a sleep made of a poll() or select() on an fd which you can wake up as required. That way the thread would call the kill_all_children() function after the sleep, but you can send it a message to wake it up early and let it know that the pclose() happened as expected.
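
The interruptible sleep for such a thread could be sketched roughly as follows. The function name sleep_unless_woken() and the idea of using a pipe as the wake-up fd are assumptions made for this illustration, not something taken from an existing library.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper for the watchdog thread: sleep for up to timeout_ms
// milliseconds, but return early as soon as wake_fd becomes readable (or its
// other end is closed). Returns true if woken early, false on timeout.
bool sleep_unless_woken(int wake_fd, int timeout_ms)
{
    assert(wake_fd >= 0);

    struct pollfd pfd;
    pfd.fd = wake_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    for (;;) {
        int r = poll(&pfd, 1, timeout_ms);
        if (r > 0)
            return true;     // a wake-up message arrived
        if (r == 0)
            return false;    // timed out
        if (errno != EINTR)
            return false;    // unexpected error: treated like a timeout here
        // interrupted by a signal: retry (this simple sketch restarts the
        // full timeout instead of computing the remaining time)
    }
}
```

The main thread would create a pipe(), hand the read end to the watchdog thread, and write one byte to the write end once pclose() has returned. The watchdog thread would then do something like: if (!sleep_unless_woken(read_end, 60000)) kill_all_children();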

Note: I left the implementation of get_processes() out of this answer. You can read that information from /proc or with the libprocps library. I have such an implementation in my snapwebsites project. It's called process_list. You could just reuse that class.
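
For readers who do not want to pull in another project, here is a minimal sketch of a /proc-based get_processes() for Linux. It is not the process_list class mentioned above; the process class with its pid() and ppid() accessors is a stand-in written for this illustration, and it only extracts the two fields that kill_all_children() needs.

```cpp
#include <dirent.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Stand-in for the process type used by kill_all_children()
class process
{
public:
    process(pid_t pid, pid_t ppid) : pid_(pid), ppid_(ppid) { assert(pid > 0); }
    pid_t pid() const { return pid_; }
    pid_t ppid() const { return ppid_; }

private:
    pid_t pid_;
    pid_t ppid_;
};

// Linux only: list every process visible in /proc with its parent PID
std::vector<process> get_processes()
{
    std::vector<process> result;

    DIR *dir = opendir("/proc");
    if (dir == nullptr)
        return result;

    while (struct dirent *entry = readdir(dir)) {
        // per-process directories are the ones whose name is a number
        if (!isdigit(static_cast<unsigned char>(entry->d_name[0])))
            continue;

        std::ifstream stat_file(std::string("/proc/") + entry->d_name + "/stat");
        std::string line;
        if (!std::getline(stat_file, line))
            continue;   // the process exited in the meantime

        // The line looks like "pid (comm) state ppid ...". The command name
        // may itself contain spaces or ')', so search for the last ')'.
        std::string::size_type close_paren = line.rfind(')');
        if (close_paren == std::string::npos || close_paren + 3 >= line.size())
            continue;

        pid_t pid = static_cast<pid_t>(std::strtol(entry->d_name, nullptr, 10));
        // skip ") S" (closing parenthesis, space, state letter); strtol()
        // itself skips the space in front of the parent PID
        pid_t ppid = static_cast<pid_t>(
            std::strtol(line.c_str() + close_paren + 3, nullptr, 10));
        result.push_back(process(pid, ppid));
    }
    closedir(dir);
    return result;
}
```

The list is only a snapshot: processes can appear or disappear between two calls, which is one reason kill_all_children() rescans in a loop.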

I'm using popen() to invoke a child process which doesn't need any stdin or stdout; it just runs for a short time to do its work, then it stops all by itself. Arguably, invoking this type of child process should rather be done with system()? Anyway, pclose() is used afterwards to verify that the child process exited cleanly.

Under certain conditions, this child process keeps on running indefinitely. pclose() blocks forever, so then my parent process is also stuck. CPU usage runs to 100%, other executables get starved, and my whole embedded system crumbles. I came here looking for solutions.

Solution 1 by @cmc: decomposing popen() into fork(), pipe(), dup2() and execl(). It might just be a matter of personal taste, but I'm reluctant to reimplement perfectly fine library functions myself. I would just end up introducing new bugs.

Solution 2 by @cmc: verifying that the child process actually exists with sysctl(), to make sure that pclose() will return successfully. I find that this somehow sidesteps the problem from the OP @WilliamKF: there is definitely a child process, it just has become unresponsive. Forgoing the pclose() call won't solve that. [As an aside, in the 7 years since @cmc wrote this answer, sysctl() seems to have become deprecated.]

Solution 3 by @Alexis Wilke: killing the child process. I like this approach best. It basically automates what I did when I stepped in manually to resuscitate my dying embedded system. The problem with my stubborn adherence to popen() is that I get no PID for the child process. I have been trying in vain with

waitid(P_PGID, getpgrp(), &child_info, WNOHANG);

but all I get on my Debian Linux 4.19 system is EINVAL.
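
A possible explanation for the EINVAL, going by the waitid() man page: the options argument has to contain at least one of WEXITED, WSTOPPED or WCONTINUED, and WNOHANG on its own does not say which state changes to wait for. Below is a sketch of a non-blocking check under that assumption. The function name peek_exited_child() is made up, and adding WNOWAIT is my own choice so that the child stays waitable and a later pclose() can still collect its status.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cstring>

// Hypothetical helper: check, without blocking, whether some child in our
// process group has exited. Returns that child's PID, 0 if no child has
// exited yet, or -1 on error (errno is ECHILD if there are no children).
// Because of WNOWAIT the child is NOT reaped here.
pid_t peek_exited_child(siginfo_t *info)
{
    assert(info != nullptr);

    // zero the structure so that si_pid == 0 reliably means "nothing yet"
    std::memset(info, 0, sizeof(*info));

    if (waitid(P_PGID, getpgrp(), info, WEXITED | WNOHANG | WNOWAIT) == -1)
        return -1;
    return info->si_pid;
}
```

Note that this only tells you that a child has exited. It does not give you the PID of a child that is still running, so it would not replace the search by name below.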

So here's what I cobbled together. I'm searching for the child process by name; I can afford to take a few shortcuts, as I'm sure there will only be one process with this name. Ironically, the command-line utility ps is invoked by yet another popen(). This won't win any elegance prizes, but at least my embedded system stays afloat now.

FILE* child = popen("child", "r");
if (child)
{
    int nr_loops;
    int child_pid;
    for (nr_loops=10; nr_loops; nr_loops--)
    {
        FILE* ps = popen("ps | grep child | grep -v grep | grep -v \"sh -c \" | sed \'s/^ *//\' | sed \'s/ .*$//\'", "r");
        child_pid = 0;
        int found = fscanf(ps, "%d", &child_pid);
        pclose(ps);
        if (found != 1)
            // The child process is no longer running, no risk of blocking pclose()
            break;
        syslog(LOG_WARNING, "child running PID %d", child_pid);
        usleep(1000000); // 1 second
    }
    if (!nr_loops)
    {
        // Time to kill this runaway child
        syslog(LOG_ERR, "killing PID %d", child_pid);
        kill(child_pid, SIGTERM);
    }
    pclose(child); // Even after it had to be killed
} /* if (child) */

I learned the hard way that I have to pair every popen() with a pclose(), otherwise the zombie processes pile up. I find it remarkable that this is needed after a direct kill; I figure that's because, according to the manpage, popen() actually launches sh -c with the child process in it, and it's this surrounding sh that becomes a zombie.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow