Domanda

I have been writing a program that spawns a child process, and calls waitpid to wait for the termination of the child process. The code is below:

  // fork & exec the child
  pid_t pid = fork();
  if (pid == -1)
    // here is error handling code that is **not** triggered

  if (!pid)
    {
      // binary_invocation is an array of the child process program and its arguments
      execv(args.binary_invocation[0], (char * const*)args.binary_invocation);
      // here is some error handling code that is **not** triggered
    }
  else
    {
      int status = 0;
      pid_t res = waitpid(pid, &status, 0);

      // here I see pid_t being a positive integer > 0
      // and status being 11, which means WIFEXITED(status) is 0.
      // this triggers a warning in my programs output.
    }

The manpage of waitpid states for WIFEXITED:

WIFEXITED(status)
    returns  true  if  the child terminated normally, that is, by calling exit(3) or
    _exit(2), or by returning from main().

Which I intepret to mean it should return an integer != 0 on success, which is not happening in the execution of my program, since I observe WIFEXITED(status) == 0

However, executing the same program from the command line results in $? == 0, and starting from gdb results in:

[Inferior 1 (process 31934) exited normally]

The program behaves normally, except for the triggered warning, which makes me think something else is going on here, that I am missing.

EDIT:
as suggested below in the comments, I checked if the child is terminated via segfault, and indeed, WIFSIGNALED(status) returns 1, and WTERMSIG(status) returns 11, which is SIGSEGV.

What I don't understand though, is why a call via execv would fail with a segfault while the same call via gdb, or a shell would succeed?

EDIT2:
The behaviour of my application heavily depends on the behaviour of the child process, in particular on a file the child writes in a function declared __attribute__ ((destructor)). After the waitpid call returns, this file exists and is generated correctly which means the segfault occurs somewhere in another destructor, or somewhere outside of my control.

È stato utile?

Soluzione

On Unix and Linux systems, the status returned from wait or waitpid (or any of the other wait variants) has this structure:

bits   meaning

0-6    signal number that caused child to exit,
       or 0177 if child stopped / continued
       or zero if child exited without a signal

 7     1 if core dumped, else 0

8-15   low 8 bits of value passed to _exit/exit or returned by main,
       or signal that caused child to stop/continue

(Note that Posix doesn't define the bits, just macros, but these are the bit definitions used by at least Linux, Mac OS X/iOS, and Solaris. Also note that waitpid only returns for stop events if you pass it the WUNTRACED flag and for continue events if you pass it the WCONTINUED flag.)

So a status of 11 means the child exited due to signal 11, which is SIGSEGV (again, not Posix but conventionally).

Either your program is passing invalid arguments to execv (which is a C library wrapper around execve or some other kernel-specific call), or the child runs differently when you execv it and when you run it from the shell or gdb.

If you are on a system that supports strace, run your (parent) program under strace -f to see whether execv is causing the signal.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top