Question

I have a ctypes wrapper for a library. Unfortunately, this library is not 100% reliable (occasional segfaults, etc.). Because of how it's used, I want the wrapper to be reasonably resilient to the library crashing.

The best way to do this seems to be forking a process and sending the results back from the child. I'd like to do something along these lines:

r, w = os.pipe()
pid = os.fork()

if pid == 0:
    # child
    result = ctypes_fn()
    os.write(w, pickle.dumps(result))
    os.close(w)
else:
    # parent
    os.waitpid(pid, 0)
    result = os.read(r, 524288) # can be this big
    os.close(r)

    return pickle.loads(result)

This doesn't quite work, though. The forked process hangs on the write. Am I trying to send too much at once? Is there a simpler solution to this problem?


Solution

Probably you are trying to write more data than can fit into the pipe, so the write blocks until someone reads some of that data out. That will never happen, because the only reader is the parent process, which waits for the child to terminate before it reads anything. This is a deadlock.
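To see the limit described above on your own system, here is a minimal sketch that fills a pipe without reading from it. The helper name pipe_capacity is made up for illustration; it assumes a POSIX system and Python 3.5+ (for os.set_blocking).

```python
import os

def pipe_capacity():
    """Count how many bytes a fresh pipe accepts before a write would block."""
    r, w = os.pipe()
    os.set_blocking(w, False)  # a full pipe now raises instead of hanging
    total = 0
    try:
        while True:
            total += os.write(w, b"x" * 4096)
    except BlockingIOError:
        pass  # the pipe is full; nobody is reading from r
    finally:
        os.close(r)
        os.close(w)
    return total

if __name__ == "__main__":
    print(pipe_capacity())
```

A pickled result larger than the number this prints cannot be written in one go unless the parent is reading at the same time.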

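You might try removing the os.waitpid call, or moving it after the read, and see what happens. Another option would be to look for a way to give the pipe a bigger buffer; os.pipe itself only returns a pair of file descriptors, so any such setting would be platform-specific (I don't know your environment well enough to say).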
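As a rough sketch of that reordering (not the only way to do it), the parent can drain the pipe until end-of-file first and reap the child afterwards. The function name run_in_child is hypothetical, and the sketch assumes a POSIX system where os.fork is available.

```python
import os
import pickle

def run_in_child(fn):
    """Run fn() in a forked child and return its result, or raise if the child died."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # child: compute, send the pickled result, and exit without returning
        status = 1
        try:
            os.close(r)
            payload = pickle.dumps(fn())
            with os.fdopen(w, "wb") as out:
                out.write(payload)
            status = 0
        finally:
            os._exit(status)
    # parent: close our copy of the write end so the read can see end-of-file
    os.close(w)
    with os.fdopen(r, "rb") as src:
        data = src.read()  # returns once the child closes its end or dies
    _, status = os.waitpid(pid, 0)  # reap only after the pipe is drained
    if status != 0 or not data:
        raise RuntimeError("child failed with wait status %d" % status)
    return pickle.loads(data)
```

Because the parent reads while the child is still writing, the size of the result no longer matters. If the library crashes, the child dies, the read hits end-of-file early, and the nonzero wait status turns into an exception in the parent.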

OTHER TIPS

The basic problem is that the pipe's buffer has a limited capacity (64 kB by default on Linux). A few possible solutions, from the simple to the complex:

  1. Send less data. zlib.compress could help in getting under the limit.
  2. Store the actual data somewhere else (file, mmap, memcache), only using the pipe to send control information.
  3. Continue using the pipe, but chunk the output. Use two sets of pipes so the processes can talk to each other and synchronize their communication. The code is more complex, but is otherwise very effective.
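The second option can be sketched as follows, assuming a POSIX system: the bulk data goes through a temporary file, and the pipe carries only a two-byte status message. The name run_via_tempfile and the b"ok" marker are invented for this example.

```python
import os
import pickle
import tempfile

def run_via_tempfile(fn):
    """Run fn() in a forked child; pass the result through a temporary file."""
    r, w = os.pipe()
    with tempfile.NamedTemporaryFile() as tmp:
        pid = os.fork()
        if pid == 0:
            # child: bulk data goes to the file, a short status goes to the pipe
            status = 1
            try:
                os.close(r)
                with open(tmp.name, "wb") as out:
                    pickle.dump(fn(), out)
                os.write(w, b"ok")
                status = 0
            finally:
                os._exit(status)
        # parent
        os.close(w)
        os.waitpid(pid, 0)  # safe here: the child writes at most 2 bytes to the pipe
        control = os.read(r, 16)
        os.close(r)
        if control != b"ok":
            raise RuntimeError("child did not report success")
        with open(tmp.name, "rb") as src:
            return pickle.load(src)
```

Waiting before reading is fine in this variant because the control message is far below any pipe limit. If the child crashes before writing it, the parent reads an empty string and raises.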

One solution to the deadlock that ted.dennison mentioned is the following pseudocode:

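# parent
os.close(w)  # close the parent's copy of the write end so reads can see EOF
chunks = []
while os.waitpid(pid, os.WNOHANG) == (0, 0):
    chunk = os.read(r, 1024)  # blocks until data arrives or the child closes its end
    if chunk:
        chunks.append(chunk)
    # sleep for a short time
# at this point the child process has ended;
# drain whatever is still buffered in the pipe
while True:
    chunk = os.read(r, 1024)
    if not chunk:
        break
    chunks.append(chunk)
os.close(r)
result = b"".join(chunks)

With the os.WNOHANG option, os.waitpid returns immediately with (0, 0) if the child process hasn't exited yet. The chunks are collected in a list and joined at the end so that each read does not overwrite the previous one, and the final loop reads until end-of-file because more than 1024 bytes may still be buffered when the child exits. The parent also closes its own copy of the write end; otherwise os.read would never see end-of-file.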

Licensed under: CC-BY-SA with attribution