Twisted: use of multiple threads and processes together

https://stackoverflow.com/questions/6052043

15-11-2019

Question

The Twisted documentation led me to believe that it was OK to combine techniques such as reactor.spawnProcess() and threads.deferToThread() in the same application, and that the reactor would handle this elegantly under the covers. When I actually tried it, I found that my application deadlocks. With multiple threads alone, or with child processes alone, everything is fine.

Looking into the reactor source, I find that the SelectReactor.spawnProcess() method simply calls os.fork() without any consideration for multiple threads that might be running. This explains the deadlocks, because starting with the call to os.fork() you will have two processes with multiple concurrent threads running and doing who knows what with the same file descriptors.

My question for SO is, what is the best strategy for solving this problem?

What I have in mind is to subclass SelectReactor, so that it is a singleton and calls os.fork() only once, immediately when instantiated. The child process will run in the background and act as a server for the parent (using object serialization over pipes to communicate back and forth). The parent continues to run the application and may use threads as desired. Calls to spawnProcess() in the parent will be delegated to the child process, which will be guaranteed to have only one thread running and can therefore call os.fork() safely.

Has anyone done this before? Is there a faster way?
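A minimal sketch of the helper-process design described in the question, using only the standard library and with no Twisted integration. Everything here is hypothetical: the `SpawnServer` class, its `spawn()`/`close()` methods and the length-prefixed pickle framing are illustrative assumptions, not a Twisted API. It forks once, before any threads exist, and the helper sends each child's PID back over a pipe so that `spawn()` can return it synchronously.

```python
import os
import pickle
import struct
import subprocess
import sys
import threading


def _write_all(fd, data):
    while data:
        written = os.write(fd, data)
        data = data[written:]


def _read_exact(fd, n):
    buf = b""
    while len(buf) < n:
        chunk = os.read(fd, n - len(buf))
        if not chunk:
            raise EOFError("peer closed the pipe")
        buf += chunk
    return buf


def _send(fd, obj):
    # One frame = 4-byte big-endian length + pickled payload.
    data = pickle.dumps(obj)
    _write_all(fd, struct.pack("!I", len(data)) + data)


def _recv(fd):
    (length,) = struct.unpack("!I", _read_exact(fd, 4))
    return pickle.loads(_read_exact(fd, length))


class SpawnServer:
    """Hypothetical helper process that does all spawning for the parent.

    Create it before starting any threads (or the reactor), so the fork
    happens while the process is still single-threaded.
    """

    def __init__(self):
        req_r, self._req_w = os.pipe()
        self._resp_r, resp_w = os.pipe()
        self.helper_pid = os.fork()
        if self.helper_pid == 0:
            os.close(self._req_w)
            os.close(self._resp_r)
            self._serve(req_r, resp_w)  # never returns
        os.close(req_r)
        os.close(resp_w)
        self._lock = threading.Lock()  # one request/response in flight at a time

    @staticmethod
    def _serve(req_r, resp_w):
        # Runs in the single-threaded helper, where forking is safe.
        children = []
        try:
            while True:
                try:
                    argv = _recv(req_r)
                except EOFError:
                    break  # parent closed its end: shut down
                try:
                    child = subprocess.Popen(argv)
                except OSError as exc:
                    _send(resp_w, ("error", str(exc)))
                else:
                    children.append(child)
                    _send(resp_w, ("ok", child.pid))
        finally:
            for child in children:
                child.wait()  # the helper, not the parent, reaps these
            os._exit(0)

    def spawn(self, argv):
        """Ask the helper to start argv; returns the new child's PID."""
        with self._lock:
            _send(self._req_w, list(argv))
            status, value = _recv(self._resp_r)
        if status != "ok":
            raise OSError(value)
        return value

    def close(self):
        os.close(self._req_w)  # helper sees EOF, reaps its children, exits
        os.close(self._resp_r)
        os.waitpid(self.helper_pid, 0)


server = SpawnServer()
pid = server.spawn([sys.executable, "-c", "pass"])
print("spawned", pid, "via helper", server.helper_pid)
server.close()
```

The spawned processes are children of the helper, not of the caller: the caller can still signal them with `os.kill()` by PID, but only the helper can reap them.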

Solution 3

Returning to this issue after some time, I found that if I do this:

reactor.callFromThread(reactor.spawnProcess, *spawnargs)

instead of this:

reactor.spawnProcess(*spawnargs)

then the problem goes away in my small test case. There is a remark in the Twisted documentation "Using Threads in Twisted" that led me to try this: "Most code in Twisted is not thread-safe. For example, writing data to a transport from a protocol is not thread-safe."

I suspect that the other people Jean-Paul mentioned as having this problem may be making a similar mistake. The responsibility is on the application to ensure that reactor and other API calls are made from the correct thread. And apparently, with very narrow exceptions, the "correct thread" is nearly always the main reactor thread.

OTHER TIPS

What is the best strategy for solving this problem?

File a ticket (perhaps after registering) describing the issue, preferably with a reproducible test case (for maximum accuracy). Then there can be some discussion about what the best way (or ways; different platforms may demand different solutions) to implement it might be.

The idea of immediately creating a child process to help with further child process creation has been raised before, to solve the performance issue surrounding child process reaping. If that approach now resolves two issues, it starts to look a little more attractive. One potential difficulty with this approach is that spawnProcess synchronously returns an object which supplies the child's PID and allows signals to be sent to it. This is a little more work to implement if there is an intermediate process in the way, since the PID will need to be communicated back to the main process before spawnProcess returns. A similar challenge will be supporting the childFDs argument, since it will no longer be possible to merely inherit the file descriptors in the child process.

An alternate solution (which may be somewhat more hackish, but which may also have fewer implementation challenges) might be to call sys.setcheckinterval with a very large number before calling os.fork, and then restore the original check interval in the parent process only. This should suffice to avoid any thread switching in the process until the os.execvpe takes place, destroying all the extra threads. This isn't entirely correct, since it will leave certain resources (such as mutexes and conditions) in a bad state, but your use of these with deferToThread isn't very common, so maybe that doesn't affect your case.
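A rough Python 3 sketch of that hack. `sys.setcheckinterval` was removed in Python 3.9, so this assumes `sys.setswitchinterval` (the GIL switch interval, in seconds) as the closest substitute; it only discourages switches, since threads that block in I/O still release the GIL. `fork_exec_with_rare_switches` is a hypothetical helper name.

```python
import os
import sys


def fork_exec_with_rare_switches(argv):
    """Hypothetical helper: fork and exec argv while discouraging GIL thread switches."""
    old_interval = sys.getswitchinterval()
    sys.setswitchinterval(1000.0)  # seconds; effectively "never force a switch"
    try:
        pid = os.fork()
    except BaseException:
        sys.setswitchinterval(old_interval)
        raise
    if pid == 0:
        # Child: replace the process image at once.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if execvp failed
    # Parent only: restore the original interval.
    sys.setswitchinterval(old_interval)
    return pid


pid = fork_exec_with_rare_switches([sys.executable, "-c", "raise SystemExit(3)"])
_, status = os.waitpid(pid, 0)
print(os.WEXITSTATUS(status))  # prints 3
```

As the answer notes, this narrows the window rather than closing it: a lock held by another thread at the moment of the fork stays held in the child.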

The advice Jean-Paul gives in his answer is good, but this should work (and does in most cases).

First, Twisted uses threads for hostname resolution as well, and I've definitely used subprocesses in Twisted processes that also make client connections. So this can work in practice.

Second, fork() does not create multiple threads in the child process. According to the standard describing fork(),

A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread ...

Now, that's not to say that there are no potential multithreading issues with spawnProcess; the standard also says:

... to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called ...

and I don't think there's anything to ensure that only async-signal-safe operations are used.

So, please be more specific about your exact problem, since it isn't caused by threads being cloned into the subprocess.

fork() on Linux definitely leaves the child process with only one thread.
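A Linux-only check of that claim, assuming `/proc/self/task` lists one entry per kernel thread: three extra threads are started, then the process forks and the child reports how many threads it has. Python 3.12 and later emit a `DeprecationWarning` when `os.fork()` is called from a multi-threaded process, which is expected here. `task_counts_across_fork` is an illustrative name.

```python
import os
import threading


def task_counts_across_fork(extra_threads=3):
    """Return (threads in parent, threads in forked child) as seen by the Linux kernel."""
    stop = threading.Event()
    workers = [threading.Thread(target=stop.wait) for _ in range(extra_threads)]
    for t in workers:
        t.start()
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: strictly, POSIX only promises async-signal-safe calls here;
        # this short demo relies on CPython's post-fork handling.
        os.close(read_fd)
        os.write(write_fd, str(len(os.listdir("/proc/self/task"))).encode())
        os._exit(0)
    os.close(write_fd)
    child_count = int(os.read(read_fd, 32))
    os.close(read_fd)
    os.waitpid(pid, 0)
    parent_count = len(os.listdir("/proc/self/task"))  # main thread + workers
    stop.set()
    for t in workers:
        t.join()
    return parent_count, child_count


parent_threads, child_threads = task_counts_across_fork()
print(parent_threads, child_threads)
```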

I assume you are aware that, when using threads in Twisted, the ONLY Twisted API that threads are permitted to call is callFromThread? All other Twisted APIs must only be called from the main, reactor thread.

Licensed under: CC-BY-SA with attribution