Twisted: using multiple threads and processes together

https://stackoverflow.com/questions/6052043

15-11-2019

Question

The Twisted documentation led me to believe that it was OK to combine techniques such as reactor.spawnProcess() and threads.deferToThread() in the same application, and that the reactor would handle this elegantly under the covers. Upon actually trying it, I found that my application deadlocks. Using multiple threads by themselves, or child processes by themselves, everything is fine.

Looking into the reactor source, I find that the SelectReactor.spawnProcess() method simply calls os.fork() without any consideration for multiple threads that might be running. This explains the deadlocks, because starting with the call to os.fork() you will have two processes with multiple threads running concurrently and doing who knows what with the same file descriptors.

My question is: what is the best strategy for solving this problem?

What I have in mind is to subclass SelectReactor, so that it is a singleton and calls os.fork() only once, immediately when instantiated. The child process will run in the background and act as a server for the parent (using object serialization over pipes to communicate back and forth). The parent continues to run the application and may use threads as desired. Calls to spawnProcess() in the parent will be delegated to the child process, which will be guaranteed to have only one thread running and can therefore call os.fork() safely.

Has anyone done this before? Is there a faster way?

Solution 3

Returning to this issue after some time, I found that if I do this:

reactor.callFromThread(reactor.spawnProcess, *spawnargs)

instead of this:

reactor.spawnProcess(*spawnargs)

then the problem goes away in my small test case. There is a remark in the Twisted documentation "Using Processes" that led me to try this: "Most code in Twisted is not thread-safe. For example, writing data to a transport from a protocol is not thread-safe."

I suspect that the other people Jean-Paul mentioned as having this problem may be making a similar mistake. It is the application's responsibility to ensure that reactor and other API calls are made from the correct thread. And apparently, with very narrow exceptions, the "correct thread" is nearly always the main reactor thread.
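To illustrate why routing the call through the reactor thread helps, here is a minimal sketch that uses only the standard library rather than Twisted. It is not Twisted's implementation: MiniLoop, call_from_thread, and demo are hypothetical names, and subprocess.run stands in for reactor.spawnProcess. The point is only that a worker thread enqueues the call, and the loop thread is the one that actually spawns the child.

```python
import queue
import subprocess
import sys
import threading


class MiniLoop:
    """Hypothetical stand-in for a reactor: one thread runs every callback."""

    def __init__(self):
        self._calls = queue.Queue()  # thread-safe hand-off point
        self._running = False

    def call_from_thread(self, func, *args):
        # Safe to call from any thread: it only enqueues, it never runs func here.
        self._calls.put((func, args))

    def stop(self):
        self._running = False

    def run(self):
        # Every queued call executes in the thread that called run().
        self._running = True
        while self._running:
            func, args = self._calls.get()
            func(*args)


def demo():
    loop = MiniLoop()
    spawned_in = []

    def spawn(argv):
        # Stand-in for reactor.spawnProcess; unlike the real call it blocks
        # until the child exits, which keeps the sketch short.
        spawned_in.append(threading.current_thread().name)
        subprocess.run(argv, check=True)
        loop.stop()

    def worker():
        # The worker never spawns directly; it asks the loop thread to do it.
        loop.call_from_thread(spawn, [sys.executable, "-c", "pass"])

    thread = threading.Thread(target=worker, name="worker")
    thread.start()
    loop.run()
    thread.join()
    return spawned_in, threading.current_thread().name
```

Calling demo() returns the name of the thread that performed the spawn alongside the name of the thread that ran the loop; the two match even though the request originated in the worker thread.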

Other suggestions

What is the best strategy for solving this problem?

File a ticket (perhaps after registering) describing the issue, preferably with a reproducible test case (for maximum accuracy). Then there can be some discussion about what the best way (or ways, since different platforms may demand different solutions) to implement it might be.

The idea of immediately creating a child process to help with further child process creation has been raised before, to solve the performance issue surrounding child process reaping. If that approach now resolves two issues, it starts to look a little more attractive. One potential difficulty with this approach is that spawnProcess synchronously returns an object which supplies the child's PID and allows signals to be sent to it. This is a little more work to implement if there is an intermediate process in the way, since the PID will need to be communicated back to the main process before spawnProcess returns. A similar challenge will be supporting the childFDs argument, since it will no longer be possible to merely inherit the file descriptors in the child process.
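The helper-process idea can be sketched roughly as follows, assuming a POSIX system. This is not Twisted code: SpawnHelper and its methods are hypothetical names, requests are pickled over pipes as the question proposes, and subprocess.Popen stands in for the fork/exec step. The sketch shows only the PID being reported back to the parent; it does not attempt childFDs or signal delivery.

```python
import os
import pickle
import struct
import subprocess
import threading


def _read_exact(fd, n):
    buf = b""
    while len(buf) < n:
        chunk = os.read(fd, n - len(buf))
        if not chunk:  # EOF: the other end closed the pipe
            return None
        buf += chunk
    return buf


def _send(fd, obj):
    # Length-prefixed pickle so the reader knows where a message ends.
    data = pickle.dumps(obj)
    os.write(fd, struct.pack("!I", len(data)) + data)


def _recv(fd):
    header = _read_exact(fd, 4)
    if header is None:
        return None
    (length,) = struct.unpack("!I", header)
    payload = _read_exact(fd, length)
    return None if payload is None else pickle.loads(payload)


class SpawnHelper:
    """Hypothetical helper: forks once, early, then spawns on request."""

    def __init__(self):
        req_r, req_w = os.pipe()
        rep_r, rep_w = os.pipe()
        self._lock = threading.Lock()
        self._helper_pid = os.fork()  # create this before any threads start
        if self._helper_pid == 0:
            os.close(req_w)
            os.close(rep_r)
            try:
                self._serve(req_r, rep_w)
            finally:
                os._exit(0)  # never return into the parent's code path
        os.close(req_r)
        os.close(rep_w)
        self._req_w, self._rep_r = req_w, rep_r

    @staticmethod
    def _serve(req_r, rep_w):
        # Runs in the single-threaded helper process.
        children = []
        while True:
            argv = _recv(req_r)
            if argv is None:  # parent closed its end: shut down
                break
            child = subprocess.Popen(argv)
            children.append(child)
            _send(rep_w, child.pid)  # report the PID back to the parent
        for child in children:
            child.wait()  # reap before exiting

    def spawn(self, argv):
        # One request/reply at a time, so threads in the parent may share it.
        with self._lock:
            _send(self._req_w, argv)
            return _recv(self._rep_r)

    def close(self):
        os.close(self._req_w)
        os.close(self._rep_r)
        os.waitpid(self._helper_pid, 0)
```

Note that the spawned processes are children of the helper, not of the main process, so the main process cannot wait on them directly. That is part of the extra work described above.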

An alternate solution (which may be somewhat more hackish, but which may also have fewer implementation challenges) might be to call sys.setcheckinterval with a very large number before calling os.fork, and then restore the original check interval in the parent process only. This should suffice to avoid any thread switching in the process until the os.execvpe takes place, destroying all the extra threads. This isn't entirely correct, since it will leave certain resources (such as mutexes and conditions) in a bad state, but your use of these with deferToThread isn't very common so maybe that doesn't affect your case.
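A rough sketch of that hack on Python 3 follows. sys.setcheckinterval no longer exists there (it was removed in Python 3.9), so this assumes sys.setswitchinterval, which takes seconds rather than a bytecode count, is an acceptable substitute. The function name fork_exec_quietly is hypothetical. The interval only discourages forced thread switches; threads blocked in I/O can still run, so this remains a hack rather than a guarantee.

```python
import os
import sys


def fork_exec_quietly(argv):
    """Fork and exec argv while discouraging thread switches around the fork."""
    original = sys.getswitchinterval()
    sys.setswitchinterval(1000.0)  # seconds; effectively "do not switch"
    try:
        pid = os.fork()
        if pid == 0:
            # Child: the interval is deliberately left large until exec
            # replaces the process image.
            try:
                os.execvp(argv[0], argv)
            finally:
                os._exit(127)  # reached only if exec failed
    finally:
        # Parent only: the child has either exec'd or exited by this point.
        sys.setswitchinterval(original)
    return pid
```

The caller is still responsible for reaping the returned PID, for example with os.waitpid.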

The advice Jean-Paul gives in his answer is good, but this should work (and does in most cases).

First, Twisted uses threads for hostname resolution as well, and I've definitely used subprocesses in Twisted processes that also make client connections. So this can work in practice.

Second, fork() does not create multiple threads in the child process. According to the standard describing fork(),

A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread ...

Now, that's not to say that there are no potential multithreading issues with spawnProcess; the standard also says:

... to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called ...

and I don't think there's anything to ensure that only async-signal-safe operations are used.

So, please be more specific as to your exact problem, since it isn't a subprocess with threads being cloned.

fork() on Linux definitely leaves the child process with only one thread.
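This is easy to check on Linux. The sketch below assumes /proc is mounted and counts entries in /proc/self/task, one per kernel thread, before the fork in the parent and after it in the child. The function names are hypothetical. Recent Python versions may emit a DeprecationWarning when os.fork() is called with other threads running; the fork still proceeds.

```python
import os
import threading


def count_os_threads():
    # Linux-specific: /proc/self/task has one entry per kernel thread.
    return len(os.listdir("/proc/self/task"))


def thread_counts_across_fork(n_workers=3):
    """Return (threads in parent before fork, threads in child after fork)."""
    stop = threading.Event()
    workers = [threading.Thread(target=stop.wait) for _ in range(n_workers)]
    for t in workers:
        t.start()
    parent_count = count_os_threads()

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report its own thread count, then exit without cleanup.
        try:
            os.write(write_fd, str(count_os_threads()).encode())
        finally:
            os._exit(0)

    # Parent: collect the child's answer, then shut the workers down.
    os.close(write_fd)
    child_count = int(os.read(read_fd, 16))
    os.close(read_fd)
    os.waitpid(pid, 0)
    stop.set()
    for t in workers:
        t.join()
    return parent_count, child_count
```

With three workers the parent reports at least four threads (the workers plus the main thread), while the child reports one: the replica of the thread that called fork().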

I assume you are aware that, when using threads in Twisted, the ONLY Twisted API that threads are permitted to call is callFromThread? All other Twisted APIs must only be called from the main, reactor thread.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow