How to convert Python threading code to multiprocessing code?
28-09-2019
Problem
I need to convert a threading application to a multiprocessing application for multiple reasons (GIL, memory leaks). Fortunately the threads are quite isolated and only communicate via `Queue.Queue`s. This primitive is also available in `multiprocessing`, so everything looks fine. Before I enter this minefield, I'd like some advice on the upcoming problems:

- How do I ensure that my objects can be transferred via the `Queue`? Do I need to provide a `__setstate__`?
- Can I rely on `put` returning instantly (as with `threading` `Queue`s)?
- General hints/tips?
- Anything worthwhile to read apart from the Python documentation?
Solution
Answer to part 1:

Everything that has to pass through a `multiprocessing.Queue` (or a `Pipe`, or whatever) has to be picklable. This includes basic types such as `tuple`s, `list`s, and `dict`s. Classes are also supported, provided they are defined at the top level of a module and are not too complicated (see the pickle documentation for details). Trying to pass `lambda`s around will fail, however.
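A quick way to check this in advance is to attempt a pickle round-trip on each object type before switching queues. The helper name `is_picklable` below is purely illustrative:

```python
import pickle


class Message:
    """A simple top-level class: instances are picklable."""
    def __init__(self, payload):
        self.payload = payload


def is_picklable(obj):
    """Return True if obj survives a pickle round-trip."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        # lambdas, nested classes, open file handles, etc. land here
        return False


print(is_picklable((1, [2, 3], {"k": "v"})))  # basic types: True
print(is_picklable(Message("hello")))         # top-level class: True
print(is_picklable(lambda x: x + 1))          # lambda: False
```

If such a check fails for one of your classes, that is the point where implementing `__getstate__`/`__setstate__` becomes necessary.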
Answer to part 2:

A `put` consists of two parts: it takes a semaphore to modify the queue, and it optionally starts a feeder thread. So as long as no other `Process` tries to `put` to the same `Queue` at the same time (for instance, because only one `Process` writes to it), it should be fast. For me it turned out to be fast enough for all practical purposes.
Partial answer to part 3:

- The plain `multiprocessing.Queue` lacks a `task_done` method, so it cannot be used as a drop-in replacement directly. (The subclass `multiprocessing.JoinableQueue` provides the method.)
- The old `processing.Queue` lacks a `qsize` method, and the newer `multiprocessing` version is inaccurate (just keep this in mind).
- Since file descriptors are normally inherited on `fork`, care needs to be taken about closing them in the right processes.
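For the first point, a minimal sketch of `multiprocessing.JoinableQueue` as the `task_done`-capable replacement:

```python
import multiprocessing as mp

# JoinableQueue adds the task_done()/join() protocol familiar
# from queue.Queue (the stdlib threading queue).
q = mp.JoinableQueue()

for i in range(3):
    q.put(i)

results = [q.get() for _ in range(3)]
for _ in results:
    q.task_done()  # one call per completed item

q.join()  # returns once every put() has been matched by a task_done()
print(results)
```

Note that `empty()` and `qsize()` on multiprocessing queues are only approximate (as mentioned above), so prefer sentinels or a fixed item count for loop termination, as done here.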