Question

I have the following setup:

results = [f(args) for _ in range(10**3)]

But, f(args) takes a long time to compute. So I'd like to throw multiprocessing at it. I would like to do so by doing:

pool = mp.Pool(mp.cpu_count() - 1)  # mp.cpu_count() -> 8
results = [pool.apply_async(f, args) for _ in range(10**3)]

Clearly, I don't have 1000 processors on my computer, so my concern is:
Does the above call result in 1000 processes simultaneously competing for CPU time, or in 7 processes running simultaneously, each computing the next f(args) when its previous call finishes?

I suppose I could do something like pool.map_async(f, (args for _ in range(10**3))) to get the same results, but the purpose of this post is to understand the behavior of pool.apply_async.


Solution

You'll never have more processes running than there are workers in your pool (in your case, mp.cpu_count() - 1). If you call apply_async while all the workers are busy, the task is queued and executed as soon as a worker frees up. You can see this with a simple test program:

#!/usr/bin/python

import time
import multiprocessing as mp

def worker(chunk):
    print('working')
    time.sleep(10)
    return

def main():
    pool = mp.Pool(2)  # Only two workers
    for n in range(0, 8):
        pool.apply_async(worker, (n,))
        print("called it")
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()

The output is like this:

called it
called it
called it
called it
called it
called it
called it
called it
working
working
<delay>
working
working
<delay>
working
working
<delay>
working
working
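
The test program above discards what the workers return, whereas the question wants a list of results. As a minimal sketch of how that could look: apply_async hands back an AsyncResult right away, and calling get() on it blocks until that task has run. The names square and run_all are made up for this illustration, and square merely stands in for the expensive f(args).

```python
import multiprocessing as mp


def square(x):
    # Hypothetical stand-in for the expensive f(args) from the question.
    return x * x


def run_all(inputs, n_workers=2):
    # Hypothetical helper: submit every input, then wait for all results.
    with mp.Pool(n_workers) as pool:
        # Each call returns immediately with an AsyncResult; the task waits
        # in the pool's queue until one of the n_workers processes is free.
        pending = [pool.apply_async(square, (x,)) for x in inputs]
        # get() blocks until that particular task has finished, so the
        # returned list is in submission order.
        return [p.get() for p in pending]


if __name__ == '__main__':
    print(run_all(range(8)))
```

All eight tasks are submitted at once here, but only two worker processes ever execute them.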

OTHER TIPS

The number of worker processes is wholly controlled by the argument to mp.Pool(). So if mp.cpu_count() returns 8 on your box, mp.Pool(mp.cpu_count() - 1) creates 7 worker processes.

All pool methods (apply_async() among them) then use no more than that many worker processes. Under the covers, arguments are pickled in the main program and sent over an inter-process pipe to worker processes. This hidden machinery effectively creates a work queue, off of which the fixed number of worker processes pull descriptions of work to do (function name + arguments).
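
One way to check the fixed-worker claim for yourself is to have every task report the process ID it ran in and then count the distinct IDs. This is only a sketch: report_pid and distinct_worker_pids are names invented for the example, and the short sleep is there so that a single fast worker is less likely to drain the whole queue on its own.

```python
import multiprocessing as mp
import os
import time


def report_pid(_):
    # Sleep briefly so the tasks are more likely to spread across workers.
    time.sleep(0.05)
    return os.getpid()


def distinct_worker_pids(n_tasks, n_workers):
    # Hypothetical helper: run n_tasks on a pool of n_workers processes and
    # return the set of process IDs that actually executed a task.
    with mp.Pool(n_workers) as pool:
        pending = [pool.apply_async(report_pid, (i,)) for i in range(n_tasks)]
        return {p.get() for p in pending}


if __name__ == '__main__':
    pids = distinct_worker_pids(n_tasks=20, n_workers=3)
    print(len(pids), "distinct worker processes handled 20 tasks")
```

With the default pool settings, the count printed should never exceed the pool size, however many tasks are submitted.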

Other than that, it's all just magic ;-)

Licensed under: CC-BY-SA with attribution