Question

I have an algorithm of the following form:

V = {}
for i in range(N): 
    V[i] = {}
    for j in range(10):
        if should_skip(i,j):
            continue
        V[i][j] = do_something(V)

which I have successfully parallelized as follows:

from multiprocessing import Pool

V = {}
for i in range(N):
    V[i] = {}
    p = Pool(4) # On each iteration, start a new pool
    results = []
    for j in range(10):
        if should_skip(i,j):
            continue
        results.append(p.apply_async(do_something, (V, j))) # pass j so the worker can return it
    p.close() 
    p.join() # Then close and join the pool after forking the jobs
    for result in results:
        r = result.get() # do_something modified to return 'j' as its second arg
        V[i][r[1]] = r[0]

I'm wondering: is this the right way to do it? Isn't it expensive to keep creating and destroying pools? Is there a better way that avoids making a new pool on every iteration?


Solution

There's no need to create a new pool on every iteration, and no need to call close and join on it either: you're calling get on each individual AsyncResult, which blocks until that result is ready, so all the work for iteration i is finished before you move on to i + 1. You can create the pool once and reuse it:

from multiprocessing import Pool

V = {}
p = Pool(4) # one pool, created once and reused for every i
for i in range(N):
    V[i] = {}
    results = []
    for j in range(10):
        if should_skip(i,j):
            continue
        results.append(p.apply_async(do_something, (V, j))) # pass j so the worker can return it with its result
    for result in results:
        r = result.get() # This will block until `result` is available.
        V[i][r[1]] = r[0]
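
If you want the pool cleaned up automatically when you're done, you can also use it as a context manager (Python 3.3+). Below is a minimal, self-contained sketch of the same reused-pool pattern; should_skip and do_something here are hypothetical stand-ins for your real functions:

from multiprocessing import Pool

N = 3

def should_skip(i, j):
    # Hypothetical stand-in for the real predicate.
    return (i + j) % 2 == 0

def do_something(V, j):
    # Hypothetical stand-in; returns (value, j) so the parent
    # process knows where to store the result.
    return len(V) * j, j

if __name__ == '__main__':
    V = {}
    with Pool(4) as p:  # one pool for the whole run
        for i in range(N):
            V[i] = {}
            results = [p.apply_async(do_something, (V, j))
                       for j in range(10) if not should_skip(i, j)]
            for result in results:
                value, j = result.get()  # blocks until this task finishes
                V[i][j] = value
    print(V)

Keep in mind that each worker receives a pickled copy of V, so tasks in the same batch don't see each other's results; if do_something only needs part of V, passing just that part reduces the pickling overhead.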
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow