I have an algorithm of the following form:

V = {}
for i in range(N):
    V[i] = {}
    for j in range(10):
        if should_skip(i, j):
            continue
        V[i][j] = do_something(V)

which I have successfully parallelized as follows:

from multiprocessing import Pool

V = {}
for i in range(N):
    V[i] = {}
    p = Pool(4)  # on each iteration, start a new pool
    results = []
    for j in range(10):
        if should_skip(i, j):
            continue
        # do_something modified to take j and return it as its second value
        results.append(p.apply_async(do_something, (V, j)))
    p.close()
    p.join()  # close and join the pool after submitting the jobs
    for result in results:
        r = result.get()
        V[i][r[1]] = r[0]

I'm wondering: is this the right way to do this? Isn't it expensive to keep opening and closing pools? Is there a better way to do it that avoids having to create a new pool on each iteration?

Solution

There's no need to call close and join on the pool on every iteration, since you're calling get on each of the individual AsyncResult objects, which blocks until that result is ready. That means you can create the pool once, before the loop, and reuse it:

from multiprocessing import Pool

V = {}
p = Pool(4)  # create the pool once, before the loop
for i in range(N):
    V[i] = {}
    results = []
    for j in range(10):
        if should_skip(i, j):
            continue
        results.append(p.apply_async(do_something, (V, j)))
    for result in results:
        r = result.get()  # this blocks until `result` is available
        V[i][r[1]] = r[0]
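
If you want the pool's worker processes to be shut down once all the work is finished, one option is to use the pool as a context manager. The following is a minimal sketch of that variant, reusing the question's N, should_skip and do_something placeholders; exiting the with-block calls terminate() on the pool, which is safe here because every result has already been retrieved with get():

from multiprocessing import Pool

V = {}
with Pool(4) as p:  # one pool, reused across all iterations
    for i in range(N):
        V[i] = {}
        results = []
        for j in range(10):
            if should_skip(i, j):
                continue
            # assumes do_something(V, j) returns (value, j), as in the question
            results.append(p.apply_async(do_something, (V, j)))
        for result in results:
            r = result.get()  # blocks until this task's result is ready
            V[i][r[1]] = r[0]
# the worker processes are terminated when the with-block exits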