Question

I have an algorithm of the following form:

V = {}
for i in range(N): 
    V[i] = {}
    for j in range(10):
        if should_skip(i,j):
            continue
        V[i][j] = do_something(V)

which I have successfully parallelized as follows:

from multiprocessing import Pool

V = {}
for i in range(N):
    V[i] = {}
    p = Pool(4) # On each iteration, start a new pool
    results = []
    for j in range(10):
        if should_skip(i, j):
            continue
        # do_something modified to take j and return it alongside its result
        results.append(p.apply_async(do_something, (V, j)))
    p.close()
    p.join() # Close and join the pool after submitting the jobs
    for result in results:
        value, j = result.get()
        V[i][j] = value

I'm wondering: is this the right way to do it? Isn't it expensive to keep creating and tearing down pools? Is there a better way that avoids making a new pool on every iteration?


Solution

There's no need to call close and join on the pool inside the loop: calling get on each AsyncResult blocks until that result is ready, so all of iteration i's work has finished before iteration i+1 submits anything. That means you can create one pool up front and reuse it across iterations:

from multiprocessing import Pool

V = {}
p = Pool(4) # Create the pool once, outside the loop
for i in range(N):
    V[i] = {}
    results = []
    for j in range(10):
        if should_skip(i, j):
            continue
        results.append(p.apply_async(do_something, (V, j)))
    for result in results:
        value, j = result.get() # Blocks until this result is ready
        V[i][j] = value
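
If you'd rather not manage the pool's lifetime by hand at all, Pool also works as a context manager (Python 3.3+), which tears it down when the block exits. Here is a minimal, self-contained sketch of that variant; should_skip and do_something are hypothetical stand-ins, since the real implementations aren't shown:

from multiprocessing import Pool

def should_skip(i, j):
    # Hypothetical stand-in for the real predicate.
    return j % 2 == 1

def do_something(V, j):
    # Hypothetical stand-in for the real worker. Note: the worker receives
    # a pickled copy of V, so any mutation it makes is invisible to the
    # parent process; only the return value comes back.
    return len(V) * j, j

if __name__ == '__main__':
    N = 5
    V = {}
    with Pool(4) as p: # __exit__ calls terminate(); that's safe here
                       # because every get() has already returned
        for i in range(N):
            V[i] = {}
            results = [p.apply_async(do_something, (V, j))
                       for j in range(10) if not should_skip(i, j)]
            for result in results:
                value, j = result.get() # blocks until this result is ready
                V[i][j] = value
    print(V)

One caveat that applies to every version shown: each apply_async call pickles V and ships a copy to the worker, so submission gets more expensive as V grows.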
Licensed under: CC-BY-SA with attribution