Question

I am trying to create a script that calls a function from a separate module to do parallel processing.

My 'top-level' script looks like this:

from hydrology import model, descriptors
if __name__ == "__main__":
    datafile = r"C:\folder\datafile.shp"
    myModel = model.Model(data=datafile)

    res = descriptors.watershed_pll(myModel)

The descriptors module looks like this:

from multiprocessing import Pool
from arcfunc import multi_watershed

def watershed_pll(model):
    pool = Pool()
    for key, val in model.stations.iteritems():
        res = pool.apply_async(multi_watershed(val, key))
    pool.close()
    pool.join()
    return res

As you can see, the function to run in parallel is imported from the arcfunc module, the function carrying out the parallelisation lives in the descriptors module, and the script running everything is separate again.

The script raises no exceptions when I run it, but I have two problems:

  1. res.successful() returns False
  2. It runs no quicker than without multiprocessing.

I suspect that my architecture is complicating things; however, it is important that the parallelisation function stays in a separate module.

Any suggestions?


Solution

Instead of passing the function and its arguments to apply_async, the code calls multi_watershed directly (in the main process) and passes its return value to apply_async. That is why there is no speedup: each watershed is computed serially before the pool ever sees it.

Pass the function and its arguments separately.

Replace the following line:

res = pool.apply_async(multi_watershed(val, key))

with:

res = pool.apply_async(multi_watershed, (val, key))
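
With that fix in place, note that res only ever holds the AsyncResult of the last task submitted, so checking res.successful() tells you nothing about the other stations. Below is a minimal sketch of a revised watershed_pll that keeps one handle per task and surfaces worker errors; it assumes Python 3 (where iteritems() becomes items()) and keeps the multi_watershed import from the question:

from multiprocessing import Pool

from arcfunc import multi_watershed


def watershed_pll(model):
    pool = Pool()  # defaults to one worker process per CPU core
    # apply_async returns immediately with an AsyncResult handle,
    # so keep one handle per station instead of overwriting res.
    handles = {
        key: pool.apply_async(multi_watershed, (val, key))
        for key, val in model.stations.items()
    }
    pool.close()
    pool.join()
    # get() returns each task's result and re-raises any exception
    # that occurred in the worker, instead of failing silently.
    return {key: h.get() for key, h in handles.items()}

One caveat: the (val, key) arguments are pickled and sent to the worker processes, so the station values must be picklable for this to work.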