How to use Python multiprocessing Pool.map within a loop

https://stackoverflow.com/questions/22582043

19-06-2023

Question

I am running a simulation using Runge-Kutta. At every time step, two FFTs of two independent variables are required, and these can be computed in parallel. I implemented the code like this:

from multiprocessing import Pool
import numpy as np

pool = Pool(processes=2)    # I want to calculate only 2 FFTs in parallel
                            # in every time step, therefore 2 processes

def Splitter(args):
    '''I have to pass 2 arguments'''
    return makeSomething(*args)

def makeSomething(a,b):
    '''dummy function instead of the one with the FFT'''
    return a*b

def RungeK():
    # ...
    # a lot of code which creates the vectors A and B and calculates
    # one Runge-Kutta step for them
    # ...

    n = 20                         # Just something for the example
    A = np.arange(50000)
    B = np.ones_like(A)

    for i in xrange(n):                  # loop over the time steps
        A *= np.mean(B)*B - A
        B *= np.sqrt(A)
        results = pool.map(Splitter,[(A,3),(B,2)])
        A = results[0]
        B = results[1]

    print np.mean(A)                                 # Some output
    print np.max(B)

if __name__== '__main__':
    RungeK()

Unfortunately, Python spawns an unlimited number of processes once it reaches the loop. Before that, only two processes seem to be running. My memory also fills up. Adding

pool.close()
pool.join()

after the loop does not solve my problem, and putting it inside the loop makes no sense to me. I hope you can help.

Solution

Move the creation of the pool into the RungeK function:

def RungeK():
    # ...
    # a lot of code which creates the vectors A and B and calculates
    # one Runge-Kutta step for them
    # ...

    pool = Pool(processes=2)
    n = 20                         # Just something for the example
    A = np.arange(50000)
    B = np.ones_like(A)

    for i in xrange(n):  # loop over the time steps
        A *= np.mean(B)*B - A
        B *= np.sqrt(A)
        results = pool.map(Splitter, [(A, 3), (B, 2)])
        A = results[0]
        B = results[1]
    pool.close()
    print np.mean(A)  # Some output
    print np.max(B)

Alternatively, create the pool in the main block (under if __name__ == '__main__':).

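This is probably a side effect of how multiprocessing works. For example, on MS Windows the main module must be importable without side effects (such as creating new processes), because each child process imports it again when it starts. A pool created at module level is therefore created again in every child.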
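
The main-block alternative can be sketched as follows. This is a minimal illustration, not the original simulation. It is written in Python 3 syntax (unlike the Python 2 code above) and uses plain lists instead of NumPy arrays to stay dependency-free. The names make_something and runge_kutta and the parameters n_steps and size are hypothetical stand-ins. It also assumes Pool.starmap is acceptable in place of the Splitter helper, since starmap unpacks each argument tuple itself.

```python
from multiprocessing import Pool


def make_something(values, factor):
    """Dummy stand-in for the FFT: scale every element by `factor`."""
    return [v * factor for v in values]


def runge_kutta(pool, n_steps=20, size=1000):
    """Run `n_steps` dummy time steps, reusing the same `pool` in every step."""
    a = [1.0] * size
    b = [1.0] * size
    for _ in range(n_steps):
        # starmap unpacks each tuple into positional arguments,
        # so no separate splitter function is needed.
        a, b = pool.starmap(make_something, [(a, 3), (b, 2)])
    return a, b


if __name__ == "__main__":
    # The pool is created only when the file is run as a script,
    # never when the module is imported by a child process.
    with Pool(processes=2) as pool:
        a, b = runge_kutta(pool, n_steps=5, size=10)
    print(a[0], b[0])
```

Passing the pool in as an argument keeps a single pool alive for all time steps, and the with block shuts the workers down once the loop has finished.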

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow