I need to run the same function on the same data many times. To speed up the computation I am using multiprocessing.Pool.

from multiprocessing import Pool
import numpy as np
x=np.array([1,2,3,4,5])

def func(x): # stand-in for a function that takes ~3 minutes
    m = np.mean(x)
    return m

p = Pool(100)
mapper = p.map(func, [x]*500)

The program works well, but at the end I have 100 Python processes still open and my whole system becomes very slow.

How can I solve this? Am I using Pool in the wrong way? Should I use another function?

EDIT: if I use p = Pool(multiprocessing.cpu_count()), will my PC use 100% of its power? Or is there something else I should use?


Solution

In addition to limiting yourself to

p = Pool(multiprocessing.cpu_count())

I believe you also want to do the following when you're finished:

p.close()

close() tells the pool that no more work is coming, so the worker processes can shut down once their pending tasks are finished.
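A minimal sketch of the shutdown sequence: after close(), calling join() blocks until all workers have actually exited, so no stray processes are left behind. The worker count of 4 and the 10-item workload are illustrative, not taken from the question.

```python
from multiprocessing import Pool
import numpy as np

def func(x):
    # same idea as the question's func: compute the mean of an array
    return np.mean(x)

if __name__ == "__main__":
    x = np.array([1, 2, 3, 4, 5])
    p = Pool(4)                      # small illustrative worker count
    results = p.map(func, [x] * 10)  # run func on each copy of x
    p.close()                        # no more tasks will be submitted
    p.join()                         # wait for every worker to exit
    print(results[0])
```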

Additional tips

As a general rule, you don't want many more worker processes than you have CPU cores, because your computer can't parallelize the work beyond the number of cores actually available to do the processing. It doesn't help to have 100 processes when your CPU can only execute four things simultaneously. A common practice is to do this:

p = Pool(multiprocessing.cpu_count())
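Putting it together, here is a hedged sketch of the recommended pattern using Pool as a context manager; the `with` block tears the pool down automatically on exit, so no worker processes linger. (`func` here is my own reconstruction of the question's three-minute function.)

```python
import multiprocessing
from multiprocessing import Pool
import numpy as np

def func(x):
    # placeholder for the expensive per-item computation
    return np.mean(x)

if __name__ == "__main__":
    x = np.array([1, 2, 3, 4, 5])
    # size the pool to the machine's core count
    with Pool(multiprocessing.cpu_count()) as p:
        results = p.map(func, [x] * 500)
    # the pool is torn down here; map() has already collected all results
    print(len(results))
```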
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow