Question

This should be extremely simple for people who know this area, but I am not one of them. I have searched a lot about multiprocessing, but it only makes me more confused...

I need to process about 160 data files independently. I have a function that processes the data, say f(arg1, arg2). My computer's CPU is an i7-3770 (4 cores, 8 threads). I was wondering if I could open 8 IPython QT consoles and run this same function (by copying the function into each console) with different values for arg1 and arg2 at the same time?

Or is there a very simple example of doing such a task using multiprocessing in Python?

I know very little about coding; I merely use pandas, numpy and scipy to process data. I am using Anaconda as my Python environment.

Thank you so much for your help!


Solution

The multiprocessing module is meant for exactly this use case.

A simple complete example of its usage is:

import multiprocessing

def my_function(x):
    """The function you want to compute in parallel."""
    x += 1
    return x


if __name__ == '__main__':
    pool = multiprocessing.Pool()
    results = pool.map(my_function, [1,2,3,4,5,6])
    print(results)

The call pool.map will execute my_function with argument 1, then 2, etc., but in parallel.

Note that my_function takes only one argument. If you have a function f that takes n arguments, simply write a helper function f_helper:

def f_helper(args):
    return f(*args)

And pack the arguments into a tuple. For example:

results = pool.map(f_helper, [(1,2,3), (4,5,6), (7,8,9)])

is equivalent to:

[f(1, 2, 3), f(4, 5, 6), f(7, 8, 9)]

but the calls to f are executed in parallel.
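As an aside, on Python 3.3 and later, Pool.starmap does this tuple unpacking for you, so the helper is unnecessary. A minimal sketch, using a toy three-argument function in place of your real f:

```python
import multiprocessing

def f(a, b, c):
    # Toy stand-in for a real three-argument function.
    return a + b + c

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        # starmap unpacks each tuple into f's positional arguments.
        results = pool.starmap(f, [(1, 2, 3), (4, 5, 6), (7, 8, 9)])
    print(results)  # [6, 15, 24]
```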


Note: since the code runs in a different process, any side effect of f won't be preserved. For example, if you modify the original argument, the main process will not see that change. Think of the arguments as being copied and passed to the child process, which computes the result, which is in turn copied back into the main process.

If the function you are trying to compute doesn't take long enough, the copying of arguments and return values can take more time than running the code serially.
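Putting the pieces together for the original question (about 160 files, a two-argument function), a sketch might look like the following. The file names and the body of f are hypothetical placeholders for your real data-processing code:

```python
import multiprocessing

def f(arg1, arg2):
    # Placeholder: replace with your real data-processing function.
    return (arg1, arg2)

def f_helper(args):
    # Unpack one (arg1, arg2) tuple into f's two arguments.
    return f(*args)

if __name__ == '__main__':
    # Hypothetical task list: one (arg1, arg2) pair per data file.
    tasks = [('data_%03d.txt' % i, i) for i in range(160)]
    # With 8 hardware threads, 8 worker processes is a reasonable start.
    with multiprocessing.Pool(processes=8) as pool:
        results = pool.map(f_helper, tasks)
    print(len(results))  # 160
```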


The documentation contains various examples of usage of the module.

OTHER TIPS

I am running the exact same code as in the solution above, in an IPython QT console on Windows. However, just like for the poster above, the code does not work -- the QT console just freezes up.

Any solution to this?
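A likely cause (not from the original answers): on Windows, multiprocessing starts fresh worker processes that must import the module defining the target function, and functions typed directly into an interactive console cannot be imported that way, so the workers hang. The usual workaround is to save the code as a plain script and run it with python from a command prompt rather than inside the console. A sketch, with my_script.py being a hypothetical filename:

```python
# my_script.py -- run with `python my_script.py` from a command prompt,
# instead of defining my_function inside the IPython QT console.
import multiprocessing

def my_function(x):
    """The function you want to compute in parallel."""
    return x + 1

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        print(pool.map(my_function, [1, 2, 3, 4, 5, 6]))
```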

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow