Question

I am using multiprocessing to run a batch of serial tasks. The tasks are identical but operate on different files located in different folders. Each task consists of calls to several other modules and C++ programs, and a high-level wrapper class manages those calls. At the beginning of the execution, a list holding an ID and an instance of this high-level class for each task is created. Then a pool of processes executes the tasks.

It runs fine until, at some point, this obscure exception is raised:

Traceback (most recent call last):
  File "test_parallel.py", line 197, in <module>
    pool_outputs = pool.map(do_calculations, zip(list_instances, list_IDs), )
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 148, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 422, in get
    raise self._value
IndexError: tuple index out of range

It is usually raised after many tasks have already completed (around the 100th task of the 200 planned).

A shortened version of the code is:

import copy
import multiprocessing

if __name__ == "__main__":
    which_subfields = range(200)
    pool_size = multiprocessing.cpu_count()

    # high-level wrapper instance (defined elsewhere)
    run = WrapperAroundModule.run(version="parallel")

    if pool_size == 0:
        pool_size = 1

    list_IDs = list(which_subfields)
    lock = multiprocessing.Lock()

    # one independent wrapper instance per task
    list_instances = []
    for _ in which_subfields:
        list_instances.append(copy.deepcopy(run))

    pool = multiprocessing.Pool(processes=pool_size)

    pool_outputs = pool.map(do_calculations, zip(list_instances, list_IDs))

    pool.close()
    pool.join()

where the signature of the do_calculations function is do_calculations((instance, id)), using Python 2's tuple-parameter unpacking.
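For reference, a minimal skeleton of such a worker; the body and the process method are purely illustrative placeholders:

def do_calculations((instance, id)):
    # Python 2.x tuple-parameter unpacking: each item produced by
    # zip(list_instances, list_IDs) arrives as a single (instance, id) tuple.
    return instance.process(id)  # hypothetical method on the wrapper instance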

I made sure that the function do_calculations is thread-safe, but that didn't change the situation. I then wanted to use the maxtasksperchild argument, but unfortunately I must use Python 2.6 (whose Pool doesn't support it), and the billiard module can't be installed on the server I'm using (it runs Scientific Linux). I therefore wrote a workaround: the tasks to be performed are divided into chunks of length pool_size*maxtasksperchild. The script executes each chunk on a pool using similar code; once a chunk is finished, the pool and all the variables around it are deleted, and a new pool is created for the next chunk. A sketch of this workaround follows. Sadly, the error is still raised at some point. Moreover, I made sure that the two lists passed as arguments are long enough. The function do_calculations runs smoothly when invoked on its own for the very tasks that fail in the multiprocessing version.
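For clarity, here is a minimal sketch of that chunking workaround; the helper name run_in_chunks is mine, and do_calculations, list_instances and list_IDs are assumed to be defined as above:

import multiprocessing

def run_in_chunks(tasks, pool_size, maxtasksperchild):
    # Emulate maxtasksperchild on Python 2.6: process the task list in
    # chunks and tear the pool down after each chunk, so that worker
    # processes are replaced periodically.
    results = []
    chunk_size = pool_size * maxtasksperchild
    for start in range(0, len(tasks), chunk_size):
        chunk = tasks[start:start + chunk_size]
        pool = multiprocessing.Pool(processes=pool_size)
        results.extend(pool.map(do_calculations, chunk))
        pool.close()
        pool.join()  # wait for this chunk to finish before recreating the pool
    return results

# usage:
# pool_outputs = run_in_chunks(zip(list_instances, list_IDs), pool_size, 10)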

Any idea about the source of this error and a possible correction?


Solution

The line raise self._value means that do_calculations raised an exception in a child process, and multiprocessing re-raises it for you in the main process. Note that the traceback you see points at the pool internals, not at the failing line inside do_calculations: on Python 2.6 the child-side traceback is lost when the exception is sent back to the parent.

To get rid of the exception, fix the do_calculations() function. Wrap its body in try/except and print the full traceback (and, if helpful, the local variables) to understand where the error occurs, as in the sketch below.
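A minimal sketch of such a wrapper, assuming do_calculations takes an (instance, id) tuple as above; the wrapper name is mine, and the re-raised exception carries the child-side traceback in its message:

import sys
import traceback

def do_calculations_wrapped(args):
    # Delegate to the real worker, but capture the full child-side
    # traceback, which Python 2.6's multiprocessing otherwise discards.
    try:
        return do_calculations(args)
    except Exception:
        tb = traceback.format_exc()
        sys.stderr.write(tb)   # log it from the child process
        raise Exception(tb)    # re-raise with the traceback attached

Then map the wrapper instead of the bare function:

pool_outputs = pool.map(do_calculations_wrapped, zip(list_instances, list_IDs))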
