I am using multiprocessing to run a batch of serial tasks. Each task is the same operation applied to files located in different folders, and each one consists of calls to several other modules and C++ programs. A high-level wrapper class manages the calls to those modules/functions. At the beginning of the multiprocessing code, a list of (ID, instance) pairs of this high-level class is created. Then a pool of processes executes the tasks.
It runs fine until, at some point, this obscure exception is raised:
    Traceback (most recent call last):
      File "test_parallel.py", line 197, in <module>
        pool_outputs = pool.map(do_calculations, zip(list_instances, list_IDs), )
      File "/usr/lib64/python2.6/multiprocessing/pool.py", line 148, in map
        return self.map_async(func, iterable, chunksize).get()
      File "/usr/lib64/python2.6/multiprocessing/pool.py", line 422, in get
        raise self._value
    IndexError: tuple index out of range
It is usually raised after many tasks have completed (around the 100th task of the 200 planned).
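Note that pool.map collects a worker-side exception and its get() re-raises only the exception value (the raise self._value line above), so the original worker traceback is lost. To see where a task actually fails, one option is to wrap the task function so it re-raises with the formatted worker traceback as the message; a minimal sketch (do_calculations_logged is just a name I picked for illustration):

    import traceback

    def do_calculations_logged(args):
        try:
            return do_calculations(args)
        except Exception:
            # Re-raise with the worker-side traceback embedded in the
            # message, since pool.map only propagates the exception value.
            raise Exception(traceback.format_exc())

and then pass do_calculations_logged to pool.map instead.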
A shortened version of the code is:
    import multiprocessing
    import copy
    # WrapperAroundModule and do_calculations come from my real code

    if __name__ == "__main__":
        which_subfields = range(200)
        pool_size = int(multiprocessing.cpu_count())
        if pool_size == 0:
            pool_size = 1
        run = WrapperAroundModule.run(version="parallel")
        list_IDs = list(which_subfields)
        lock = multiprocessing.Lock()
        # one independent copy of the wrapper instance per task
        list_instances = []
        for _ in which_subfields:
            list_instances.append(copy.deepcopy(run))
        pool = multiprocessing.Pool(processes=pool_size)
        pool_outputs = pool.map(do_calculations, zip(list_instances, list_IDs))
        pool.close()
        pool.join()
with the signature of the do_calculations function being do_calculations((instance, id)).
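For reference, that signature uses Python 2's tuple-parameter unpacking; a minimal stub would look like this (the body is a hypothetical placeholder, not my real task):

    def do_calculations((instance, subfield_id)):
        # hypothetical body: run the wrapped serial task for one subfield
        return instance.process(subfield_id)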
I made sure that the function do_calculations is thread-safe, but that didn't change anything. I then wanted to use maxtasksperchild, but unfortunately I am stuck with Python 2.6 (where multiprocessing.Pool does not accept that argument) and the billiard module can't be installed on the server I'm using (it runs Scientific Linux). I therefore wrote a workaround: the tasks to be performed are divided into chunks of length pool_size*maxtasksperchild, and the script executes each chunk on a pool using similar code (see the sketch after this paragraph). Once a chunk is finished, the pool and all the variables around it are deleted, and a new pool is created for the next chunk. Sadly, the error is still raised at some point. Moreover, I made sure that the two lists passed as arguments are long enough. The function do_calculations runs smoothly on its own on the exact tasks that fail in the multiprocessing version.
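The chunking workaround looks roughly like this (a sketch: MAX_TASKS_PER_CHILD is a limit I pick by hand, and the fresh pool per chunk stands in for the maxtasksperchild argument that only exists from Python 2.7 on):

    MAX_TASKS_PER_CHILD = 10  # hand-picked limit per worker process
    chunk = pool_size * MAX_TASKS_PER_CHILD
    tasks = zip(list_instances, list_IDs)  # a plain list in Python 2
    pool_outputs = []
    for start in range(0, len(tasks), chunk):
        # A fresh pool per chunk emulates maxtasksperchild: every worker
        # process dies after roughly MAX_TASKS_PER_CHILD tasks.
        pool = multiprocessing.Pool(processes=pool_size)
        pool_outputs.extend(pool.map(do_calculations, tasks[start:start + chunk]))
        pool.close()
        pool.join()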
Any idea on the source of this error and a possible fix?