Question

I have an embarrassingly parallelizable problem consisting of a bunch of tasks that are solved independently of each other. Solving each task is quite lengthy, so this is a prime candidate for multi-processing.

The problem is that solving my tasks requires creating a specific object that is very time consuming on its own but can be reused for all the tasks (think of an external binary program that needs to be launched), so in the serial version I do something like this:

def costly_function(task, my_object):
    # solve_task_using_my_object stands in for the actual lengthy work
    solution = solve_task_using_my_object(task, my_object)
    return solution

def solve_problem():
    my_object = create_costly_object()
    tasks = get_list_of_tasks()
    all_solutions = [costly_function(task, my_object) for task in tasks]
    return all_solutions

When I try to parallelize this program using multiprocessing, my_object cannot be passed as a parameter for a number of reasons (it cannot be pickled, and it should not run more than one task at the same time), so I have to resort to creating a separate instance of the object for each task:

def costly_function(task):
    my_object = create_costly_object()  # re-created for every single task
    solution = solve_task_using_my_object(task, my_object)
    return solution

import multiprocessing

def psolve_problem():
    pool = multiprocessing.Pool()
    tasks = get_list_of_tasks()
    all_solutions = pool.map_async(costly_function, tasks)
    return all_solutions.get()

but the added cost of creating multiple instances of my_object makes this code only marginally faster than the serial one.

If I could create a separate instance of my_object in each process and then reuse it for all the tasks that run in that process, my timings would improve significantly. Any pointers on how to do that?


Solution

I found a simple way of solving my own problem without bringing in any tools besides the standard library, so I thought I'd write it down here in case somebody else has a similar problem.

multiprocessing.Pool accepts an initializer function (together with its arguments, via initargs) that gets run when each worker process is launched. The return value of this function is not stored anywhere, but one can take advantage of it to set up a global variable:

import multiprocessing

def init_process():
    global my_object
    my_object = create_costly_object()  # runs once per worker process

def costly_function(task):
    global my_object
    solution = solve_task_using_my_object(task, my_object)
    return solution

def psolve_problem():
    pool = multiprocessing.Pool(initializer=init_process)
    tasks = get_list_of_tasks()
    all_solutions = pool.map_async(costly_function, tasks)
    return all_solutions.get()

Since each process has a separate global namespace, the instantiated objects do not clash, and they are created only once per process.
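To convince yourself that the object really is built once per worker and not once per task, here is a self-contained sketch (not from the original post; CostlyObject and the two-second sleep merely stand in for the real expensive setup) that logs which process creates what:

import multiprocessing
import os
import time

class CostlyObject:
    def __init__(self):
        time.sleep(2)           # simulate the expensive construction
        self.pid = os.getpid()  # remember which worker built this instance

def init_process():
    global my_object
    my_object = CostlyObject()
    print(f"worker {os.getpid()} created its object")

def costly_function(task):
    # my_object was placed in this worker's global namespace by init_process
    return task, my_object.pid

if __name__ == "__main__":
    with multiprocessing.Pool(processes=2, initializer=init_process) as pool:
        results = pool.map(costly_function, range(10))
    print(results)

With two workers and ten tasks, only two "created" lines are printed and only two distinct PIDs show up in the results.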

Probably not the most elegant solution, but it's simple enough and gives me a near-linear speedup.

Other tips

You can have the Celery project handle all of this for you; among many other features, it also has a way to run some task initialization that can later be used by all tasks.
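If you do go that route, the usual hook for this kind of per-worker setup is Celery's worker_process_init signal. A minimal sketch, assuming a placeholder Redis broker URL and reusing the hypothetical create_costly_object / solve_task_using_my_object helpers from the question:

from celery import Celery
from celery.signals import worker_process_init

app = Celery("tasks", broker="redis://localhost:6379/0")  # broker URL is a placeholder

my_object = None

@worker_process_init.connect
def init_worker(**kwargs):
    # runs once in every worker process when it starts
    global my_object
    my_object = create_costly_object()

@app.task
def costly_function(task):
    return solve_task_using_my_object(task, my_object)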

You are right that you are constrained to picklable objects when using multiprocessing. Are you absolutely sure that your object is unpicklable?
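One quick way to find out is to simply try a pickle round trip; this small helper (hypothetical, not part of the original answer) reports why pickling fails, if it does:

import pickle

def is_picklable(obj):
    # try a full round trip, since some objects serialize but fail to restore
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception as exc:
        print(f"not picklable: {exc!r}")
        return False

Calling is_picklable(create_costly_object()) would tell you whether the multiprocessing restriction actually applies here.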

Have you tried dill? If you import it, then any time pickle is called it will use the dill bindings. It worked for me when I was trying to use multiprocessing on sympy equations.
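For reference (not from the original answer), a tiny illustration of the kind of object where the two differ: the standard pickle refuses a module-level lambda, while dill serializes and restores it.

import pickle
import dill

f = lambda x: x * 2

try:
    pickle.dumps(f)
except Exception as exc:
    print("pickle failed:", exc)

restored = dill.loads(dill.dumps(f))
print(restored(21))  # 42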

Licensed under: CC-BY-SA with attribution