Problem

In Parallel Python, why is it necessary to pass the modules, variables, and namespaces that the submitted function needs as part of the job submission call? How necessary is it to preserve module-level "global" variables (if that is all that is going on)?

submit function:

submit(self, func, args=(), depfuncs=(), modules=(), callback=None, callbackargs=(),group='default', globals=None)
    Submits function to the execution queue

    func - function to be executed
    args - tuple with arguments of the 'func'
    depfuncs - tuple with functions which might be called from 'func'
    modules - tuple with module names to import
    callback - callback function which will be called with argument 
        list equal to callbackargs+(result,) 
        as soon as calculation is done
    callbackargs - additional arguments for callback function
    group - job group, is used when wait(group) is called to wait for
        jobs in a given group to finish
    globals - dictionary from which all modules, functions and classes
        will be imported, for instance: globals=globals()
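
For context, a typical call looks something like the sketch below. This is a minimal, hypothetical example assuming the usual pp pattern in which Server.submit returns a job object that is called to retrieve the result; partial_sum is a made-up function name.

    import math
    import pp

    def partial_sum(start, end):
        # Runs in a separate worker interpreter, so 'math' has to be named
        # in 'modules' -- the worker does not inherit this script's imports.
        return sum(math.sqrt(i) for i in range(start, end))

    job_server = pp.Server()

    # Everything the function needs is declared explicitly in the call.
    job = job_server.submit(partial_sum, (1, 1000), modules=("math",))

    print(job())  # calling the job object blocks until the result is ready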

Solution

The reason that pp works the way it does is that it spawns a fresh instance of the Python interpreter for every worker, completely independent of anything else that has run or will run. This ensures there are no unintended side effects, such as __future__ imports being active in the worker process. The downside is that it makes things considerably more complicated to get right and, in my experience with pp, not particularly robust. pp does try to make things a bit easier for the user, but in doing so it seems to introduce more problems than it solves.
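
As a concrete illustration of that isolation (a hypothetical sketch; double and scaled are made-up names): a function that relies on something defined at module level will fail in the worker with a NameError unless that dependency is shipped across explicitly, for example via depfuncs:

    import pp

    def double(x):
        return 2 * x

    def scaled(x):
        # 'double' exists only in the parent's module namespace; the worker's
        # fresh interpreter knows nothing about it unless it is shipped over.
        return double(x) + 1

    job_server = pp.Server()

    # Without the dependency declared, the worker raises NameError for 'double':
    # job_server.submit(scaled, (10,))()

    # Declaring it explicitly makes it available in the worker:
    job = job_server.submit(scaled, (10,), depfuncs=(double,))
    print(job())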

If I were to write code that was designed for use on a cluster from the start, I would probably end up using pp, but I've found that adapting existing code to work with pp is a nightmare.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow