Question

I have a sketch of my code as follows:

def func1(c):
    return a,b

def func2(c,x):
    if condition:
        a,b = func1(c)
        x.append((a, b))
        func2(a,x)
        func2(b,x)
    return x

x = []
y = func2(c, x)

The problem, as you might have figured out from the code, is that I would like func2(b) to be computed in parallel with func2(a) whenever condition is true, i.e. before b is replaced by a new b coming out of func2(a). But as my algorithm stands, this clearly cannot happen because of the new b's.

I do think such a problem might be a perfect fit for a parallel computing approach, but I have not used one before and my knowledge of the subject is quite limited. I did try the suggestion from How to do parallel programming in Python, though, and got the same result as with the sketch above.


Solution

Caveat: threading might not be parallel enough for you (see the note on the Global Interpreter Lock in https://docs.python.org/2/library/threading.html), so you might have to use the multiprocessing library instead (https://docs.python.org/2/library/multiprocessing.html).

...So I've cheated/been lazy and used the thread/process-neutral term "job". You'll need to pick either threading or multiprocessing for everywhere that I use "job".

def func1(c):
    return a,b

def func2(c,x):
    if condition:
        a,b = func1(c)
        x.append((a, b))
        a_job = None
        if (number_active_jobs() >= NUM_CPUS):
            # do a and b sequentially
            func2(a, x)
        else:
            a_job = fork_job(func2, a, x)
        func2(b,x)
        if a_job is not None:
            join(a_job)

x = []
func2(c, x)
# all results are now in x (don't need y)
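
For concreteness, here is one way the "job" primitives could be realized with threading. Note that number_active_jobs, fork_job, and join are the placeholder names from the sketch above, not standard-library functions, and func1/condition are toy stand-ins purely so the example runs:

import threading

NUM_CPUS = 4  # assumption: set this to your actual core count

def number_active_jobs():
    # every thread counts toward active_count(); subtract 1 for the main thread
    return threading.active_count() - 1

def fork_job(fn, *args):
    # start fn(*args) in a background thread and hand back a joinable handle
    job = threading.Thread(target=fn, args=args)
    job.start()
    return job

def join(job):
    job.join()

def func1(c):
    return c - 1, c - 2  # toy stand-in

def condition(c):
    return c > 1         # toy stand-in

def func2(c, x):
    if condition(c):
        a, b = func1(c)
        x.append((a, b))  # list.append is atomic in CPython, so no lock needed
        a_job = None
        if number_active_jobs() >= NUM_CPUS:
            func2(a, x)   # enough jobs running already: do a sequentially
        else:
            a_job = fork_job(func2, a, x)
        func2(b, x)
        if a_job is not None:
            join(a_job)

x = []
func2(10, x)
print(len(x), "pairs collected")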

...that will be best if you need a,b pairs to finish together for some reason. If you're willing to let the scheduler go nuts, you could "job" them all & then join at the end:

def func1(c):
    return a,b

def func2(c,x):
    if condition:
        a,b = func1(c)
        x.append((a, b))
        if (number_active_jobs() >= NUM_CPUS):
            # do a and b sequentially
            func2(a, x)
        else:
            all_jobs.append(fork_job(func2, a, x))
        # TODO: the same job-or-sequential for func2(b,x)

all_jobs = []
x = []
func2(c, x)
for j in all_jobs:
    join(j)
# all results are now in x (don't need y)
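
With the same placeholder helpers sketched above (fork_job, join, and number_active_jobs are still assumed names, not stdlib calls), the join-at-the-end variant only changes the recursive body. One nicety: a plain for loop over all_jobs also picks up jobs appended while you are joining, so late-spawned jobs are not missed:

all_jobs = []

def func2(c, x):
    if condition(c):
        a, b = func1(c)
        x.append((a, b))
        for child in (a, b):  # the same job-or-sequential choice for both children
            if number_active_jobs() >= NUM_CPUS:
                func2(child, x)
            else:
                all_jobs.append(fork_job(func2, child, x))

x = []
func2(10, x)
for j in all_jobs:  # the list may grow during iteration; that's fine in Python
    join(j)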

The NUM_CPUS check could be done with threading.activeCount() instead of a full-blown worker thread pool (see: python - how to get the number of active threads started by specific class?).

But with multiprocessing you'd have more work to do, using a JoinableQueue and a fixed-size Pool of workers.
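
Recursing from inside pool workers is awkward (workers can't easily hand new tasks back to the pool), so a multiprocessing version is usually restructured to drive the recursion from the main process, expanding one level of the tree per pool.map call. A minimal sketch, again with toy stand-ins for func1 and condition:

from multiprocessing import Pool

NUM_CPUS = 4  # assumption: set this to your actual core count

def func1(c):
    return c - 1, c - 2  # toy stand-in

def expand(c):
    # one unit of work: the (a, b) pair for c, or None if condition is false
    if c > 1:            # toy stand-in for condition
        return func1(c)
    return None

if __name__ == '__main__':
    x = []
    frontier = [10]      # toy initial input standing in for the question's c
    pool = Pool(NUM_CPUS)
    while frontier:
        # expand the whole current level of the recursion tree in parallel
        results = pool.map(expand, frontier)
        frontier = []
        for pair in results:
            if pair is not None:
                x.append(pair)
                frontier.extend(pair)
    pool.close()
    pool.join()
    print(len(x), "pairs collected")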

OTHER TIPS

From your explanation I have a feeling that it is not b that gets updated (and it is not, as DouglasDD explained), but x. To let both recursive calls work on the same x, you need to take some sort of snapshot of x. The simplest way is to pass the index of the newly appended tuple, along the lines of:

def func2(c, x, length):
    ...
    x.append((a, b))
    func2(a, x, length + 1)
    func2(b, x, length + 1)
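
For completeness, a minimal sketch of how the length parameter could give each call a consistent view of x (condition and func1 are stand-ins as before, and the slice is an assumption about how the snapshot would be consumed):

def func2(c, x, length):
    if condition(c):
        snapshot = x[:length]  # the part of x that existed when this call was made
        a, b = func1(c)
        x.append((a, b))
        func2(a, x, length + 1)
        func2(b, x, length + 1)

x = []
func2(10, x, 0)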
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow