Python multicore without communication

https://stackoverflow.com/questions/18486480

26-06-2022

Question

I want to call a method, getRecommendations, which simply pickles the recommendations for a specific user to a file. I used code from a book, and it works. However, I noticed that only one core is busy, and I want all my cores to do the work because that would be much faster.

Here is the method.

def getRecommendations(prefs,person,similarity=sim_pearson):
    print "working on recommendation"
    totals={}
    simSums={}
    for other in prefs:
        # don't compare me to myself
        if other==person: continue
        sim=similarity(prefs,person,other)
        # ignore scores of zero or lower
        if sim<=0: continue
        for item in prefs[other]:
            # only score movies I haven't seen yet
            if item not in prefs[person] or prefs[person][item]==0:
                # Similarity * Score
                totals.setdefault(item,0)
                totals[item]+=prefs[other][item]*sim
                # Sum of similarities
                simSums.setdefault(item,0)
                simSums[item]+=sim
    # Create the normalized list
    rankings=[(total/simSums[item],item) for item,total in totals.items( )]
    # Return the sorted list
    rankings.sort( )
    rankings.reverse( )
    # Write the rankings and close the file when done
    with open("data/rankings/"+str(int(person))+".ranking.recommendations","wb") as ranking_output:
        pickle.dump(rankings,ranking_output)
    return rankings

It is called via

for i in customerID:
    print "working on ", int(i)
    # Make this work with multiple CPUs
    getRecommendations(pickle.load(open("data/critics.recommendations", "rb")), int(i))

As you can see, I am trying to compute a recommendation for every customer, to be used later.

So how can I multiprocess this method? I could not work it out from reading a few examples or even the documentation.

Solution

You want something (roughly, untested) like:

from multiprocessing import Pool
NUMBER_OF_PROCS = 5 # some number... not necessarily the number of cores due to I/O

pool = Pool(NUMBER_OF_PROCS)

for i in customerID:
    pool.apply_async(getRecommendations, [i])

pool.close()
pool.join()

(This assumes getRecommendations takes only 'i' as its argument, since the pickle.load should be done once rather than on every call.)
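To make the "load once" part concrete, here is a rough sketch in Python 3 syntax (also untested against your data). It loads the pickled preferences once per worker process through the Pool initializer and then maps over the customer IDs. The names PREFS, init_worker, recommend_for and run_all are hypothetical, and the body of recommend_for is only a stand-in; you would swap in a call to your own getRecommendations(PREFS, person).

```python
import pickle
from multiprocessing import Pool

# Hypothetical module-level slot; each worker process fills in its own copy.
PREFS = None


def init_worker(prefs_path):
    """Runs once in every worker: load the pickled preferences a single time."""
    global PREFS
    with open(prefs_path, "rb") as fh:
        PREFS = pickle.load(fh)


def recommend_for(person):
    """Hypothetical wrapper around the real work.

    Stand-in body so the sketch runs on its own: it only counts the items
    rated by `person`. Replace it with getRecommendations(PREFS, person).
    """
    return person, len(PREFS.get(person, {}))


def run_all(customer_ids, prefs_path, procs=4):
    """Spread the customer IDs over `procs` worker processes."""
    with Pool(procs, initializer=init_worker, initargs=(prefs_path,)) as pool:
        # map() blocks until every customer has been processed
        return pool.map(recommend_for, customer_ids)
```

You would call something like run_all(customerID, "data/critics.recommendations") from under an `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that start workers by spawning a fresh interpreter. Because each worker writes to its own per-customer file, no communication between the processes is required.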

Other tips

The answer James gave is the correct one. I'll just add that you need to import Pool from the multiprocessing module:

from multiprocessing import Pool

Also, Pool(4) means that you want to create 4 "worker" processes that run in parallel to perform your task.
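As a small illustration of choosing the pool size, the sketch below (Python 3 syntax) uses an explicit worker count when one is given and otherwise falls back to the CPU count reported by the system. The helper names pick_worker_count, square and parallel_squares are made up for this example, and square is only a placeholder task.

```python
import os
from multiprocessing import Pool


def square(n):
    """Placeholder task; stands in for the real per-customer work."""
    return n * n


def pick_worker_count(requested=None):
    """Return the requested count if given, else the reported CPU count (at least 1)."""
    return requested or os.cpu_count() or 1


def parallel_squares(numbers, procs=None):
    """Run the placeholder task over `numbers` with the chosen number of workers."""
    with Pool(pick_worker_count(procs)) as pool:
        return pool.map(square, numbers)
```

Calling parallel_squares(range(10), procs=4) from under an `if __name__ == "__main__":` guard would use 4 workers. For a job that spends much of its time on file I/O, as in the question, it can be worth trying a count somewhat different from the number of cores and measuring.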

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow