Question

I have a list of numbers. I want to perform some time-consuming operation on each number in the list and make a new list with all the results. Here's a simplified version of what I have:

def calcNum(n):  # some arbitrary, time-consuming calculation on a number
  m = n
  for i in range(5000000):
    m += i % 25
    if m > n * n:
      m //= 2  # floor division keeps m an integer
  return m

nums = [12, 25, 76, 38, 8, 2, 5]
finList = []

for i in nums:
  return_val = calcNum(i)
  finList.append(return_val)

print(finList)

Now, I wanted to take advantage of the multiple cores in my CPU and give each of them one of the numbers to process. Since the "number calculation" function is self-contained from start to finish, I figured this would be fairly simple to do and a perfect situation for multiprocessing/threading.

My question is, which one should I use (multiprocessing or threading?), and what is the simplest way to do this?

I did a test with various code I found in other questions, and while it runs fine, it doesn't seem to do any actual multithreading/multiprocessing: it takes just as long as my first version:

from multiprocessing.pool import ThreadPool

def calcNum(n):  # some arbitrary, time-consuming calculation on a number
  m = n
  for i in range(5000000):
    m += i % 25
    if m > n * n:
      m //= 2  # floor division keeps m an integer
  return m

pool = ThreadPool(processes=3)

nums = [12, 25, 76, 38, 8, 2, 5]
finList = []

for i in nums:
  async_result = pool.apply_async(calcNum, (i,))
  return_val = async_result.get()
  finList.append(return_val)

print(finList)

Solution

multiprocessing.Pool and Pool.map are your best friends here. They save a lot of headache by hiding the queues and synchronization you would otherwise have to wire up yourself: all you need to do is create the pool with a maximum number of processes, then point map at your function and an iterable. See the working code below. (Incidentally, your ThreadPool attempt above also runs serially for a second reason: calling async_result.get() immediately after apply_async blocks until that task finishes, so only one task is ever in flight at a time.)

Because of the join and the way pool.map is designed to be used, the program will wait until ALL processes have returned something before giving you the result.

from multiprocessing import Pool

def calcNum(n):  # some arbitrary, time-consuming calculation on a number
  print("Calcs Started on", n)
  m = n
  for i in range(5000000):
    m += i % 25
    if m > n * n:
      m //= 2  # floor division, so the results stay integers
  return m

if __name__ == "__main__":
  p = Pool(processes=3)

  nums = [12, 25, 76, 38, 8, 2, 5]

  result = p.map(calcNum, nums)
  p.close()
  p.join()

  print(result)

That will get you something like this:

Calcs Started on 12
Calcs Started on 25
Calcs Started on 76
Calcs Started on 38
Calcs Started on 8
Calcs Started on 2
Calcs Started on 5
[72, 562, 5123, 1270, 43, 23, 23]

Regardless of when each process starts or finishes, map waits for every task to complete and then returns all the results in the correct order, corresponding to the order of the input iterable.
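If you would rather consume each result as soon as its worker finishes, instead of waiting for the whole batch, Pool.imap_unordered yields results in completion order. A minimal sketch, assuming the same calcNum definition as in the code above:

from multiprocessing import Pool

if __name__ == "__main__":
  with Pool(processes=3) as p:
    # Results arrive in whatever order the workers finish,
    # not in the order of the input list.
    for result in p.imap_unordered(calcNum, [12, 25, 76, 38, 8, 2, 5]):
      print("Finished one:", result)

Note that you lose the input-to-output correspondence this way; if you need to know which input produced which value, have calcNum return a (n, result) tuple.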

As @Guy mentioned, the GIL hurts us here. You can change Pool to ThreadPool in the code above and see how it affects the timing of the calculations. Because this workload is pure Python bytecode, the GIL only lets one thread execute at a time, so the ThreadPool version still runs very nearly serially. Multiprocessing with a Process or Pool instead launches separate instances of the Python interpreter, which gets around the GIL. If you watch your running processes while the code above runs, you'll see extra instances of python.exe appear while the pool is active; in this case, you'll see a total of 4 (three workers plus the parent).
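A quick way to see this for yourself is to time both pool types on the same workload. A rough sketch, again assuming the calcNum above (time_pool is just a throwaway helper for this comparison):

import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def time_pool(pool_cls, label):
  # Run the same workload through whichever pool class we're handed.
  start = time.time()
  with pool_cls(processes=3) as p:
    p.map(calcNum, [12, 25, 76, 38, 8, 2, 5])
  print(label, "took", round(time.time() - start, 2), "seconds")

if __name__ == "__main__":
  time_pool(ThreadPool, "ThreadPool (threads, GIL-bound)")
  time_pool(Pool, "Pool (separate processes)")

On a multi-core machine the Pool run should finish several times faster, while the ThreadPool run should stay close to the serial timing.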

OTHER TIPS

I guess you are affected by Python's Global Interpreter Lock (GIL):

The GIL is controversial because it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in certain situations.

Try using multiprocessing instead:

from multiprocessing import Pool
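
For completeness, here is a minimal runnable version of that suggestion, using the same calcNum from the question and the pool as a context manager so it is cleaned up automatically:

from multiprocessing import Pool

def calcNum(n):  # same arbitrary, time-consuming calculation
  m = n
  for i in range(5000000):
    m += i % 25
    if m > n * n:
      m //= 2
  return m

if __name__ == "__main__":
  with Pool(processes=3) as p:
    print(p.map(calcNum, [12, 25, 76, 38, 8, 2, 5]))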