Question

# data is a list  

import threading

Threading_list = []

class myfunction(threading.Thread):

    def __init__(self,val):
        .......
    .......

    def run(self):
        .......
        ....... 

for i in range(100000):

    t=myfunction(data[i]) # need to execute this function on every datapoint 
    t.start()
    Threading_list.append(t)

for t in Threading_list:
    t.join()

This will create around 100,000 threads, but I am allowed to create a maximum of 32 threads. What modifications can be made to this code?

Solution

Creating so many Python threads is rarely necessary; in fact, I can hardly imagine a reason for it. There are suitable architectural patterns for running code in parallel while limiting the number of threads. One of them is the reactor pattern.
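For illustration only, here is a minimal sketch of one such approach: not the reactor itself, but a fixed-size worker pool built on the standard library's threading and queue modules. The process_item function, MAX_WORKERS value, and sample data are hypothetical stand-ins for whatever each datapoint actually needs:

import queue
import threading

MAX_WORKERS = 32  # the cap from the question

def process_item(item):
    # hypothetical per-datapoint work; replace with the real computation
    return item * 2

def worker(tasks, results):
    while True:
        item = tasks.get()
        if item is None:  # sentinel value: no more work
            tasks.task_done()
            break
        results.append(process_item(item))  # list.append is thread-safe in CPython
        tasks.task_done()

data = range(100000)  # stands in for the question's list

tasks = queue.Queue()
results = []
threads = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(MAX_WORKERS)]
for t in threads:
    t.start()

for item in data:
    tasks.put(item)
for _ in threads:
    tasks.put(None)  # one sentinel per worker

tasks.join()
for t in threads:
    t.join()
# note: results arrive in completion order, not input order

Only 32 threads ever exist; the queue feeds them all 100,000 work items.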

What are you trying to do?

And remember that, due to the GIL, Python threads do not give any performance boost for computational tasks, even on multiprocessor and multi-core systems (by the way, can there be a 100,000-core system? I doubt it. :)). The only chance of a boost is if the computational part is performed inside modules written in C/C++ that do their work without holding the GIL. Usually, Python threads are used to parallelize code that contains blocking I/O operations.

UPD: I noticed the stackless-python tag. AFAIK, it supports microthreads. However, it is still unclear what you are trying to do.

And if you are just trying to process 100,000 values (applying a formula to each of them?), it is better to write something like:

def myfunction(val):
    ....
    return something_calculated_from_val

results = [myfunction(d) for d in data]  # or: results = list(map(myfunction, data))

It should be much better, unless myfunction() performs blocking I/O. If it does, a ThreadPoolExecutor may really help (see the example under OTHER TIPS below).

OTHER TIPS

Here is an example that computes the square of every element of a list of any length, using 32 threads through a ThreadPoolExecutor. As Ellioh said, you may not want to use threads in some cases, so you can easily switch to a ProcessPoolExecutor.

import concurrent.futures

def my_function(x):
    return x ** 2  # square each value, as described above

data = [1, 6, 9, 3, 8, 4, 213, 534]

with concurrent.futures.ThreadPoolExecutor(max_workers=32) as executor:
    result = list(executor.map(my_function, data))

print(result)
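
As a sketch of the switch mentioned above (useful when the per-item work is CPU-bound and the GIL would otherwise serialize it), only the executor class changes; the __main__ guard is added because worker processes may re-import the module on some platforms:

import concurrent.futures

def my_function(x):
    return x ** 2  # same per-item work as above

data = [1, 6, 9, 3, 8, 4, 213, 534]

if __name__ == "__main__":
    # Separate processes sidestep the GIL, so CPU-bound work runs in parallel.
    with concurrent.futures.ProcessPoolExecutor(max_workers=32) as executor:
        result = list(executor.map(my_function, data))
    print(result)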
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow