Question

I want to use multithreading in Python, where the threaded function does some work and adds a URL to a list of URLs (links), while a listener in the calling script watches the links list for new elements to iterate over. Confused? Me too. I'm not even sure how to explain this, so let me try to demonstrate with pseudo-code:

from multiprocessing import Pool

def worker(links):
    #do lots of things with urllib2 including finding elements with BeautifulSoup
    #extracting text from those elements and using it to compile the unique URL

    #finally, append a url that was gathered in the `lots of things` section to a list
    links.append('http://myUniqueURL.com') #this will be unique for each time `worker` is called

links = []
for i in MyBigListOfJunk:
    Pool().apply(worker, (links,))

for link in links:
    #do a bunch of stuff with this link including using it to retrieve the html source with urllib2
    pass

Now, rather than waiting for all the worker threads to finish and then iterating over links all at once, is there a way for me to iterate over the URLs as they are appended to the links list? Basically, the worker iteration that generates the links list HAS to be separate from the iteration over links itself; however, rather than running them sequentially, I was hoping I could run them somewhat concurrently and save some time. Currently I must call worker upwards of 30-40 times within a loop, and the entire script takes roughly 20 minutes to finish executing.

Any thoughts would be very welcome, thank you.


Solution

You should use the Queue class for this. It is a thread-safe FIFO container. Its 'get' method removes an item from the queue and, importantly, blocks while the queue is empty, waiting until another thread or process adds one. If you use multiprocessing, then you should use the Queue from that module, not the one from the Queue module. Next time you ask a question about processes, state the exact Python version you are targeting. This answer is for 2.6.
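As a rough illustration of the idea, here is a minimal sketch of the producer/consumer pattern with a blocking 'get'. It uses threads and the standard queue module rather than multiprocessing, so that it stays short and self-contained; with processes, the Queue from multiprocessing would take the place of queue.Queue. The names produce, consume, run and DONE, the example.com URLs, and the extra item argument to worker are all assumptions made for the example, not part of the original question.

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2, the version the answer targets

DONE = None  # hypothetical sentinel: tells the consumer no more links are coming


def worker(item, links):
    # Placeholder for the real scraping work; builds one URL per item.
    links.put("http://example.com/%s" % item)


def produce(items, links):
    # Start one worker thread per item, then signal the end once all finish.
    threads = [threading.Thread(target=worker, args=(item, links))
               for item in items]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    links.put(DONE)


def consume(links):
    processed = []
    while True:
        link = links.get()  # blocks until a worker has put something
        if link is DONE:
            break
        processed.append(link)  # placeholder for fetching and parsing the link
    return processed


def run(items):
    links = queue.Queue()
    producer = threading.Thread(target=produce, args=(items, links))
    producer.start()
    processed = consume(links)  # consumes while workers are still producing
    producer.join()
    return processed
```

Calling run(MyBigListOfJunk) would then handle each link as soon as its worker has produced it, instead of waiting for every worker to finish. The order in which links arrive depends on thread scheduling, so the consumer should not rely on it.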

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow