Question

I'm trying to write a multiprocess program in Python. I have imported the multiprocessing module and I try to start a process like so:

    p = Process(target=self.Parse)
    p.start()
    p.join()

In the class I have an internal thread counter that I increment every time a process is spawned. But when I print the thread count, it never gets incremented. So I called multiprocessing.active_children(), but it returns an empty list. Does the program really not spawn the threads or processes, or does it just not report them? The code is as follows:

def run(self):
    if self.cont:
        while self.nxtLink or (self.thread>1):
            print(active_children())
            if self.thread<=self.count:
                p = Process(target=self.Parse)
                p.start()
                p.join()
            else:
                self.crawl(self.nxtLink.popleft())

The Parse function:

def Parse(self):
    self.thread+=1
    self.lock.acquire()
    next = self.nxtLink.popleft()
    self.lock.release()
    results = parser(next[0],next[1])
    #print("In Parse")
    self.broken[next[0]] = results.broken
    for i in results.foundLinks:
        if(self.thread<=self.count+5):
            p = Process(target = self.request, args = (i,next[0]))
            p.start()
            p.join()
        else:
            while (self.thread>self.count+5):
               pass   #Waits for the thread count to drop before spawning a new thread. 
            p = Process(target = self.request, args = (i,next[0]))
            p.start()
            p.join()
    self.lock.acquire()
    self.thread-=1
    self.lock.release()

Finally the request function:

def request(self, requestURL, requestingPageURL):
    # print(requestURL)
    self.lock.acquire()
    self.thread+=1
    self.lock.release()
    try:
        before = list(self.prev)
        self.lock.acquire()
        self.prev.append(requestURL)
        self.lock.release()
        if(requestURL in before):
            #print(before)
            return
        nextRequest = req.urlopen(requestURL)
        self.lock.acquire()
        self.nxtLink.append((requestURL,nextRequest))
        self.lock.release()
    except err.URLError:
        self.lock.acquire()
        try:
            self.broken[requestingPageURL].append(requestURL)
        except KeyError:
            self.broken[requestingPageURL] = [requestURL]
        self.lock.release()
    finally:
        self.lock.acquire()
        self.thread-=1
        self.lock.release()

I am really stuck on why it's not spawning processes. The program as a whole works fine, so I'm a little confused.

Solution

join() waits for the process to complete. When you have a sequence like:

p = Process(target=self.Parse)
p.start()
p.join()

The parent program waits for the child to complete, so there are no active children at the point where you make the check. As written, you'd be better off just calling the functions directly instead of spawning children, because you wait for each one to complete anyway. It's common for code like this to put the Process objects in a list, do other work, and come back and join them later when the work is done.
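
Here is a minimal sketch of that start-now, join-later pattern. The names work and spawn_then_join are made up for illustration, and work just sleeps as a stand-in for Parse or request. The sketch also uses multiprocessing.Value for the counter, which goes beyond the pattern above: each child works on its own copy of your object, so a plain self.thread += 1 done in a child never changes the parent's value, while a Value lives in shared memory.

```python
import multiprocessing as mp
import time


def work(seconds, started):
    """Hypothetical stand-in for Parse/request: count itself, then sleep."""
    with started.get_lock():        # a Value carries its own lock
        started.value += 1
    time.sleep(seconds)


def spawn_then_join(n=3, seconds=0.5):
    """Start n children, inspect them while they run, then join them all."""
    started = mp.Value("i", 0)      # shared integer visible to every process
    procs = [mp.Process(target=work, args=(seconds, started)) for _ in range(n)]
    for p in procs:
        p.start()                   # no join() here, so the children overlap
    alive = len(mp.active_children())   # sampled while the children still run
    for p in procs:
        p.join()                    # wait only after everything was started
    return alive, len(mp.active_children()), started.value


if __name__ == "__main__":
    # Tuple of (children alive mid-run, children alive after join, counter)
    print(spawn_then_join())
```

As long as the children sleep longer than it takes to start them, the first number should equal n and the second should be zero, which is the opposite of what you see when every start() is followed immediately by join().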

You can add some debug code that tracks what's been called to verify that your child code is running:

import time
with open('/tmp/trace.txt', 'a') as fp:
    fp.write(time.asctime() + '\n')

It's a good idea in general to add some logging to the processes you spawn so that you can track things like Python exceptions in your code.
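
One way this could look, as a rough sketch using the standard logging module. The names risky_task, logged_worker and run_logged are hypothetical, and writing one log file per child (named after its pid) is my own choice to sidestep several processes writing to the same file. The point is the try/except around the worker body: logging.exception records the full traceback, so a failure inside a child does not go unnoticed.

```python
import logging
import multiprocessing as mp
import os
import tempfile


def risky_task(value):
    """Hypothetical worker body; fails on purpose for negative input."""
    if value < 0:
        raise ValueError(f"negative input: {value}")
    return value * 2


def logged_worker(value, log_dir):
    """Run risky_task in a child and log the outcome to a per-process file."""
    log_path = os.path.join(log_dir, f"worker-{os.getpid()}.log")
    logging.basicConfig(
        filename=log_path,
        level=logging.INFO,
        force=True,   # replace any handlers inherited from the parent (3.8+)
        format="%(asctime)s %(processName)s[%(process)d] %(levelname)s %(message)s",
    )
    try:
        logging.info("result=%s", risky_task(value))
    except Exception:
        # logging.exception appends the traceback to the log record
        logging.exception("task failed for value=%s", value)


def run_logged(values, log_dir):
    """Start one child per value, join them all, and return the combined logs."""
    procs = [mp.Process(target=logged_worker, args=(v, log_dir)) for v in values]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    logs = []
    for name in sorted(os.listdir(log_dir)):
        with open(os.path.join(log_dir, name)) as fp:
            logs.append(fp.read())
    return "\n".join(logs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_dir:
        print(run_logged([1, -1], log_dir))
```

The process name and pid in each log line also confirm that the code really ran in a separate process rather than in the parent.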

Licensed under: CC-BY-SA with attribution