Question

I have a program that fetches info from other pages and parses them using BeautifulSoup and Twisted's getPage. Later on in the program I print info that the deferred process creates. Currently my program tries to print it before the differed returns the info. How can I make it wait?

def twisAmaz(contents): #This parses the page (amazon api xml file)
    stonesoup = BeautifulStoneSoup(contents)
    if stonesoup.find("mediumimage") == None:
       imageurl.append("/images/notfound.png")
    else:
      imageurl.append(stonesoup.find("mediumimage").url.contents[0])

    usedPdata = stonesoup.find("lowestusedprice")
    newPdata = stonesoup.find("lowestnewprice")
    titledata = stonesoup.find("title")
    reviewdata = stonesoup.find("editorialreview")

    if stonesoup.find("asin") != None:
        asin.append(stonesoup.find("asin").contents[0])
    else:
        asin.append("None")
    reactor.stop()


deferred = dict()
for tmpISBN in isbn:  #Go through ISBN numbers and get Amazon API information for each
    deferred[(tmpISBN)] = getPage(fetchInfo(tmpISBN))
    deferred[(tmpISBN)].addCallback(twisAmaz)
    reactor.run()

.....print info on each ISBN
Was it helpful?

Solution

What it seems like is you're trying to make/run multiple reactors. Everything gets attached to the same reactor. Here's how to use a DeferredList to wait for all of your callbacks to finish.

Also note that twisAmaz returns a value. That value is passed through the callbacks DeferredList and comes out as value. Since a DeferredList keeps the order of the things that are put into it, you can cross-reference the index of the results with the index of your ISBNs.

from twisted.internet import defer

def twisAmaz(contents):
    stonesoup = BeautifulStoneSoup(contents)
    ret = {}
    if stonesoup.find("mediumimage") is None:
        ret['imageurl'] = "/images/notfound.png"
    else:
        ret['imageurl'] = stonesoup.find("mediumimage").url.contents[0]
    ret['usedPdata'] = stonesoup.find("lowestusedprice")
    ret['newPdata'] = stonesoup.find("lowestnewprice")
    ret['titledata'] = stonesoup.find("title")
    ret['reviewdata'] = stonesoup.find("editorialreview")
    if stonesoup.find("asin") is not None:
        ret['asin'] = stonesoup.find("asin").contents[0]
    else:
        ret['asin'] = 'None'
    return ret

callbacks = []
for tmpISBN in isbn:  #Go through ISBN numbers and get Amazon API information for each
    callbacks.append(getPage(fetchInfo(tmpISBN)).addCallback(twisAmazon))

def printResult(result):
    for e, (success, value) in enumerate(result):
        print ('[%r]:' % isbn[e]),
        if success:
            print 'Success:', value
        else:
            print 'Failure:', value.getErrorMessage()

callbacks = defer.DeferredList(callbacks)
callbacks.addCallback(printResult)

reactor.run()

OTHER TIPS

Another cool way to do this is with @defer.inlineCallbacks. It lets you write asynchronous code like a regular sequential function: http://twistedmatrix.com/documents/8.1.0/api/twisted.internet.defer.html#inlineCallbacks

First, you shouldn't put a reactor.stop() in your deferred method, as it kills everything.

Now, in Twisted, "Waiting" is not allowed. To print results of you callback, just add another callback after the first one.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top