Question

I have a Google App Engine application (in Python) where I need to perform 4 to 5 url fetches and then combine the data before I print it out to the response.

I can do this without any problems using a synchronous workflow, but since the urls that I am fetching are not related or dependent on each other, performing them asynchronously would be ideal (and quickest).

I have read and re-read the documentation here, but I just can't figure out how to read the contents of each url. I've also searched the web for a small example (which is really what I need). I have seen this SO question, but again, it doesn't mention anything about reading the contents of the individual asynchronous url fetches.

Does anyone have any simple examples of how to perform 4 or 5 asynchronous url fetches with AppEngine? And then combine the results before I print it to the response?

Here is what I have so far:

rpcs = []
for album in result_object['data']:
  total_facebook_photo_count = total_facebook_photo_count + album['count']
  facebook_albumid_array.append(album['id'])

  #Get the photos in the photo album
  facebook_photos_url = 'https://graph.facebook.com/%s/photos?access_token=%s&limit=1000' % (album['id'], access_token)

  rpc = urlfetch.create_rpc()
  urlfetch.make_fetch_call(rpc, facebook_photos_url)
  rpcs.append(rpc)

for rpc in rpcs:
  result = rpc.get_result()
  self.response.out.write(result.content)

However, it still looks like the line result = rpc.get_result() forces it to wait for the first request to finish, then the second, then the third, and so forth. Is there a way to simply put the results into variables as they are received?

Thanks!


Solution

In the example, text = result.content is where you get the content (body).

To do url fetches in parallel, you can set them all up, add them to a list, and check the results afterwards. Expanding on the example already mentioned, it could look something like this:

from google.appengine.api import urlfetch

# Kick off all the fetches first; make_fetch_call returns immediately,
# so the requests run in parallel.
futures = []
for url in urls:
    rpc = urlfetch.create_rpc()
    urlfetch.make_fetch_call(rpc, url)
    futures.append(rpc)

# Now collect the results. get_result() blocks until that particular
# RPC has finished, but all the requests are already in flight.
contents = []
for rpc in futures:
    try:
        result = rpc.get_result()
        if result.status_code == 200:
            contents.append(result.content)
            # ...
    except urlfetch.DownloadError:
        # Request timed out or failed.
        # ...
        pass

concatenated_result = '\n'.join(contents)

In this example, we assemble the bodies of all the requests that returned status code 200 and concatenate them with line breaks in between.
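If you want to process each response as soon as it arrives rather than in the order you issued the requests, urlfetch also lets you attach a callback to each RPC, along the lines of the asynchronous-requests pattern in the urlfetch documentation. A rough sketch (handle_result and create_callback are just illustrative names):

from google.appengine.api import urlfetch

contents = []

def handle_result(rpc):
    # Runs as soon as this particular fetch has completed.
    result = rpc.get_result()
    if result.status_code == 200:
        contents.append(result.content)

def create_callback(rpc):
    # Bind the rpc to its callback so we know which fetch finished.
    return lambda: handle_result(rpc)

rpcs = []
for url in urls:
    rpc = urlfetch.create_rpc()
    rpc.callback = create_callback(rpc)
    urlfetch.make_fetch_call(rpc, url)
    rpcs.append(rpc)

# wait() lets the callbacks run as the individual fetches complete.
for rpc in rpcs:
    rpc.wait()

concatenated_result = '\n'.join(contents)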

Or with ndb, my personal preference for anything async on GAE, something like:

from google.appengine.ext import ndb

@ndb.tasklet
def get_urls(urls):
  ctx = ndb.get_context()
  # Issue all the fetches in parallel and wait for them all to finish.
  results = yield map(ctx.urlfetch, urls)
  contents = [r.content for r in results if r.status_code == 200]
  raise ndb.Return('\n'.join(contents))
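Calling the tasklet returns an ndb.Future, so from a request handler you can drive it synchronously; a minimal usage sketch, assuming urls is a list of URL strings:

contents = get_urls(urls).get_result()
self.response.out.write(contents)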

OTHER TIPS

I use this code (implemented before I learned about ndb tasklets):

# UserRPC.wait_any lives in apiproxy_stub_map in the App Engine SDK.
from google.appengine.api.apiproxy_stub_map import UserRPC

while rpcs:
  rpc = UserRPC.wait_any(rpcs)
  result = rpc.get_result()
  # process result here
  rpcs.remove(rpc)
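wait_any blocks until whichever of the outstanding RPCs completes first and returns it, so this loop handles results in completion order rather than submission order, which is the "as they are received" behaviour asked about above.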