Question

I am trying to fetch a URL with App Engine's urlfetch, and I get this error:

DownloadError: ApplicationError: 2 Too many repeated redirects

This is the code I am using:

  from google.appengine.api import urlfetch

  headers = { 'User-Agent' : 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; de-at) AppleWebKit/533.21.1 (KHTML, like Gecko) Version/5.0.5 Safari/533.21.1' }
  url = "http://www.cafebonappetit.com/menu/your-cafe/collins-cmc/cafes/details/50/collins-bistro"
  cmcHTM = urlfetch.fetch(url=url, headers=headers)
  cmcHTML = str(cmcHTM.content)

I checked the redirects for this URL at http://www.internetofficer.com/seo-tool/redirect-check/ and found that the page redirects to itself, so urlfetch seems to go in circles trying to load it. Meanwhile, the page loads just fine in my browser.
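What the redirect checker reports can be sketched as a small tracer that follows Location headers one hop at a time and stops as soon as a URL comes back around. This is plain Python 3, not App Engine; the fetch callable, trace_redirects, and the example.com URL are hypothetical stand-ins, and the fake site simply answers every request with a 302 back to the same URL, as described above.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(fetch, url, max_hops=10):
    """Follow Location headers hop by hop, like an online redirect checker.

    fetch(url) must return (status, headers) without following redirects.
    Returns (chain, looped): chain is a list of (url, status) pairs, and
    looped is True if a URL repeated or max_hops ran out.
    """
    chain = []
    for _ in range(max_hops):
        status, headers = fetch(url)
        chain.append((url, status))
        location = headers.get("Location")
        if status not in REDIRECT_CODES or location is None:
            return chain, False  # reached a non-redirect response
        next_url = urljoin(url, location)  # Location may be relative
        if any(next_url == seen_url for seen_url, _status in chain):
            return chain, True  # redirects back to a URL already visited
        url = next_url
    return chain, True  # too many hops; treat it as a loop


# Hypothetical site that answers every request with a 302 to the same URL.
def self_redirecting_fetch(url):
    return 302, {"Location": url}


chain, looped = trace_redirects(self_redirecting_fetch, "http://example.com/menu")
print(chain, looped)  # [('http://example.com/menu', 302)] True
```

A client that blindly follows such a chain never reaches a page, which is why urlfetch eventually gives up with "Too many repeated redirects".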

So I tried using this code:

  cmcHTM = urlfetch.fetch(url=url,
    follow_redirects=False,
    deadline=100
    )

This just returns an empty body, though. Is there any way to get this HTML?
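With follow_redirects=False, the fetch returns the redirect response itself rather than the page, and for this site that response has an empty body; the useful parts are the status code and the Location (and any Set-Cookie) headers. A minimal sketch using Python 3's standard library (http.client, which never follows redirects, in place of urlfetch), run against a hypothetical local server that mimics a page redirecting to itself:

```python
import http.client
import http.server
import threading


class SelfRedirectHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for the site: always 302s back to the same path."""

    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", self.path)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet


def fetch_without_following(host, port, path):
    """Send one GET and return (status, headers, body); redirects are not followed."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, dict(resp.getheaders()), resp.read()
    finally:
        conn.close()


server = http.server.HTTPServer(("127.0.0.1", 0), SelfRedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    status, headers, body = fetch_without_following("127.0.0.1", server.server_port, "/menu")
finally:
    server.shutdown()
    server.server_close()

print(status, headers.get("Location"), body)  # 302 /menu b''
```

With urlfetch, the corresponding fields on the returned object are response.status_code and response.headers.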


Solution

Sorry for the delayed response. I found this approach, which worked:

import Cookie
import urlparse

from google.appengine.api import urlfetch


class URLOpener(object):
    """Fetches a URL, following redirects by hand and resending cookies."""

    MAX_REDIRECTS = 10  # cap so a genuine redirect loop can't run forever

    def __init__(self):
        self.cookie = Cookie.SimpleCookie()

    def open(self, url, data=None):
        if data is None:
            method = urlfetch.GET
        else:
            method = urlfetch.POST

        for _ in range(self.MAX_REDIRECTS):
            response = urlfetch.fetch(url=url,
                                      payload=data,
                                      method=method,
                                      headers=self._getHeaders(self.cookie),
                                      allow_truncated=False,
                                      follow_redirects=False,
                                      deadline=10)
            # Any follow-up request is a GET, so don't send the data again.
            data = None
            method = urlfetch.GET
            # Store cookies from this response so the next request returns them.
            self.cookie.load(response.headers.get('set-cookie', ''))
            location = response.headers.get('location')
            if location is None:
                return response  # not a redirect: this is the page itself
            # Location may be relative, so resolve it against the current URL.
            url = urlparse.urljoin(url, location)

        raise RuntimeError('Too many redirects fetching %s' % url)

    def _getHeaders(self, cookie):
        # 'Host' is left out: urlfetch sets it from the URL being fetched.
        return {
            'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)',
            'Cookie': self._makeCookieHeader(cookie),
        }

    def _makeCookieHeader(self, cookie):
        # Serialize the stored cookies as "name1=value1; name2=value2".
        return '; '.join('%s=%s' % (m.key, m.value) for m in cookie.values())

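I guess the key is the loop: it follows each redirect manually via the Location response header and sends back the cookies the site sets along the way. Since the page redirects to its own URL, the returned cookie is the only thing that changes between requests, so that is most likely what the site checks for.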
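The same idea, following redirects by hand and resending cookies, can be sketched with Python 3's standard library (http.client and http.cookies in place of urlfetch and the Python 2 Cookie module). CookieGateHandler is a hypothetical local server that assumes the site gates on a cookie: it redirects to itself until the client sends back the cookie it set. open_with_cookies is likewise an illustrative name, not an existing API.

```python
import http.client
import http.cookies
import http.server
import threading
from urllib.parse import urljoin, urlsplit


class CookieGateHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical site: redirects to itself until the client returns its cookie."""

    def do_GET(self):
        if "visited=1" in self.headers.get("Cookie", ""):
            body = b"<html>menu</html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(302)
            self.send_header("Set-Cookie", "visited=1; Path=/")
            self.send_header("Location", self.path)
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging


def open_with_cookies(url, max_redirects=10):
    """Follow redirects by hand, sending back every cookie the server has set."""
    jar = http.cookies.SimpleCookie()
    for _ in range(max_redirects):
        parts = urlsplit(url)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        cookie_header = "; ".join(f"{m.key}={m.value}" for m in jar.values())
        headers = {"Cookie": cookie_header} if cookie_header else {}
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=5)
        try:
            conn.request("GET", path, headers=headers)
            resp = conn.getresponse()
            body = resp.read()
        finally:
            conn.close()
        for set_cookie in resp.headers.get_all("Set-Cookie") or []:
            jar.load(set_cookie)  # remember cookies for the next hop
        location = resp.getheader("Location")
        if location is None:
            return resp.status, body  # not a redirect: this is the page
        url = urljoin(url, location)  # Location may be relative
    raise RuntimeError(f"too many redirects for {url}")


server = http.server.HTTPServer(("127.0.0.1", 0), CookieGateHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    status, body = open_with_cookies(f"http://127.0.0.1:{server.server_port}/menu")
finally:
    server.shutdown()
    server.server_close()

print(status, body)  # 200 b'<html>menu</html>'
```

A client that drops cookies would get the same 302 on every hop against this toy server, which is the loop urlfetch reported; the max_redirects cap turns such a loop into an error instead of spinning forever.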

OTHER TIPS

I think this is a problem with the site, not your code. The site seems to redirect to itself whenever a request lacks something a browser normally sends. For example, when I access it with curl, I get an empty body and a 302 redirect to the same URL, but in the browser I get the page. You'd have to ask the site owner what they are checking for.

Licensed under: CC-BY-SA with attribution