url fetch too many repeated redirects
28-10-2019
Question
I am trying to load a url and I get this error:
DownloadError: ApplicationError: 2 Too many repeated redirects
This is the code I am using:
headers = { 'User-Agent' : 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; de-at) AppleWebKit/533.21.1 (KHTML, like Gecko) Version/5.0.5 Safari/533.21.1' }
url = "http://www.cafebonappetit.com/menu/your-cafe/collins-cmc/cafes/details/50/collins-bistro"
cmcHTM = urlfetch.fetch(url=url)
cmcHTML = str(cmcHTM.content)
I checked this site's redirects at http://www.internetofficer.com/seo-tool/redirect-check/ and found that it redirects to itself! So urlfetch seems to be going in circles trying to load this page. Meanwhile, the page loads just fine in my browser.
So I tried using this code:
cmcHTM = urlfetch.fetch(url=url,
                        follow_redirects=False,
                        deadline=100
                        )
This just returns nothing though. Is there any way of getting this html?!
Solution
Sorry for the delayed response. I found this that worked:
import urllib, urllib2, Cookie
from google.appengine.api import urlfetch

class URLOpener:
    def __init__(self):
        # Cookies collected so far; they are sent back on every request.
        self.cookie = Cookie.SimpleCookie()

    def open(self, url, data = None):
        if data is None:
            method = urlfetch.GET
        else:
            method = urlfetch.POST
        # Follow redirects by hand until a response has no Location header.
        while url is not None:
            response = urlfetch.fetch(url=url,
                                      payload=data,
                                      method=method,
                                      headers=self._getHeaders(self.cookie),
                                      allow_truncated=False,
                                      follow_redirects=False,
                                      deadline=10
                                      )
            data = None # Next request will be a get, so no need to send the data again.
            method = urlfetch.GET
            self.cookie.load(response.headers.get('set-cookie', '')) # Load the cookies from the response
            url = response.headers.get('location')
        return response

    def _getHeaders(self, cookie):
        # No Host header is set here; urlfetch derives it from the URL.
        headers = {
            'User-Agent' : 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)',
            'Cookie' : self._makeCookieHeader(cookie)
        }
        return headers

    def _makeCookieHeader(self, cookie):
        # Build a "name=value; name=value; " string from the stored cookies.
        cookieHeader = ""
        for value in cookie.values():
            cookieHeader += "%s=%s; " % (value.key, value.value)
        return cookieHeader
I guess the key is the while loop: it follows each redirect by hand using the Location header of the response, and sends back whatever cookies the site has set so far.
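One caveat with a loop like this: it only stops when a response has no Location header, so a site that keeps redirecting could keep it spinning until the request times out. A minimal sketch of a bounded version is below. It is not part of the original answer; `follow_redirects`, `TooManyRedirects`, and the `fetch` parameter are names made up for illustration, and `fetch` is assumed to be any callable that takes a URL and returns an object with `status_code` and `headers` (for example a small wrapper around `urlfetch.fetch(url, follow_redirects=False)`).

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2, as used in the answer above

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = (301, 302, 303, 307, 308)


class TooManyRedirects(Exception):
    """Raised when the hop limit is reached (hypothetical name)."""


def follow_redirects(fetch, url, max_redirects=5):
    """Follow Location headers by hand, giving up after max_redirects hops.

    Assumes response.headers.get('location') works, which is the same
    lookup the answer above uses on urlfetch responses.
    """
    for _ in range(max_redirects + 1):
        response = fetch(url)
        location = response.headers.get('location')
        if response.status_code not in REDIRECT_CODES or not location:
            return response  # Final page (or an error page): stop here.
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise TooManyRedirects("gave up after %d redirects" % max_redirects)
```

Passing the fetch function in as an argument keeps the loop independent of App Engine, so it can be tried out locally with a stub before wiring in the cookie handling from `URLOpener`.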
OTHER TIPS
I think this is a problem in the site, not in your code. The site seems designed so it does a redirect to itself when it doesn't detect some header that is customarily sent by a browser. E.g. when I try accessing it with curl I get an empty body with a 302 redirect to itself, but in the browser I get a page. You'd have to ask the site owner what they are checking for...
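To confirm this from your own code rather than with curl, you can fetch with follow_redirects=False and look at the status code and Location header instead of the (empty) body. Here is a rough sketch of the check itself; `is_self_redirect` is a hypothetical helper, not part of urlfetch, and it only compares URLs after resolving a relative Location, so differences such as a trailing slash count as a different URL.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = (301, 302, 303, 307, 308)


def is_self_redirect(url, status_code, location):
    """Return True when a response redirects back to the URL that was requested."""
    if status_code not in REDIRECT_CODES or not location:
        return False
    # Resolve a relative Location against the requested URL before comparing.
    return urljoin(url, location) == url
```

With a urlfetch response this would be called as is_self_redirect(url, response.status_code, response.headers.get('location')). If it returns True, the site is bouncing the request back to the same address, and changing the headers or cookies you send is the thing to experiment with.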