Problem
I'm trying to read some content from a URL using python but am getting a 404 every time I try.
Here is my test code, and the offending URL:
import urllib2

url = 'http://supercoach.heraldsun.com.au'
headers = {"User-agent": "Mozilla/5.0"}
req = urllib2.Request(url, None, headers)
try:
    handle = urllib2.urlopen(req)
except IOError, e:
    print e.code
The site works fine in a browser, and I have previously had no issues with this script, but a recent update to the site has caused it to fail.
I've tried adding a User-Agent header, as similar questions suggest.
Any ideas why this isn't working?
Thanks JP
Solution
Use requests, which provides a friendly wrapper around Python's urllib libraries and handles redirection for you.
Your code with requests is simply:
import requests
r = requests.get('http://supercoach.heraldsun.com.au')
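To see the redirect handling in action, you can inspect the response's status and history. Since the original URL may no longer resolve, this sketch stands up a throwaway local server that issues a 302 (the server and its paths are purely illustrative, not part of the original answer):

```python
import threading
import requests
from http.server import BaseHTTPRequestHandler, HTTPServer

# Tiny local server: "/" redirects to "/final", which returns "ok".
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/':
            self.send_response(302)
            self.send_header('Location', '/final')
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header('Content-Length', '2')
            self.end_headers()
            self.wfile.write(b'ok')

    def log_message(self, *args):
        pass  # silence per-request logging

server = HTTPServer(('127.0.0.1', 0), Handler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# requests follows the 302 automatically; the intermediate hop is
# recorded in r.history rather than raised as an error.
r = requests.get('http://127.0.0.1:%d/' % server.server_port)
print(r.status_code)                       # final status after the redirect
print([h.status_code for h in r.history])  # the intermediate redirect(s)
server.shutdown()
```

With urllib2 you would have had to handle the 302 yourself; here `r.history` shows it was followed transparently.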
Another tip
Try setting cookies and increasing the number of allowed redirections:
import urllib2
from cookielib import CookieJar

class RedirectHandler(urllib2.HTTPRedirectHandler):
    max_repeats = 100
    max_redirections = 1000

    def http_error_302(self, req, fp, code, msg, headers):
        print code
        print headers
        return urllib2.HTTPRedirectHandler.http_error_302(
            self, req, fp, code, msg, headers)

    http_error_300 = http_error_302
    http_error_301 = http_error_302
    http_error_303 = http_error_302
    http_error_307 = http_error_302

cookiejar = CookieJar()
urlopen = urllib2.build_opener(RedirectHandler(),
                               urllib2.HTTPCookieProcessor(cookiejar)).open

request = urllib2.Request('http://supercoach.heraldsun.com.au',
                          headers={"User-agent": "Mozilla/5.0"})
response = urlopen(request)
print '*' * 60
print response.info()
print response.read()
response.close()
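Note that urllib2 only exists on Python 2. On Python 3 the same cookie-plus-redirect approach is available through urllib.request and http.cookiejar; a minimal port might look like this (the network call itself is left commented out so the sketch runs offline):

```python
import urllib.request
from http.cookiejar import CookieJar

# Same idea as the urllib2 version above: subclass the redirect handler
# and raise its hop limits (the defaults are 4 repeats / 10 redirections).
class RedirectHandler(urllib.request.HTTPRedirectHandler):
    max_repeats = 100
    max_redirections = 1000

cookiejar = CookieJar()
opener = urllib.request.build_opener(
    RedirectHandler(),
    urllib.request.HTTPCookieProcessor(cookiejar))

request = urllib.request.Request(
    'http://supercoach.heraldsun.com.au',
    headers={"User-agent": "Mozilla/5.0"})
# response = opener.open(request)  # network call; uncomment to try it
```

The opener stores any cookies the site sets in `cookiejar` and replays them on subsequent requests made through the same opener.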