Scraping Metacritic with urllib to follow redirect

https://stackoverflow.com/questions/20644245

python
web-scraping
urllib
screen-scraping

19-09-2022
|

Вопрос

I'm working on a Python script to scrape information from Metacritic. It works fine for most movies but it has issues with movies that Metacritic redirects.

For example on the list of movies, Metacritic provides the url "/movie/red-riding-in-the-year-of-our-lord-1983" but when you click that URL it brings you to "/movie/red-riding-trilogy". I need urllib to fetch the HTML of the final URL it ends up at.

Решение 2

I ended up using the requests module. (http://docs.python-requests.org/en/latest/) Here is the code for the request and the line to save the final url.

response = requests.get(url)
newUrl = response.url

Другие советы

Try using,

import urllib.request
urllib.request.FancyURLopener().open_http("your url")

Лицензировано под: CC-BY-SA с атрибуция

Не связан с StackOverflow