Question

I'm trying to get the rendered markup for http://www.epicurious.com/recipes/food/reviews/Breaded-Chicken-Cutlets-aka-Grandma-Jodys-Chicken-51114400; in theory the very same markup given by the 'View Page Source' menu option in Firefox.

I'm using a Python 2.7 script and the httplib library (http://docs.python.org/2/library/httplib.html). I've created an HTTPConnection object and when I try to get the markup via the HTTPResponse object's functions, I inevitably get a getaddrinfo - 11004 error. This script has been executed in Windows 7 and Ubuntu environments.

None of the other solutions for this error that I've read fit the bill: I am not behind any firewall, and I have no problem pinging www.google.com. I wonder if that website just doesn't conform to some standard I'm unaware of, as I haven't been able to successfully ping my target website.

I'm open to alternative approaches; let me know if there is a better way.


Solution

You might want to check out the requests library. It makes simple things like this much easier:

import requests

r = requests.get('http://www.epicurious.com/recipes/food/reviews/Breaded-Chicken-Cutlets-aka-Grandma-Jodys-Chicken-51114400')

print(r.text)  # the parenthesized form works in Python 2.7 and Python 3

Here are the docs: http://docs.python-requests.org/en/latest/

Ran the above and verified it works.
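
If you would rather stay with httplib, note that the original script isn't shown, so the following is only a guess at the cause: a getaddrinfo error means the name lookup failed, and that can happen when HTTPConnection is given more than the bare hostname (for example the hostname plus the path). Here is a minimal sketch that splits the URL into the two pieces httplib expects. The helper name split_for_httplib is made up for illustration and is not part of any library.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2.7


def split_for_httplib(url):
    """Return (host, path): the host goes to HTTPConnection, the path to request()."""
    parts = urlsplit(url)
    path = parts.path or '/'           # an empty path means the site root
    if parts.query:
        path += '?' + parts.query      # keep the query string, if any
    return parts.netloc, path


url = ('http://www.epicurious.com/recipes/food/reviews/'
       'Breaded-Chicken-Cutlets-aka-Grandma-Jodys-Chicken-51114400')
host, path = split_for_httplib(url)
print(host)
print(path)
```

In Python 2.7 you would then pass only the host to httplib.HTTPConnection(host), call conn.request('GET', path), and read the body from conn.getresponse().read(). If the lookup still fails with just the hostname, the problem is more likely DNS on your machine or network than the script.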

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow