I couldn't identify the exact cause, but it seems to be a problem related to urllib2. Simply switching to requests made it work. Here is the code:
import requests
from bs4 import BeautifulSoup
url = "http://www.expatistan.com/cost-of-living/comparison/phoenix/new-york-city"
page = requests.get(url).text
soup_expatistan = BeautifulSoup(page, "html.parser")  # name the parser explicitly to avoid bs4's "no parser specified" warning
expatistan_table = soup_expatistan.find("table", class_="comparison")
expatistan_titles = expatistan_table.find_all("tr", class_="expandable")
for expatistan_title in expatistan_titles:
    published_date = expatistan_title.find("th", class_="percent")
    print(published_date.span.string)
You can use pip to install requests:
$ pip install requests
EDIT
The problem is indeed related to urllib2. It seems that the www.expatistan.com server responds differently depending on the User-Agent set in the request. To get the same response with urllib2, you have to do the following:
import urllib2
url = "http://www.expatistan.com/cost-of-living/comparison/phoenix/new-york-city"
request = urllib2.Request(url)
opener = urllib2.build_opener()
request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20130406 Firefox/23.0')
page = opener.open(request).read()
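On Python 3, urllib2 no longer exists (its functionality moved to urllib.request), so a rough equivalent of the snippet above might look like the sketch below. The helper names build_request and fetch are made up for illustration, and I haven't checked how the site responds today, so treat it as a starting point rather than a verified fix.

```python
import urllib.request

# Same browser-like User-Agent string as in the urllib2 snippet.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) "
    "Gecko/20130406 Firefox/23.0"
)


def build_request(url, user_agent=USER_AGENT):
    """Hypothetical helper: build a Request that carries a custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Hypothetical helper: download the page body as bytes."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read()
```

Calling fetch(url) should then give you the page bytes, which you can pass to BeautifulSoup exactly as in the requests version.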