Question

I'm having a problem with decoding in Python 3. I'm trying to fetch an IMDb page (example address: http://www.imdb.com/title/tt2216240/) and save it to a file:

import urllib.request

address = 'http://www.imdb.com/title/tt2216240/'
req = urllib.request.Request(address)
response = urllib.request.urlopen(req)
page = response.read().decode('utf-8', 'ignore')
with open('film.html', 'w') as f:
    print(page, file=f)

I get an error:

UnicodeEncodeError: 'charmap' codec can't encode character '\xe6' in position 4132: character maps to <undefined>

Solution

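The error is raised when writing the file, not when decoding: open() in text mode uses the platform's default encoding (a legacy code page on Windows, hence the 'charmap' codec), which cannot represent '\xe6'. Specify UTF-8 explicitly when opening the file: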

with open('film.html', 'w', encoding='utf-8') as f:
    print(page, file=f)
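
To see why the explicit encoding matters without fetching anything, here is a minimal sketch. The `page` string is a stand-in for the decoded IMDb page, `can_encode`, `save_text` and `load_text` are hypothetical helpers, and cp1251 is only an assumed example of a legacy code page that lacks 'æ' (the question does not say which code page was in use).

```python
import os
import tempfile

# Stand-in for the decoded page; '\xe6' (æ) is the character from the traceback.
page = "<title>\xe6 example</title>"


def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def save_text(text, path):
    """Write text as UTF-8 regardless of the platform's default encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def load_text(path):
    """Read the file back as UTF-8."""
    with open(path, encoding="utf-8") as f:
        return f.read()


# cp1251 is an assumed example of a code page without 'æ'; UTF-8 covers all of Unicode.
legacy_ok = can_encode(page, "cp1251")
utf8_ok = can_encode(page, "utf-8")

# Write to a temporary directory and read the file back to check the round trip.
path = os.path.join(tempfile.mkdtemp(), "film.html")
save_text(page, path)
round_trip = load_text(path)
```

Passing the same encoding when reading the file back keeps the round trip lossless.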

OTHER TIPS

Have you tried the requests library?

It makes the request simpler:

# samplerequest.py
import requests

address = "http://www.imdb.com/title/tt2216240/"
req = requests.get(address)

print(req.text)      # response body decoded to str
print(req.encoding)  # encoding requests used to decode it
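
If the goal is only to save the page, another option is to skip decoding and write the raw bytes in binary mode, so no text codec is involved at all. This is a rough sketch rather than part of the answer above: `save_raw` is a hypothetical helper, and the `raw` bytes stand in for what `response.read()` or `requests.get(address).content` would return.

```python
import os
import tempfile


def save_raw(data, path):
    """Write bytes unchanged; no text codec runs, so no UnicodeEncodeError."""
    with open(path, "wb") as f:
        f.write(data)


# Stand-in for response.read() or requests.get(address).content.
raw = "<title>\xe6 example</title>".encode("utf-8")

# Write to a temporary directory and read the bytes back to compare.
path = os.path.join(tempfile.mkdtemp(), "film.html")
save_raw(raw, path)

with open(path, "rb") as f:
    saved = f.read()
```

The saved file then has exactly the bytes the server sent, in whatever encoding the page declares.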
Licensed under: CC-BY-SA with attribution