Question

I'm having a problem with decoding in Python. I'm trying to fetch an IMDb page (example address: http://www.imdb.com/title/tt2216240/):

import urllib.request

address = 'http://www.imdb.com/title/tt2216240/'
req = urllib.request.Request(address)
response = urllib.request.urlopen(req)
page = response.read().decode('utf-8', 'ignore')
with open('film.html', 'w') as f:
    print(page, file=f)

I get an error:

UnicodeEncodeError: 'charmap' codec can't encode character '\xe6' in position 4132: character maps to <undefined>

Solution

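The error is raised when writing the file, not when decoding the response. Without an encoding argument, open() uses the platform's default encoding (a Windows code page, judging by the 'charmap' codec in the message), which cannot represent every character in the page. Specify UTF-8 explicitly when opening the file: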

with open('film.html', 'w', encoding='utf-8') as f:
    print(page, file=f)
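
To see the difference without fetching anything, here is a minimal sketch. The helper try_write, the sample string, and the use of cp1250 as a stand-in for a narrow default encoding are all assumptions for illustration; the asker's actual default encoding is not known.

```python
import os
import tempfile


def try_write(path, text, encoding):
    """Hypothetical helper: write text with the given encoding.

    Returns True on success and False if the encoding cannot
    represent some character in the text.
    """
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        return True
    except UnicodeEncodeError:
        return False


# Stand-in for the downloaded page; it contains '\xe6', the character
# named in the error message.
page = "<title>\xe6</title>"

tmp_dir = tempfile.mkdtemp()

# cp1250 is an assumed example of a code page that lacks '\xe6'.
narrow_ok = try_write(os.path.join(tmp_dir, "narrow.html"), page, "cp1250")

# UTF-8 can encode every Unicode character.
utf8_ok = try_write(os.path.join(tmp_dir, "utf8.html"), page, "utf-8")

print("cp1250 write succeeded:", narrow_ok)
print("utf-8 write succeeded:", utf8_ok)
```

Passing encoding='utf-8' makes the result independent of the machine's locale settings, which is why the same script can work on one system and fail on another.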

Other tips

Have you tried the requests library?

It makes this simpler:

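# samplerequest.py
import requests

address = "http://www.imdb.com/title/tt2216240/"
req = requests.get(address)

# print() calls, to match the Python 3 code in the question
print(req.text)      # body decoded to str using the detected encoding
print(req.encoding)  # encoding that requests used for decoding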
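
If the goal is only to save the page to disk, another option is to skip decoding entirely and write the raw bytes in binary mode. This is a sketch under assumptions: save_raw is a hypothetical helper, and the sample bytes stand in for what response.read() (urllib) or req.content (requests) would return.

```python
import os
import tempfile


def save_raw(raw: bytes, path: str) -> None:
    """Hypothetical helper: write bytes unchanged, with no text encoding step."""
    with open(path, "wb") as f:
        f.write(raw)


# Stand-in for the bytes returned by response.read() or req.content.
raw = "<title>\xe6</title>".encode("utf-8")

path = os.path.join(tempfile.mkdtemp(), "film.html")
save_raw(raw, path)

# Read the file back to confirm the bytes are stored as received.
with open(path, "rb") as f:
    saved = f.read()

print(saved == raw)
```

Because no str is ever encoded, the platform's default encoding never comes into play. The trade-off is that the file keeps whatever encoding the server sent.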
License: CC BY-SA with attribution