Question

I'm using BeautifulSoup to parse some HTML, with Spyder as my editor (both brilliant tools, by the way!). The code runs fine in Spyder, but when I try to execute the .py file from the terminal, I get an error:

file =  open('index.html','r')
soup = BeautifulSoup(file)
html = soup.prettify()
file1 =  open('index.html', 'wb')
file1.write(html)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 5632: ordinal not in range(128)

I'm running openSUSE on a Linux server, with Spyder installed using zypper. Does anyone have any suggestions as to what the problem might be? Many thanks.


Solution

That is because you must encode the result before outputting it (i.e., writing it to the file):

file1.write(html.encode('utf-8'))

Every file object has a file.encoding attribute. To quote the docs:

file.encoding

The encoding that this file uses. When Unicode strings are written to a file, they will be converted to byte strings using this encoding. In addition, when the file is connected to a terminal, the attribute gives the encoding that the terminal is likely to use (that information might be incorrect if the user has misconfigured the terminal). The attribute is read-only and may not be present on all file-like objects. It may also be None, in which case the file uses the system default encoding for converting Unicode strings.

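Note the last sentence. soup.prettify() returns a Unicode object, and given this error I'm pretty sure you're using Python 2.7, whose sys.getdefaultencoding() is ascii. With no encoding set on the file, writing that Unicode object triggers an implicit conversion to ASCII, which fails on u'\xa9'.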
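
To make the difference concrete, here is a minimal sketch that does not depend on BeautifulSoup. The string html is only a stand-in for the output of soup.prettify(), and the temporary output path is made up for illustration. It shows the ASCII conversion failing on u'\xa9', the explicit UTF-8 encoding succeeding, and, as an alternative to calling .encode() yourself, io.open (available since Python 2.6 and the same function as the built-in open in Python 3) doing the encoding for you:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Stand-in for the text returned by soup.prettify(); it contains U+00A9,
# the character named in the error message.
html = u'<p>\xa9 example</p>'

# The ASCII codec cannot represent U+00A9, so this conversion fails.
try:
    html.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly as UTF-8 works and yields bytes suitable for 'wb' mode.
encoded = html.encode('utf-8')

# Alternative: let io.open encode the text as it is written.
# The path below is a hypothetical temporary file, not the original index.html.
path = os.path.join(tempfile.mkdtemp(), 'index.html')
with io.open(path, 'w', encoding='utf-8') as out:
    out.write(html)

# Read the raw bytes back to compare them with the explicit encoding.
with io.open(path, 'rb') as f:
    written = f.read()
```

Both routes should put the same bytes on disk; which one to use is mostly a matter of taste, though passing the encoding to io.open keeps it stated in one place.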

Hope this helps!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow