Question

I'm using BeautifulSoup to parse some HTML, with Spyder as my editor (both brilliant tools, by the way!). The code runs fine in Spyder, but when I try to execute the .py file from the terminal, I get an error:

file =  open('index.html','r')
soup = BeautifulSoup(file)
html = soup.prettify()
file1 =  open('index.html', 'wb')
file1.write(html)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 5632: ordinal not in range(128)

I'm running openSUSE on a Linux server, with Spyder installed using zypper. Does anyone have any suggestions as to what the problem might be? Many thanks.


Solution

That is because you must encode the result before outputting it (i.e. writing it to the file):

file1.write(html.encode('utf-8'))
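A minimal sketch of that fix, written for Python 3 (where writing a str to a file opened in 'wb' mode raises TypeError, so the encode step is mandatory) and assuming the prettified markup is already a Unicode string. The html value is a hypothetical stand-in for soup.prettify() so the example runs without BeautifulSoup installed, and save_prettified is a hypothetical helper name.

```python
import os
import tempfile


def save_prettified(path, html):
    """Encode a unicode string to UTF-8 bytes and write them to `path`."""
    data = html.encode("utf-8")   # explicit str -> bytes step (the fix)
    with open(path, "wb") as f:   # binary mode: bytes are written as-is
        f.write(data)
    return len(data)              # number of bytes written


# Hypothetical stand-in for soup.prettify(); U+00A9 is the copyright sign
html = "<p>\xa9 2012</p>"
path = os.path.join(tempfile.mkdtemp(), "index.html")
print(save_prettified(path, html))  # 14: the copyright sign takes 2 bytes in UTF-8
```

Using a with block also closes the file after writing, which the snippet in the question never does for either file object.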

Every file object has an encoding attribute, file.encoding. To quote the Python 2 docs:

file.encoding

The encoding that this file uses. When Unicode strings are written to a file, they will be converted to byte strings using this encoding. In addition, when the file is connected to a terminal, the attribute gives the encoding that the terminal is likely to use (that information might be incorrect if the user has misconfigured the terminal). The attribute is read-only and may not be present on all file-like objects. It may also be None, in which case the file uses the system default encoding for converting Unicode strings.
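To see the encoding attribute at work, here is a small Python 3 sketch. In Python 3, open() in text mode returns an io.TextIOWrapper whose encoding attribute plays the role described above: every str written through it is converted to bytes with that codec. The write_text helper and the temporary path are hypothetical, chosen only for illustration.

```python
import io
import os
import tempfile


def write_text(path, text, encoding):
    """Write `text` through a text-mode file whose .encoding is `encoding`."""
    with io.open(path, "w", encoding=encoding) as f:
        # The file object converts str -> bytes using f.encoding on write
        print("file.encoding =", f.encoding)
        f.write(text)


path = os.path.join(tempfile.mkdtemp(), "index.html")
html = "<p>\xa9 2012</p>"

try:
    write_text(path, html, "ascii")   # same kind of failure as in the question
except UnicodeEncodeError as exc:
    print("ascii failed:", exc)

write_text(path, html, "utf-8")       # succeeds: U+00A9 becomes b'\xc2\xa9'
with open(path, "rb") as f:
    print(f.read())
```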

See the last sentence? soup.prettify() returns a unicode object, and given this error, I'm pretty sure you're using Python 2.7, where sys.getdefaultencoding() returns 'ascii'.
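The error message itself can be reproduced by encoding with the ascii codec explicitly, which is roughly what Python 2 did implicitly when a unicode object was written to a plain file. This is a Python 3 sketch: find_unencodable is a hypothetical helper, and since Python 3's sys.getdefaultencoding() returns 'utf-8', the codec is passed in explicitly instead of being taken from the default.

```python
import sys


def find_unencodable(text, codec="ascii"):
    """Return (position, character, reason) for the first character that
    `codec` cannot encode, or None if the whole string encodes cleanly."""
    try:
        text.encode(codec)
    except UnicodeEncodeError as exc:
        return exc.start, text[exc.start], exc.reason
    return None


# Hypothetical stand-in for the prettified HTML
html = "<p>Copyright \xa9 2012</p>"

print(sys.getdefaultencoding())          # 'utf-8' on Python 3 ('ascii' on Python 2)
print(find_unencodable(html))            # (13, '©', 'ordinal not in range(128)')
print(find_unencodable(html, "utf-8"))   # None
```

The reason field is the same "ordinal not in range(128)" text seen in the traceback: ASCII only covers code points 0 to 127, and U+00A9 is 169.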

Hope this helps!

Licensed under: CC-BY-SA with attribution