You are calling str.encode(); Python 2 strings are already encoded, so Python tries to do the right thing and first decode to unicode so it can then encode the value back to a bytestring for you. This implicit decode is done with the default codec, ASCII:
>>> '<?xml version="1.0" encoding="UTF-8"?><o><location>san diego, ça</location></o>'.encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 62: ordinal not in range(128)
Note that I called .encode() but the exception is UnicodeDecodeError; Python was decoding here first.
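To make the hidden step visible, here is a minimal sketch that performs the ASCII decode explicitly instead of relying on Python 2's implicit behaviour. The shortened sample value and the variable names are my own, and the bytes are written with escapes so the snippet does not depend on how the source file is saved:

```python
# UTF-8 encoded bytes for "san diego, ça"; the ç is the two bytes 0xc3 0xa7.
data = b'san diego, \xc3\xa7a'

try:
    # This is the step Python 2 performs implicitly before str.encode():
    # decode the bytestring with the default ASCII codec.
    data.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    # The non-ASCII byte 0xc3 cannot be decoded as ASCII.
    error = exc

# Decoding with the codec the bytes were actually written in succeeds.
text = data.decode('utf-8')
```

The explicit decode fails in the same way as the implicit one, on the first byte outside the ASCII range.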
However, because ET.fromstring() already wants UTF-8 encoded bytes, you do not need to recode the value at all.
If you see problems with parsing the string value, make sure you saved your Python source code using the right codec, UTF-8, from your text editor.
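As a rough illustration of passing the bytes straight to ET.fromstring() without any extra .encode() call, the sketch below assumes ET is xml.etree.ElementTree and reuses the document from the traceback. The ç is written as the escaped UTF-8 bytes \xc3\xa7 so the example does not depend on the encoding of the source file:

```python
import xml.etree.ElementTree as ET

# The XML document as UTF-8 encoded bytes, matching its encoding declaration.
xml_bytes = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<o><location>san diego, \xc3\xa7a</location></o>'
)

# No .encode() needed: the parser decodes the bytes itself, guided by the
# encoding named in the XML declaration.
root = ET.fromstring(xml_bytes)
location = root.find('location').text
```

Here location holds the decoded text, with the ç as a single character rather than two bytes.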