List with hex escaped values to readable string in Python

https://stackoverflow.com/questions/14542310

05-03-2022
|

题

I have a list like this:

['<option value="284">\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0 Historia </option>', '<option value="393">\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0 H\xc3\xa4lsa & sk\xc3\xb6nhet </option>']

How do I convert this list into a list with elements that are actually readable?

I believe it is in ISO 8859-1.

解决方案

Decode the string value using the .decode() method; you are looking at UTF-8 data actually:

>>> print lst[0].decode('utf8')
<option value="284">     Historia </option>
>>> print lst[1].decode('utf8')
<option value="393">     Hälsa & skönhet </option>

The first bytes represent Unicode code point U+00a0, a non-breaking space (  as HTML entity):

>>> lst[0].decode('utf8')
u'<option value="284">\xa0\xa0\xa0\xa0 Historia </option>'
>>> lst[1].decode('utf8')
u'<option value="393">\xa0\xa0\xa0\xa0 H\xe4lsa & sk\xf6nhet </option>'

其他提示

Looks like UTF-8:

>>> s=['<option value="284">\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0 Historia </option>', '<option value="393">\xc2\xa0\xc2\xa0\xc2\xa0\xc2\xa0 H\xc3\xa4lsa & sk\xc3\xb6nhet </option>']
>>> for v in s:
...     print v.decode('utf8')
...     
<option value="284">     Historia </option>
<option value="393">     Hälsa & skönhet </option>

许可以下： CC-BY-SA 和归因

不隶属于 StackOverflow