How should I convert a string containing unicode characters to unicode?

https://stackoverflow.com/questions/22615243

20-06-2023

Question

I thought I had mastered all the Unicode stuff in Python 2, but it seems there's something I don't understand. I have this user input from an HTML form that goes to my Python script:

a = "m\xe9dico"

I want this to be médico (which means doctor). So to convert it to unicode I'm doing:

a.decode("utf-8")

Or:

unicode(a, "utf-8")

But this is throwing:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)

How can I achieve this?

Solution

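Your string is not UTF-8. The byte \xe9 is "é" in ISO-8859-1 (Latin-1), so decode it with that codec instead (txt here holds the same bytes as a in the question):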

print txt.decode('iso8859-1')
Out[14]: médico

If you want a UTF-8 encoded byte string, decode first and then encode:

txt.decode('iso8859-1').encode('utf-8')
Out[15]: 'm\xc3\xa9dico'
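As a rough sketch of why the UTF-8 decode fails and the Latin-1 decode succeeds, the snippet below runs both on the bytes from the question. The b"" prefix and the variable names (raw, text, utf8_bytes, is_valid_utf8) are my own additions so that it behaves the same on Python 2.6+ and Python 3; they are not part of the original answer.

```python
# The bytes from the question. The b"" prefix is an assumption made here
# so the literal is a byte string on both Python 2 and Python 3.
raw = b"m\xe9dico"

# In UTF-8, 0xE9 starts a three-byte sequence and must be followed by
# two continuation bytes. Here it is followed by "d", so decoding fails.
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# ISO-8859-1 maps every byte to the code point with the same number,
# so 0xE9 becomes U+00E9 (e with acute accent).
text = raw.decode("iso-8859-1")

# Re-encoding as UTF-8 turns U+00E9 into the two bytes 0xC3 0xA9.
utf8_bytes = text.encode("utf-8")
```

Note that ISO-8859-1 can decode any byte sequence without raising, so a successful decode does not prove the input really was Latin-1; it only fits the data shown in the question.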

OTHER TIPS

You can prefix your string with a u to mark it as a unicode literal:

>>> a = u'm\xe9dico'
>>> print a
médico
>>> type(a)
<type 'unicode'>

Or, to convert an existing byte string:

>>> a = 'm\xe9dico'
>>> type(a)
<type 'str'>
>>> new_a = unicode(a,'iso-8859-1')
>>> print new_a
médico
>>> type(new_a)
<type 'unicode'>
>>> new_a == u'm\xe9dico'
True
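If the input may arrive in either encoding, one possible approach is to try UTF-8 first and fall back to ISO-8859-1. The helper below is only a sketch: to_text and its encodings parameter are hypothetical names, and the fallback order is an assumption about the data, not something stated in the answers above.

```python
def to_text(raw, encodings=("utf-8", "iso-8859-1")):
    """Hypothetical helper: decode raw bytes with the first codec that works."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            # This codec rejected the bytes; try the next candidate.
            continue
    raise ValueError("none of the given encodings could decode the input")


# Latin-1 bytes are rejected by UTF-8 and picked up by the fallback.
latin1_result = to_text(b"m\xe9dico")

# Valid UTF-8 bytes are decoded by the first codec.
utf8_result = to_text(b"m\xc3\xa9dico")
```

UTF-8 goes first because it rejects most non-UTF-8 input, whereas ISO-8859-1 accepts anything. If you know the real encoding (for example from the HTML form's charset), decoding with that is more reliable than guessing.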

Further reading: Python docs - Unicode HOWTO.

Licensed under: CC-BY-SA with attribution
