Question

Consider that Python will allow one to type the name 'Jânis' into the Python CLI if it is known that the 'â' character is hex "E2" in CP-1252:

>>> 'J\xe2nis'
'Jânis'

How might one type that name into the Python CLI if the Unicode code point is known, but not the CP-1252 value? In fact, the code point in question is U+00E2. Also, the UTF-8 encoded character is %C3 %A2; is there any way to type that into the Python CLI if only that is known?

I am using Python 3.2 on Kubuntu Linux 12.10.


Solution

Use a Unicode escape sequence (\uXXXX, where XXXX is the four-digit hex code point):

>>> 'J\u00e2nis'
'Jânis'
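
Beyond \uXXXX, a few other spellings should produce the same string when only the code point is known. The following is a minimal sketch; the variable names are only illustrative:

```python
import unicodedata

# Equivalent ways to write the name starting from code point U+00E2.
name_u = 'J\u00e2nis'                # 4-hex-digit escape
name_U = 'J\U000000e2nis'            # 8-hex-digit escape (needed above U+FFFF)
name_N = 'J\N{LATIN SMALL LETTER A WITH CIRCUMFLEX}nis'  # by Unicode name
name_chr = 'J' + chr(0xE2) + 'nis'   # chr() takes the code point as an int

print(name_u == name_U == name_N == name_chr)
print(unicodedata.name(name_u[1]))
```

The \UXXXXXXXX form is the one to reach for when the code point does not fit in four hex digits.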

If you know the UTF-8 bytes, use bytes.decode (UTF-8 is the default encoding for bytes.decode in Python 3.x, so the argument is optional):

>>> b'J\xc3\xa2nis'.decode('utf-8')
'Jânis'
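
If the bytes are only known as hex digits, bytes.fromhex can build them before decoding. The sketch below also decodes the CP-1252 byte explicitly, since '\xe2' inside a str literal names the code point U+00E2 rather than a CP-1252 byte; the two happen to agree for this character. Variable names are illustrative:

```python
# UTF-8 bytes for U+00E2, given as hex digits.
utf8_bytes = bytes.fromhex('C3 A2')
a_circumflex = utf8_bytes.decode()   # encoding defaults to 'utf-8'
name = 'J' + a_circumflex + 'nis'

# The same name starting from the CP-1252 byte E2.
from_cp1252 = b'J\xe2nis'.decode('cp1252')
print(name == from_cp1252 == 'J\u00e2nis')

# A byte where CP-1252 and the code point differ: 0x80 is the euro sign.
euro = b'\x80'.decode('cp1252')
print(euro == '\u20ac')
```

Decoding explicitly with 'cp1252' is the safer habit when the byte value comes from that code page, because bytes in the 0x80 to 0x9F range do not match the code points with the same numbers.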

If you have the percent-encoded form %C3%A2, use urllib.parse.unquote:

>>> import urllib.parse
>>> urllib.parse.unquote('J%c3%a2nis', encoding='utf-8')
'Jânis'
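
If the percent-encoded text arrives with spaces between the escapes, as in "%C3 %A2", one option is to strip the spaces first. This is a sketch under that assumption; the variable names are illustrative:

```python
import urllib.parse

# Percent-encoded bytes with a stray space between them.
spaced = '%C3 %A2'
a_circumflex = urllib.parse.unquote(spaced.replace(' ', ''))
name = 'J' + a_circumflex + 'nis'
print(name == 'J\u00e2nis')

# unquote_to_bytes returns the raw bytes without decoding them.
raw = urllib.parse.unquote_to_bytes('J%C3%A2nis')
print(raw)

# quote goes the other way and uses UTF-8 by default.
round_trip = urllib.parse.quote(name)
print(round_trip)
```

urllib.parse.unquote also defaults to encoding='utf-8', so the argument can be omitted for UTF-8 input.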
Licensed under: CC-BY-SA with attribution