Question

Good day! I'm having trouble decoding text to Unicode. I have a str that is equal to

    '\u4038' # or something like that       

in ASCII, and I need to convert this string to ONE Unicode symbol. Can you please explain how to do that? The expression

    len(unicode('\u4038')) 

gives 6, so this is not a solution :(

In case it matters, the resulting symbol is Cyrillic in most cases.


Solution

If you mean you have the six-character string '\\u4038' (a literal backslash followed by u4038), you can decode it with the unicode-escape codec:

>>> s = b'\\u4038' # == br'\u4038'

>>> print(s)
\u4038
>>> len(s)
6

>>> print(s.decode('unicode-escape'))
䀸
>>> len(s.decode('unicode-escape'))
1
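The session above appears to be Python 2, where str is a byte string. As a rough sketch of the same idea in Python 3, assuming the input arrives as a text str containing only ASCII characters, one could encode to bytes first and then decode with the same codec. The helper name `unescape` is made up for illustration.

```python
def unescape(text):
    """Turn literal \\uXXXX sequences in `text` into the characters they name."""
    # Assumption: `text` is pure ASCII. Non-ASCII input would raise
    # UnicodeEncodeError here rather than being silently mangled.
    return text.encode('ascii').decode('unicode_escape')


s = r'\u4038'            # six characters: backslash, 'u', '4', '0', '3', '8'
print(len(s))            # → 6
print(len(unescape(s)))  # → 1
```

The same call should work for Cyrillic code points, for example `unescape(r'\u0416')` gives 'Ж'.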

OTHER TIPS

There's probably a better way, but here is one:

In [26]: import ast

In [27]: s = r'\u4038'

In [28]: len(ast.literal_eval('u"' + s + '"'))
Out[28]: 1
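One caveat with this approach: building a literal by concatenation breaks if the input itself contains a double quote or a newline. A minimal Python 3 sketch that guards against that might look like the following. The function name `unescape_via_literal_eval` is hypothetical, and the guard is an assumption about what inputs to reject, not part of the original answer.

```python
import ast


def unescape_via_literal_eval(s):
    """Evaluate `s` as the body of a double-quoted string literal."""
    # Assumption: inputs with a double quote or newline are rejected,
    # because they would end or break the literal we build below.
    if '"' in s or '\n' in s:
        raise ValueError('input cannot be wrapped in a string literal safely')
    return ast.literal_eval('"' + s + '"')


print(len(unescape_via_literal_eval(r'\u4038')))  # → 1
```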
Licensed under: CC-BY-SA with attribution