Question

I have a list of bytes (8 bit bytes, or in C/C++ language they form wchar_t type string), they form an UNICODE string (byte by byte), how to convert those values into a Python string, tried a few things, but none could join those 2 bytes into 1 character and build an entire string from it. Thank you.

Was it helpful?

Solution

Converting a sequence of bytes to a Unicode string is done by calling the decode() method on that str (in Python 2.x) or bytes (Python 3.x) object.

If you actually have a list of bytes, then, to get this object, you can use ''.join(bytelist) or b''.join(bytelist).

You need to specify the encoding that was used to encode the original Unicode string.

However, the term "Python string" is a bit ambiguous and also version-dependent. The Python str type stands for a byte string in Python 2.x and a Unicode string in Python 3.x. So, in Python 2, just doing ''.join(bytelist) will give you a str object.

Demo for Python 2:

In [1]: 'тест'
Out[1]: '\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'

In [2]: bytelist = ['\xd1', '\x82', '\xd0', '\xb5', '\xd1', '\x81', '\xd1', '\x82']

In [3]: ''.join(bytelist).decode('utf-8')
Out[3]: u'\u0442\u0435\u0441\u0442'

In [4]: print ''.join(bytelist).decode('utf-8') # encodes to the terminal encoding
тест

In [5]: ''.join(bytelist) == 'тест'
Out[5]: True

OTHER TIPS

you can also convert the byte list into string list using the decode()

stringlist=[x.decode('utf-8') for x in bytelist]

Here's what worked the best for me:

import codecs

print(type(byteData)) # <class 'bytes'>
strData = codecs.decode(byteData, 'UTF-8')
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top