Question

Here is the code in Python interactive mode:

>>> s = u'赵孟頫'
>>> s.encode('gbk')
'\xd5\xd4\xc3\xcf\xee\\'

Why does the GBK string have a trailing backslash?

Solution

In [8]: '\xd5\xd4\xc3\xcf\xee\\' == '\xd5\xd4\xc3\xcf\xee\x5c'
Out[8]: True

The trailing backslash is just the byte '\x5c'.

In [9]: hex(ord('\\'))
Out[9]: '0x5c'

In [10]: '\x5c'
Out[10]: '\\'

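In Python 2, a str is just a sequence of bytes. GBK encodes each of these three characters as two bytes, and the second byte of 頫 happens to be 0x5c, the same value ASCII uses for a backslash. When Python prints the repr of a string, it shows each byte as a printable ASCII character when possible, and a literal backslash has to be written escaped, as '\\'.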
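To see which character the 0x5c byte belongs to, the encoded bytes can be split into two-byte pairs. The following is a minimal sketch written for Python 3, where the encoded result is a bytes object rather than a str. The variable names are illustrative, and the fixed two-byte split is an assumption that holds only because none of these three characters is ASCII.

```python
s = '赵孟頫'
encoded = s.encode('gbk')

# Each of these three characters takes exactly two bytes in GBK,
# so the bytes can be grouped into (lead byte, trail byte) pairs.
pairs = [encoded[i:i + 2] for i in range(0, len(encoded), 2)]

# Indexing a bytes object in Python 3 yields an int.
last_trail = encoded[-1]

print(repr(encoded))              # b'\xd5\xd4\xc3\xcf\xee\\'
print([p.hex() for p in pairs])   # ['d5d4', 'c3cf', 'ee5c']
print(hex(last_trail))            # 0x5c

# Decoding gives back the original text, so no stray backslash was added.
roundtrip = encoded.decode('gbk')
print(roundtrip == s)             # True
```

The last pair, ee5c, is the single character 頫. The backslash in the repr is only how its trail byte is displayed.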

Licensed under: CC-BY-SA with attribution