Question

I want to convert Chinese characters to the Unicode escape format, like '\uXXXX', but when I use str.encode('utf-16be') it shows this:

b'\xOO\xOO'

So I wrote some code to do what I want, as below:

data="index=索引?"
print(data.encode('UTF-16LE'))

def convert(s):
    return_code = []
    temp = ''
    for n in s.encode('utf-16be'):
        # Format every byte as exactly two hex digits so leading zeros are kept.
        if temp == '':
            temp = format(n, '02x')      # first byte of the 2-byte code unit
        else:
            return_code.append(temp + format(n, '02x'))  # second byte completes it
            temp = ''

    return return_code

print(convert(data))

Can someone suggest how to do this conversion in Python 3.x?


Solution

I'm not sure I understand you correctly.

Think of Unicode as the string type itself rather than as an encoding you convert to. In Python 3, every str is a sequence of Unicode code points, so when you write data = "index=索引?", data is already Unicode. If you just want an alternative representation for display, you could use:

def display_unicode(data):
    return "".join("\\u%04x" % ord(ch) for ch in data)

>>> data = "index=索引?"
>>> print(display_unicode(data))
\u0069\u006e\u0064\u0065\u0078\u003d\u7d22\u5f15\u003f

Note that the resulting string now contains literal backslashes and hexadecimal digits, not the original Unicode characters.
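One caveat: for characters outside the Basic Multilingual Plane (emoji, for example), the code point needs more than four hex digits, so a per-character approach no longer produces a valid \uXXXX form. Since the question started from UTF-16, here is a minimal sketch that emits one \uXXXX per UTF-16 code unit instead. The helper name utf16_escapes is made up for illustration, and it assumes surrogate pairs are what you want for such characters:

```python
def utf16_escapes(text):
    # Return text as a run of \uXXXX escapes, one per UTF-16 code unit.
    raw = text.encode('utf-16-be')  # big-endian, no BOM, 2 bytes per code unit
    units = (int.from_bytes(raw[i:i + 2], 'big') for i in range(0, len(raw), 2))
    return ''.join('\\u%04x' % unit for unit in units)


print(utf16_escapes("index=索引?"))
print(utf16_escapes("\U0001F600"))  # a non-BMP character becomes a surrogate pair
```

For BMP-only text this gives the same result as formatting each code point directly.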

There are other alternatives, too:

>>> data.encode('ascii', 'backslashreplace')
b'index=\\u7d22\\u5f15?'
>>> data.encode('unicode_escape')
b'index=\\u7d22\\u5f15?'
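Both calls return bytes, and they only happen to agree for this input. As a rough sketch of the difference: unicode_escape also escapes backslashes and control characters such as newlines, while backslashreplace only touches characters the target codec cannot encode. The variable names below are just for illustration:

```python
data = "index=索引?"

escaped = data.encode('unicode_escape')      # bytes containing literal backslashes
as_text = escaped.decode('ascii')            # same content, but as a str
restored = escaped.decode('unicode_escape')  # reverses the escaping

print(as_text)
print(restored == data)

# The two approaches differ on characters that ASCII can already represent.
print("a\nb".encode('unicode_escape'))             # newline is escaped
print("a\nb".encode('ascii', 'backslashreplace'))  # newline is left alone
```

If you need a str rather than bytes, decoding the result as ASCII is safe because the escaped output contains only ASCII characters.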

OTHER TIPS

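If s is a bytes object (or you are on Python 2), try decoding it first, like: s.decode('utf-8').encode('utf-16be'). A Python 3 str has no decode method, so this does not apply to a string literal.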
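A small sketch of that decode-then-encode chain, assuming the input arrives as UTF-8 encoded bytes (raw_utf8 is a stand-in name for data read from a file or socket):

```python
raw_utf8 = "索引".encode('utf-8')   # stand-in for bytes received from elsewhere

text = raw_utf8.decode('utf-8')     # bytes -> str
utf16 = text.encode('utf-16-be')    # str -> UTF-16 big-endian bytes

print(utf16.hex())                  # hex digits of the two 16-bit code units
```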

Licensed under: CC-BY-SA with attribution