The first step is to convert from a byte string to a Unicode string:
u = s.decode('utf-8')
The second step is to create a new string with every character replaced by its Unicode escape sequence.
new = ''.join('\\u{:04x}'.format(ord(c)) for c in u)
If you only want to replace the non-ASCII characters (this version also escapes control characters below 32), a slight modification will do:
new = ''.join(c if 32 <= ord(c) < 128 else '\\u{:04x}'.format(ord(c)) for c in u)
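As a quick check of how the two one-liners differ, here is a small sketch. The sample input and the names escape_all and escape_non_ascii are assumptions for illustration, and the syntax is Python 3:

```python
# Sample input is an assumption for illustration (Python 3 syntax).
s = 'a\u00e9\u20ac'.encode('utf-8')   # UTF-8 bytes for "a", e-acute, euro sign
u = s.decode('utf-8')

# Escape every character, including plain ASCII.
escape_all = ''.join('\\u{:04x}'.format(ord(c)) for c in u)

# Keep printable ASCII as-is and escape everything else.
escape_non_ascii = ''.join(
    c if 32 <= ord(c) < 128 else '\\u{:04x}'.format(ord(c)) for c in u
)

print(escape_all)        # \u0061\u00e9\u20ac
print(escape_non_ascii)  # a\u00e9\u20ac
```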
Note that the \u0000 notation only works for Unicode code points in the Basic Multilingual Plane (up to U+FFFF). You need the \U00000000 notation for anything larger. You can also use the \x00 notation for anything less than 256. The following handles all cases and is probably a bit easier to read:
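def unicode_notation(c):
    x = ord(c)
    if 32 <= x < 128:
        return c                          # printable ASCII stays as-is
    if x < 256:
        return '\\x{:02x}'.format(x)      # control characters and Latin-1
    if x < 0x10000:
        return '\\u{:04x}'.format(x)      # rest of the Basic Multilingual Plane
    return '\\U{:08x}'.format(x)          # everything above U+FFFF

new = ''.join(unicode_notation(c) for c in u)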
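To see all three notations at once, here is a self-contained sketch that repeats the function and runs it on a sample string. The sample input is an assumption chosen to hit each branch, and the syntax is Python 3:

```python
def unicode_notation(c):
    x = ord(c)
    if 32 <= x < 128:
        return c
    if x < 256:
        return '\\x{:02x}'.format(x)
    if x < 0x10000:
        return '\\u{:04x}'.format(x)
    return '\\U{:08x}'.format(x)

# Hypothetical sample: e-acute (U+00E9), euro sign (U+20AC),
# an emoji outside the BMP (U+1F600), and a trailing newline.
s = 'caf\u00e9 \u20ac \U0001f600\n'.encode('utf-8')
u = s.decode('utf-8')

new = ''.join(unicode_notation(c) for c in u)
print(new)  # caf\xe9 \u20ac \U0001f600\x0a
```

One thing to keep in mind: a literal backslash in the input is printable ASCII, so it passes through unchanged, and the result cannot always be decoded back unambiguously.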