Question

I have a string that contains printable and unprintable characters, for instance:

'\xe8\x00\x00\x00\x00\x60\xfc\xe8\x89\x00\x00\x00\x60\x89'

What's the most Pythonic way to convert this to a bytes object in Python 3, i.e.:

b'\xe8\x00\x00\x00\x00`\xfc\xe8\x89\x00\x00\x00`\x89'

Solution

If all your code points are within the range U+0000 to U+00FF, you can encode to Latin-1:

inputstring.encode('latin1')

because the first 256 code points of Unicode map one-to-one onto the byte values 0x00 to 0xFF in the Latin-1 (ISO-8859-1) standard.
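
A quick way to convince yourself of the one-to-one mapping is to encode all 256 code points and check that each becomes the byte with the same numeric value. This is a small sanity-check sketch, not part of the original answer:

```python
# Build a str containing every code point from U+0000 to U+00FF.
all_latin1 = "".join(chr(i) for i in range(256))

# Under Latin-1, each code point encodes to exactly one byte of the same value.
encoded = all_latin1.encode("latin1")
assert len(encoded) == 256
assert list(encoded) == list(range(256))

# Decoding with Latin-1 is the exact inverse.
assert encoded.decode("latin1") == all_latin1
print(encoded[:8])  # b'\x00\x01\x02\x03\x04\x05\x06\x07'
```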

This is by far the fastest method, but it won't work if any character in the input string is outside that range: str.encode() then raises UnicodeEncodeError.
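
Here is a sketch of what that failure looks like. The helper name to_bytes_latin1 is hypothetical, used only for illustration; it simply wraps str.encode('latin1'):

```python
def to_bytes_latin1(text: str) -> bytes:
    """Hypothetical helper: one byte per character, for U+0000..U+00FF only.

    Raises UnicodeEncodeError if any character is above U+00FF.
    """
    return text.encode("latin1")


# U+00E9 (e-acute) is in range; U+20AC (euro sign) at index 5 is not.
try:
    to_bytes_latin1("caf\u00e9 \u20ac")
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character that could not be encoded.
    print(exc.start)  # 5
```

If your input may contain such characters, you have to decide on a different encoding (or error handler) explicitly; Latin-1 cannot represent them.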

In short: if you have a str containing "bytes" that should never have been decoded (because they were decoded as Latin-1, or written as \x escapes as in the question), encode to Latin-1 to get the original bytes back.
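
The round trip can be sketched as follows. It assumes the earlier step decoded the raw bytes with Latin-1; if some other codec was used, re-encoding with Latin-1 is not guaranteed to give back the original bytes:

```python
original = b"\xe8\x00\x00\x00\x00`\xfc\xe8\x89\x00\x00\x00`\x89"

# Assumption: some earlier step turned the raw bytes into text using Latin-1.
as_text = original.decode("latin1")

# The resulting str is exactly the string from the question.
assert as_text == "\xe8\x00\x00\x00\x00\x60\xfc\xe8\x89\x00\x00\x00\x60\x89"

# Encoding with the same codec recovers the original bytes unchanged.
recovered = as_text.encode("latin1")
assert recovered == original
```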

Demo:

>>> '\xe8\x00\x00\x00\x00\x60\xfc\xe8\x89\x00\x00\x00\x60\x89'.encode('latin1')
b'\xe8\x00\x00\x00\x00`\xfc\xe8\x89\x00\x00\x00`\x89'
Licensed under: CC-BY-SA with attribution