Question

Suppose I have some data stored in bytes. For example:

0110001100010101100101110101101

How can I store it as printable text? The obvious way would be to convert every 0 to the character '0' and every 1 to the character '1'. In fact this is what I'm currently doing. I'd like to know how I could pack them more tightly, without losing information.

I thought of converting bits in groups of eight to ASCII, but some bit combinations are not accepted in that format. Any other ideas?

Solution

What about an encoding that only uses "safe" characters like base64?
http://en.wikipedia.org/wiki/Base64

EDIT: That assumes you want to store the data safely in text files and the like.

In Python 2.x, a plain string should be fine: it can hold any byte value, including null bytes, because Python doesn't use null-terminated strings.

Otherwise, in 3.x, check out the bytes and bytearray objects: http://docs.python.org/3.0/library/stdtypes.html#bytes-methods
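
As a rough illustration of the bytes route in Python 3, here is a minimal sketch that packs a string of '0'/'1' characters into a bytes object and unpacks it again. The helper names bits_to_bytes and bytes_to_bits are made up for this example, and padding the last byte with zeros on the right is an assumption, so the original bit count has to be kept separately.

```python
def bits_to_bytes(bits):
    """Pack a string of '0'/'1' characters into bytes (last byte zero-padded)."""
    if not bits:
        return b""
    padded = bits + "0" * (-len(bits) % 8)  # pad on the right to a whole byte
    return int(padded, 2).to_bytes(len(padded) // 8, "big")


def bytes_to_bits(data, nbits):
    """Unpack bytes to a '0'/'1' string, keeping only the first nbits bits."""
    return "".join(format(byte, "08b") for byte in data)[:nbits]


bits = "0110001100010101100101110101101"  # the 31-bit example from the question
packed = bits_to_bytes(bits)               # 4 bytes instead of 31 characters
restored = bytes_to_bits(packed, len(bits))
print(len(packed), restored == bits)
```

The packed bytes are compact but not printable by themselves; they still need a text-safe encoding such as base64 before going into a text file.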

OTHER TIPS

Not sure what you're talking about.

>>> sample = "".join( chr(c) for c in range(256) )
>>> len(sample)
256
>>> sample
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\
x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABC
DEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83
\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97
\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab
\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf
\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3
\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7
\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb
\xfc\xfd\xfe\xff'

The string sample contains all 256 distinct byte values. There is no such thing as "bit combinations ... not accepted".

To make it printable, simply use repr(sample): non-printable and non-ASCII characters are escaped, as you can see above.

Try the standard array module or the struct module. These support storing bytes in a space-efficient way, but they don't support bits directly.
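
Since array works on whole bytes, the bits have to be grouped into eights first. A possible sketch in Python 3, where pack_bits is a hypothetical helper and the zero-padding of the final byte is an assumption of this example:

```python
from array import array


def pack_bits(bits):
    """Pack a '0'/'1' string into an array of unsigned bytes (tail zero-padded)."""
    padded = bits + "0" * (-len(bits) % 8)
    # Each 8-character slice becomes one integer in the range 0..255.
    return array("B", (int(padded[i:i + 8], 2) for i in range(0, len(padded), 8)))


packed = pack_bits("0110001100010101100101110101101")
print(len(packed), packed.tobytes())
```

The array stores one byte per item, so the 31-character bit string shrinks to 4 bytes, but the result is raw binary rather than printable text.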

You can also try http://cobweb.ecn.purdue.edu/~kak/dist/BitVector-1.2.html or http://ilan.schnell-web.net/prog/bitarray/

For Python 2.x, your best bet is to store the bytes in a string. Once you have that string, you can encode it into safe ASCII characters using the base64 module that comes with Python.

import base64
# bytestring is the string holding your raw bytes
encoded = base64.b64encode(bytestring)

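This will be much more condensed than storing "1" and "0": base64 carries 6 bits per character (4 characters for every 3 bytes), whereas the "0"/"1" text carries only 1 bit per character.

For more information on the base64 module, see the Python docs.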
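
To tie this back to the bit string in the question, here is a hedged round-trip sketch written for Python 3. The functions encode_bits and decode_bits and the "<bit count>:<base64>" text layout are inventions of this example, not part of the base64 module; the count prefix is there because the bit string need not be a multiple of 8 bits long.

```python
import base64


def encode_bits(bits):
    """Return printable text of the form '<bit count>:<base64 payload>'."""
    padded = bits + "0" * (-len(bits) % 8)  # zero-pad to a whole number of bytes
    raw = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return "%d:%s" % (len(bits), base64.b64encode(raw).decode("ascii"))


def decode_bits(text):
    """Invert encode_bits, dropping the padding bits again."""
    count, _, payload = text.partition(":")
    raw = base64.b64decode(payload)
    return "".join(format(byte, "08b") for byte in raw)[:int(count)]


bits = "0110001100010101100101110101101"
text = encode_bits(bits)
print(text, decode_bits(text) == bits)
```

For long inputs the overhead of the count prefix is negligible, and the payload approaches 6 bits per printable character.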

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow