Question

I want to write a NumPy array to a binary file, but I want to use nonstandard integer sizes. For example, some int arrays should be written to the file as 3-bit integers, some as 7-bit and some as 13-bit.

I have seen that there is a NumPy method, tofile(), but it only works with the array's own dtype, which is one of int8, int16, int32, etc.

How would I write it to a file with a variable bit length?

Solution

To give a concrete example with bitstring:

>>> from bitstring import Bits
>>> a = [3,2,1,6,4,10]  # some unsigned integers to encode
>>> p = 5               # number of bits of precision to use

Now create 5-bit bitstrings from each integer and join them together:

>>> b = Bits().join(Bits(uint=x, length=p) for x in a)
>>> b
Bits('0b000110001000001001100010001010')

This can be converted to bytes, but note that it will be padded with zero bits up to a byte boundary if needed. When writing to a file you're always going to have a whole number of bytes, as that's just the way file systems work:

>>> b.tobytes()
'\x18\x82b('    
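To make the padding explicit, here is a minimal pure-Python sketch of the same packing without bitstring. The helper name pack_uints is made up for illustration, and the MSB-first bit order is an assumption chosen to match the output above. It accumulates the values in one big integer, so it is meant to show the layout rather than to be fast.

```python
def pack_uints(values, nbits):
    """Pack unsigned ints into bytes, nbits each, MSB first, zero-padded."""
    acc = 0
    for v in values:
        if not 0 <= v < (1 << nbits):
            raise ValueError(f"{v} does not fit in {nbits} unsigned bits")
        acc = (acc << nbits) | v      # append the next nbits-wide field
    total_bits = nbits * len(values)
    pad = (-total_bits) % 8           # zero bits needed to reach a byte boundary
    acc <<= pad
    return acc.to_bytes((total_bits + pad) // 8, "big")

packed = pack_uints([3, 2, 1, 6, 4, 10], 5)
print(packed)       # b'\x18\x82b('
print(len(packed))  # 4
```

Six 5-bit values take 30 bits, so two zero bits are appended to fill the fourth byte.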

To decode it again there are a number of options, but as everything is the same length the cut method is useful:

>>> [x.uint for x in b.cut(p)]
[3, 2, 1, 6, 4, 10]
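One caveat when the bytes are read back from a file rather than taken from the original bitstring: the padding bits cannot be told apart from data. For example, six 3-bit values occupy 18 bits, which pad to 24 bits, so a reader that does not know the count would see eight values. The sketch below therefore takes the number of values as an argument. It is pure Python, unpack_uints is a hypothetical helper name, and it assumes MSB-first packing with zero padding at the end.

```python
def unpack_uints(data, nbits, count):
    """Read `count` unsigned ints of `nbits` each from MSB-first packed bytes."""
    total_bits = len(data) * 8
    if count * nbits > total_bits:
        raise ValueError("not enough data for the requested number of values")
    acc = int.from_bytes(data, "big")
    acc >>= total_bits - count * nbits   # drop the trailing padding bits
    mask = (1 << nbits) - 1
    # Extract fields from the most significant end to the least significant
    return [(acc >> (nbits * (count - 1 - i))) & mask for i in range(count)]

print(unpack_uints(b"\x18\x82b(", 5, 6))  # [3, 2, 1, 6, 4, 10]
```

In practice the count and bit width would have to be stored somewhere, for instance in a small header at the start of the file.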

See the docs for more information. In terms of efficiency it should be pretty good for pure Python. If you really need more speed then try the bitarray module instead, which is implemented in C and should be able to handle this problem equally well.

OTHER TIPS

I have written a method for this using the bitstring module.

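import bitstring

def int_array_to_bitstream(int_array, precision):
    # Convert NumPy scalars to plain Python ints before handing them to bitstring
    int_list = int_array.astype(int).tolist()
    bits = bitstring.BitStream()  # start from an empty bitstream, not a str
    for integer in int_list:
        # int= encodes a signed (two's complement) value in `precision` bits
        bits += bitstring.BitStream(int=integer, length=precision)
    return bits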

It takes every member of the NumPy array and converts it to a binary representation of an integer using precision number of bits. This does exactly what I wanted; it is, however, too slow to be usable.

I'm looking for a faster way to achieve the same thing, either with a whole new approach or by improving the current method.

Update 14.6.

I tried the method from the other answer.

def int_array_to_bitstream_ver2(int_array, precision):
    bits = bitstring.BitStream().join(bitstring.BitStream(uint=integer, length=precision) for integer in int_array)
    return bits

The speed difference is minimal. For int_array = arange(100000) and precision = 24:

int_array_to_bitstream -> 5.958 sec
int_array_to_bitstream_ver2 -> 5.614 sec
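
Both versions build one Python object per integer, which is likely where most of the time goes. A possible alternative, not taken from either answer, is to do the bit manipulation inside NumPy with np.packbits and np.unpackbits. The following is a sketch under stated assumptions: values are unsigned, bits are written MSB first, and the function names pack_uint_array and unpack_uint_array are made up for illustration.

```python
import numpy as np

def pack_uint_array(values, nbits):
    """Pack unsigned ints into bytes using nbits per value (MSB first)."""
    values = np.asarray(values, dtype=np.uint64)
    # Shift amounts that pick out each bit, most significant first
    shifts = np.arange(nbits - 1, -1, -1, dtype=np.uint64)
    # Shape (n, nbits): one row of 0/1 flags per value
    bits = ((values[:, None] >> shifts) & np.uint64(1)).astype(np.uint8)
    # packbits zero-pads the last byte when the bit count is not a multiple of 8
    return np.packbits(bits.ravel()).tobytes()

def unpack_uint_array(data, nbits, count):
    """Inverse of pack_uint_array; `count` is needed to ignore padding bits."""
    bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))[:count * nbits]
    weights = np.uint64(1) << np.arange(nbits - 1, -1, -1, dtype=np.uint64)
    return (bits.reshape(count, nbits).astype(np.uint64) * weights).sum(axis=1)

arr = np.arange(100000)
packed = pack_uint_array(arr, 24)
print(len(packed))  # 300000
restored = unpack_uint_array(packed, 24, arr.size)
print(np.array_equal(restored, arr))  # True
```

The returned bytes can be written with an ordinary file opened in "wb" mode. The intermediate bit matrix uses one byte per bit, so very large arrays may need to be processed in chunks.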
Licensed under: CC-BY-SA with attribution