Question

I'm using numpy and Python 2.7 to compute large (100 million+ elements) boolean arrays for a super-massive prime sieve and write them to binary files to read at a much later time. NumPy bools are 8-bit, so the file size that I'm writing is much larger than necessary. Since I'm writing a large number of these files I'd like to keep them as small as humanly possible without having to waste a lot of time/memory converting them to a bitarray and back.

I was originally going to switch to using the bitarray module to keep file size down, but the sieve computation time increased by around 400% with the same algorithms, which is a bit unacceptable. Is there a fast-ish way to write and read back the ndarray in a smaller file, or is this a trade-off that I'm just going to have to deal with?


Solution

Use numpy.packbits to turn the boolean array into a uint8 array for writing, then numpy.unpackbits after reading it back. numpy.packbits pads the axis you're packing along with zeros to reach a multiple of 8, so make sure you keep track of how many bits you'll need to chop off the end when you unpack the array.
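A minimal round-trip sketch of that idea (the file name and the sieve array here are illustrative, not from the question):

```python
import numpy as np

# A stand-in for the sieve output: a boolean array whose length is
# deliberately not a multiple of 8, so padding matters.
sieve = np.arange(1000003) % 3 == 0

# Pack 8 booleans per byte; packbits zero-pads the end to a multiple of 8.
packed = np.packbits(sieve)
packed.tofile('sieve.bin')  # ~1/8 the size of writing one byte per bool

# Read back and unpack. unpackbits returns a length that is a multiple
# of 8, so slice back to the original element count you recorded.
n = sieve.size
restored = np.unpackbits(np.fromfile('sieve.bin', dtype=np.uint8))[:n].astype(bool)

assert np.array_equal(sieve, restored)
```

Storing the original length `n` alongside the file (e.g. in the filename or a small header) is enough to make the unpack step lossless.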

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow