Question

I have a bz2-compressed binary (big-endian) file containing an array of data. Uncompressing it with external tools and then reading the file into NumPy works:

import numpy as np
dim = 3
rows = 1000
cols = 2000
mydata = np.fromfile('myfile.bin').reshape(dim,rows,cols)

However, there are plenty of other files like this, so I cannot extract each one individually beforehand. I found Python's bz2 module, which should be able to decompress the data directly in Python, but I get an error message:

import bz2
dfile = bz2.BZ2File('myfile.bz2').read()
mydata = np.fromfile(dfile).reshape(dim,rows,cols)

IOError: first argument must be an open file

Apparently, BZ2File does not return a file object. What is the correct way to read the compressed file?


Solution

BZ2File does return a file-like object (although not an actual file). The problem is that you're calling read() on it:

dfile = bz2.BZ2File('myfile.bz2').read()

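This reads the entire decompressed file into memory as one big string (bytes on Python 3), which you then pass to fromfile. But fromfile expects an open file or a filename, not the data itself.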

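Depending on your versions of NumPy and Python and your platform, passing fromfile a file-like object that isn't an actual OS-level file may not work. It may even silently read the wrong bytes: as far as I know, fromfile reads through the object's file descriptor, and BZ2File.fileno() returns the descriptor of the underlying, still-compressed file. In that case, read the decompressed data into a buffer and interpret it with frombuffer instead.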

So, either this:

import bz2
dfile = bz2.BZ2File('myfile.bz2')
mydata = np.fromfile(dfile, dtype='>f8').reshape(dim, rows, cols)  # '>f8' assumes big-endian float64; match your data

… or this:

import bz2
dbuf = bz2.BZ2File('myfile.bz2').read()
mydata = np.frombuffer(dbuf, dtype='>f8').reshape(dim, rows, cols)  # '>f8' assumes big-endian float64; match your data
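To check the frombuffer route end to end without the real data, here is a minimal sketch that writes a known array to a temporary .bz2 file and reads it back. It assumes the elements are big-endian 8-byte floats (dtype '>f8') and uses small, hypothetical dimensions; adjust both to match your files.

```python
import bz2
import os
import tempfile

import numpy as np

dim, rows, cols = 3, 4, 5  # small hypothetical sizes for the demo
# Assumed element type: big-endian float64 ('>f8')
original = np.arange(dim * rows * cols, dtype='>f8').reshape(dim, rows, cols)

# Write the array's raw big-endian bytes through a bz2 compressor
fd, path = tempfile.mkstemp(suffix='.bz2')
os.close(fd)
with bz2.BZ2File(path, 'wb') as f:
    f.write(original.tobytes())

# Read it back: decompress into bytes, then view those bytes as an array
with bz2.BZ2File(path) as f:
    dbuf = f.read()
mydata = np.frombuffer(dbuf, dtype='>f8').reshape(dim, rows, cols)
os.remove(path)

print(np.array_equal(mydata, original))  # True
print(mydata.flags.writeable)            # False: the array shares memory with the immutable bytes
```

Because frombuffer shares memory with the bytes object rather than copying it, the resulting array is read-only; call mydata.copy() if you need to modify it.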

(There are plenty of alternatives that avoid holding the whole decompressed buffer in memory and might be better. But if your file isn't too large, reading it all and calling frombuffer works fine.)
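One such alternative, sketched under the same assumption of big-endian float64 data: preallocate the array and let BZ2File.readinto() decompress straight into its memory. read_bz2_array is a hypothetical helper name, not a library function. This avoids keeping a separate bytes object around and gives you a writable array.

```python
import bz2

import numpy as np


def read_bz2_array(path, shape, dtype='>f8'):
    """Hypothetical helper: decompress a bz2 file into a preallocated array.

    dtype defaults to '>f8' (big-endian float64), which is an assumption;
    pass whatever element type your files actually contain.
    """
    arr = np.empty(shape, dtype=dtype)
    # Flat byte view sharing memory with arr (np.empty gives a C-contiguous array)
    raw = memoryview(arr.reshape(-1).view(np.uint8))
    with bz2.BZ2File(path) as f:
        filled = 0
        while filled < len(raw):
            n = f.readinto(raw[filled:])  # decompress directly into the array's memory
            if n == 0:
                raise ValueError('file ended after %d of %d bytes' % (filled, len(raw)))
            filled += n
    return arr
```

Usage with the dimensions from the question would be mydata = read_bz2_array('myfile.bz2', (dim, rows, cols)). The loop keeps calling readinto until the array is full, so a short read is not mistaken for the end of the data, and a file that is too small raises an error instead of leaving uninitialized values in the array.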

Licensed under: CC-BY-SA with attribution