OK, so I have found a way to massively reduce the file size. The point is that, contrary to what I had believed, PyTables does NOT apply compression by default.
You can enable compression by using Filters.
Here is an example of how that works:
import tables as pt  # the PyTables package is imported as `tables`

# For PyTables < 3 these methods are called `openFile` and `createTable`;
# other methods were renamed analogously
hdf5_file = pt.open_file(filename='myhdf5file.h5',
                         mode='a',
                         title='How to compress data')
myfilters = pt.Filters(complevel=9, complib='zlib')
mydescription = {'mycolumn': pt.IntCol()}  # Simple 1 column table
mytable = hdf5_file.create_table(where='/', name='mytable',
                                 description=mydescription,
                                 title='My Table',
                                 filters=myfilters)
# Now you can happily fill the table...
The important line here is Filters(complevel=9, complib='zlib'). It specifies the compression level complevel and the compression algorithm complib. By default the level is set to 0, which means compression is disabled, whereas 9 is the highest compression level. For details on how compression works, see the PyTables reference documentation.
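To get a feel for what the level means when complib='zlib', here is a small sketch that uses Python's standard zlib module directly. It is not PyTables code: PyTables applies the filter chunk by chunk inside the HDF5 file, so the savings you see in a real file will differ. The sample data and variable names are made up for illustration.

```python
import struct
import zlib

# Hypothetical sample data: 100,000 small integers packed as 64-bit values,
# roughly what a one-column integer table might hold.
values = [i % 100 for i in range(100_000)]
raw = struct.pack(f'<{len(values)}q', *values)

# Level 0 stores the data without compressing it; level 9 compresses hardest.
stored = zlib.compress(raw, 0)
packed = zlib.compress(raw, 9)

print('raw bytes:    ', len(raw))
print('level 0 bytes:', len(stored))
print('level 9 bytes:', len(packed))

# Compression is lossless: decompressing returns the original bytes.
restored = zlib.decompress(packed)
```

Repetitive data like this shrinks a lot; data that is already close to random will shrink far less, and higher levels cost more CPU time when writing.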
Next time, I had better stick to RTFM :-) (I actually did read it, but I missed the line "One of the beauties of PyTables is that it supports compression on tables and arrays, although it is not used by default")