There are many advantages to using HDF5. As @EnricoGiampieri says, it's generally used for storing large ensembles of data rather than just single arrays, and it is also useful for storing all the associated metadata alongside the data. From the HDF5 website:
> The HDF5 technology suite includes:
> - A versatile data model that can represent very complex data objects and a wide variety of metadata.
> - A completely portable file format with no limit on the number or size of data objects in the collection.
> - A software library that runs on a range of computational platforms, from laptops to massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and Java interfaces.
> - A rich set of integrated performance features that allow for access time and storage space optimizations.
> - Tools and applications for managing, manipulating, viewing, and analyzing the data in the collection.
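As a rough illustration of that data model from Python, here is a minimal sketch that keeps several arrays and their metadata together in one file, using a group (which behaves like a directory) and attributes (small named values attached to a group or dataset). The file name `results.h5`, the group `run_01`, and all attribute names are hypothetical, chosen only for the example.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file, group and attribute names, for illustration only
path = os.path.join(tempfile.mkdtemp(), "results.h5")

with h5py.File(path, "w") as f:
    run = f.create_group("run_01")  # groups act like directories
    run.attrs["description"] = "eigen-decomposition of a random symmetric matrix"
    run.attrs["matrix_size"] = 4

    rng = np.random.default_rng(0)
    a = rng.standard_normal((4, 4))
    vals, vecs = np.linalg.eigh(a + a.T)  # symmetric input -> real eigenpairs

    run.create_dataset("eigenvalues", data=vals)
    dset = run.create_dataset("eigenvectors", data=vecs)
    dset.attrs["layout"] = "columns are eigenvectors"  # metadata on a dataset

# Reopen read-only: the metadata is stored in the same file as the data
with h5py.File(path, "r") as f:
    print(dict(f["run_01"].attrs))
    print(f["run_01/eigenvectors"].attrs["layout"])
    print(f["run_01/eigenvalues"].shape)
```

Because the attributes live next to the arrays they describe, anyone opening the file later gets both without needing a separate sidecar file.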
It's a hierarchical, self-describing data format, which means that the datasets in a file are easily discoverable. It scales to very large file sizes and to massively parallel I/O.
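To show what "self-describing" buys you in practice, here is a small sketch that walks an HDF5 file without knowing its layout in advance, using h5py's `visititems`, which calls a function once for every group and dataset below the root. The file layout and names are made up for the example.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small file with a nested layout (all names are hypothetical)
path = os.path.join(tempfile.mkdtemp(), "survey.h5")
with h5py.File(path, "w") as f:
    # Intermediate groups ("raw", "processed") are created automatically
    f.create_dataset("raw/images", data=np.zeros((10, 32, 32), dtype="uint8"))
    f.create_dataset("processed/mean", data=np.zeros(32 * 32))
    f["processed"].attrs["method"] = "pixel-wise mean"

found = []

def describe(name, obj):
    # Returning None tells visititems to keep going
    if isinstance(obj, h5py.Dataset):
        found.append(name)
        print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
    else:
        print(f"{name}/ (group) attrs={dict(obj.attrs)}")

with h5py.File(path, "r") as f:
    f.visititems(describe)
```

Shapes, dtypes and attributes all come from the file itself, so no external schema is needed to interpret it.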
As for compression, this is a property of each individual dataset and needs to be specified when you create that dataset. There are several compression algorithms to choose from: GZIP, SZIP and LZF are all supported (SZIP is only available if your HDF5 build includes it). There is more information on the h5py wiki.
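Here is a minimal sketch comparing a few of these options on the same array. It sticks to GZIP and LZF, since LZF ships with h5py and GZIP is available in essentially every HDF5 build, while SZIP may not be. The data is deliberately repetitive so the effect is easy to see; real data will compress differently. Note that compressed datasets are always chunked, and h5py chooses a chunk shape for you if you don't pass `chunks`.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "compressed.h5")
# Highly repetitive data, so the compressed sizes are clearly smaller
data = np.tile(np.arange(100, dtype="float64"), 1000)

with h5py.File(path, "w") as f:
    # Compression is chosen per dataset, at creation time
    f.create_dataset("plain", data=data)
    f.create_dataset("gzip", data=data, compression="gzip", compression_opts=9)
    f.create_dataset("lzf", data=data, compression="lzf")
    # Explicit chunk shape; gzip level defaults to 4 when compression_opts is omitted
    f.create_dataset("gzip_chunked", data=data, compression="gzip", chunks=(10000,))

with h5py.File(path, "r") as f:
    for name in ("plain", "gzip", "lzf", "gzip_chunked"):
        ds = f[name]
        # get_storage_size() reports the bytes allocated on disk for the dataset
        print(name, ds.compression, ds.chunks, ds.id.get_storage_size())
```

Reading is transparent: `f["gzip"][:]` returns the decompressed array, so code that reads the file does not need to know which filter was used.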
To apply compression to your datasets, try this:
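```python
import h5py

def store(eigenvalues, eigenvectors, name='01_'):
    # 'w' creates the file, overwriting any existing file with that name
    with h5py.File(name + 'data.h5', 'w') as datafile:
        # Compression is set per dataset: gzip at level 4 (on a 0-9 scale)
        datafile.create_dataset('eigenvalues', eigenvalues.shape, eigenvalues.dtype,
                                compression='gzip', compression_opts=4)
        # The eigenvectors dataset needs the eigenvectors' own shape
        datafile.create_dataset('eigenvectors', eigenvectors.shape, eigenvectors.dtype,
                                compression='gzip', compression_opts=4)
        # Write the arrays into the datasets created above
        datafile['eigenvalues'][:] = eigenvalues
        datafile['eigenvectors'][:] = eigenvectors
    # Leaving the with-block closes the file and flushes it to disk
    print("Successfully saved eigenvalues and eigenvectors")
```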
Here I've assumed that `eigenvalues` and `eigenvectors` are both numpy arrays. You should convert them if they are not (just use `numpy.array(eigenvalues)`). Also note that to write the data into the datasets, I've used `[:]`: this is because `datafile['eigenvalues']` is an HDF5 dataset object, while `datafile['eigenvalues'][:]` is the actual data in that object. The HDF5 object holds not just the data, but also attributes and metadata.
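To make the distinction between the dataset object and its contents concrete, here is a small self-contained sketch. The file name and the `units` attribute are hypothetical; the point is that indexing the file returns an `h5py.Dataset` carrying settings and attributes, while slicing it with `[:]` reads the values into an ordinary numpy array.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "01_data.h5")
eigenvalues = np.array([3.0, 1.0, 0.5])

with h5py.File(path, "w") as f:
    dset = f.create_dataset("eigenvalues", eigenvalues.shape, eigenvalues.dtype,
                            compression="gzip", compression_opts=4)
    dset.attrs["units"] = "arbitrary"  # hypothetical metadata

    print(type(f["eigenvalues"]))      # an h5py Dataset: data plus attrs and settings
    f["eigenvalues"][:] = eigenvalues  # slice assignment writes into the existing dataset

    loaded = f["eigenvalues"][:]       # slicing reads the contents into memory
    print(type(loaded))                # a plain numpy ndarray, usable after the file closes
    print(f["eigenvalues"].attrs["units"], f["eigenvalues"].compression)
```

If you don't need to create the dataset before filling it, `create_dataset('eigenvalues', data=eigenvalues, compression='gzip', compression_opts=4)` creates and fills it in one call.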