I know of two solutions that abstract away your direct interactions with the h5py library (one of which I wrote, and which works better if the *.mat file is very large or very deeply nested):
- the hdf5storage package, which is well maintained and meant to help load v7.3-saved matfiles into Python
- my own matfile loader, which I wrote to overcome problems that even the latest version (0.2.0) of hdf5storage has when loading large (~500 MB) and/or deep arrays (I'm actually not sure which of the two causes the issue)
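Both packages exist because a v7.3 matfile is an HDF5 file (normally preceded by a 512-byte user block holding MATLAB's text header), which `scipy.io.loadmat` cannot read. A minimal, stdlib-only sketch for checking which kind of file you have; `is_hdf5_file` is a hypothetical helper, not part of either package, and it only looks for the HDF5 signature at the offsets where the HDF5 format allows the superblock to start (0, 512, 1024, 2048, ...):

```python
import os

# 8-byte signature that starts every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def is_hdf5_file(path):
    """Hypothetical helper: True if `path` contains an HDF5 superblock.

    HDF5 places the superblock at offset 0, 512, 1024, 2048, ...; v7.3
    matfiles normally have it at 512, right after MATLAB's text header.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # Next candidate offset: 512 after 0, then keep doubling.
            offset = 512 if offset == 0 else offset * 2
    return False
```

For a file saved with `-v7.3`, `is_hdf5_file('test.mat')` should return True; False suggests an older matfile that `scipy.io.loadmat` can read directly.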
Assuming you've downloaded both packages to a place from which Python can import them, you can see that they produce similar outputs for your example 'test.mat':
In [1]: pyInMine = LoadMatFile('test.mat')
In [2]: pyInHdf5 = hdf5.loadmat('test.mat')
In [3]: pyInMine.keys()
Out[3]: dict_keys(['struArray'])
In [4]: pyInMine['struArray'].keys()
Out[4]: dict_keys(['data', 'id', 'name'])
In [5]: pyInHdf5.keys()
Out[5]: dict_keys(['struArray'])
In [6]: pyInHdf5['struArray'].dtype
Out[6]: dtype([('name', 'O'), ('id', '<f8', (1, 1)), ('data', 'O')])
In [7]: pyInHdf5['struArray']['data']
Out[7]:
array([[array([[ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]]),
array([[3., 4., 5., 6., 7., 8., 9.]]), array([[0.]])]],
dtype=object)
In [8]: pyInMine['struArray']['data']
Out[8]:
array([[array([[ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]]),
array([[3., 4., 5., 6., 7., 8., 9.]]), array([[0.]])]],
dtype=object)
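Since both outputs are object arrays whose cells hold NumPy arrays, checking programmatically that the two loaders agree needs a cell-by-cell comparison; `a == b` on such arrays delegates to each inner array's own `==`, which returns arrays rather than a single True/False. A minimal sketch; `object_arrays_equal` and `make_data` are hypothetical helpers, and `make_data` simply rebuilds the `data` field shown above by hand instead of loading `test.mat`:

```python
import numpy as np


def object_arrays_equal(a, b):
    """Hypothetical helper: compare two object arrays cell by cell.

    Each cell holds a NumPy array, so np.array_equal is applied per cell
    instead of relying on `a == b`.
    """
    if a.shape != b.shape:
        return False
    return all(np.array_equal(x, y) for x, y in zip(a.flat, b.flat))


def make_data():
    """Hand-built stand-in for the 1x3 'data' field shown above."""
    cells = np.empty((1, 3), dtype=object)
    cells[0, 0] = np.arange(1.0, 11.0).reshape(1, -1)  # [[1., ..., 10.]]
    cells[0, 1] = np.arange(3.0, 10.0).reshape(1, -1)  # [[3., ..., 9.]]
    cells[0, 2] = np.array([[0.0]])
    return cells


print(object_arrays_equal(make_data(), make_data()))  # True
```

With both files loaded, `object_arrays_equal(pyInMine['struArray']['data'], pyInHdf5['struArray']['data'])` would run the same check on the real outputs.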
The big difference is that my library converts Matlab structure arrays into Python dictionaries whose keys are the structure's fields, whereas hdf5storage converts them into NumPy structured arrays whose dtype has one entry (often of object type) per structure field.
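The two layouts carry the same information. A minimal sketch of going from an hdf5storage-style structured array to a dict of per-field arrays, roughly the dict-of-fields layout described above; the array is built by hand to mimic the dtype in `Out[6]` (only the 'two' name comes from the example, the other values are placeholders), and `struct_array_to_dict` is a hypothetical helper, not part of either package:

```python
import numpy as np

# Hand-built structured array with the dtype reported in Out[6].
dt = np.dtype([("name", "O"), ("id", "<f8", (1, 1)), ("data", "O")])
stru = np.zeros((1, 3), dtype=dt)
for j, label in enumerate(["one", "two", "three"]):  # placeholders except 'two'
    stru["name"][0, j] = np.array([[label]])  # 1x1 string array per cell
    stru["id"][0, j] = j + 1                   # broadcasts into the (1, 1) slot
    stru["data"][0, j] = np.array([[float(j)]])


def struct_array_to_dict(arr):
    """Hypothetical helper: {field name: array of that field's values}."""
    return {name: arr[name] for name in arr.dtype.names}


fields = struct_array_to_dict(stru)
print(list(fields))          # ['name', 'id', 'data']
print(fields["name"].shape)  # (1, 3)
```

The dict values here are views into `stru`, so this only changes the container layout, not the cell contents.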
Note also that indexing works differently from what you would expect coming from Matlab. In Matlab, to get the name field of the second structure, you index the structure first:
[Matlab] >> struArray(2).name
[Matlab] ans = 'two'
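Because Matlab indexes from 1 and counts linear indices down the columns, `struArray(2)` on the 1×3 structure array corresponds to the subscript `[0, 1]` in Python, which is what the sessions below use. A small sketch of that conversion with `numpy.unravel_index`; `matlab_linear_to_numpy` is a hypothetical helper name:

```python
import numpy as np


def matlab_linear_to_numpy(k, shape):
    """Hypothetical helper: 1-based Matlab linear index -> 0-based NumPy subscript.

    Matlab counts linear indices down the columns (column-major), hence order="F".
    """
    return tuple(int(i) for i in np.unravel_index(k - 1, shape, order="F"))


print(matlab_linear_to_numpy(2, (1, 3)))  # (0, 1), i.e. struArray(2)
```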
In my package, you have to first grab the field and then index:
In [9]: pyInMine['struArray'].shape
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-64-a2f85945642b> in <module>
----> 1 pyInMine['struArray'].shape
AttributeError: 'dict' object has no attribute 'shape'
In [10]: pyInMine['struArray']['name'].shape
Out[10]: (1, 3)
In [11]: pyInMine['struArray']['name'][0,1]
Out[11]: 'two'
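If you prefer Matlab's element-first style with the dict layout, a tiny wrapper can pull the same index out of every field. A sketch assuming every field holds an array of the same shape; `struct_element` is a hypothetical helper, and the data is built by hand (only 'two' comes from the example above):

```python
import numpy as np


def struct_element(fields, idx):
    """Hypothetical helper: Matlab-style struArray(k) for a dict of fields.

    Assumes every value in `fields` is an array indexable by `idx`.
    """
    return {name: values[idx] for name, values in fields.items()}


# Hand-built dict-of-fields layout; names other than 'two' are placeholders.
names = np.empty((1, 3), dtype=object)
names[0, :] = ["one", "two", "three"]
fields = {"name": names, "id": np.array([[1.0, 2.0, 3.0]])}

element = struct_element(fields, (0, 1))
print(element["name"])  # two
```

Assuming all fields of `pyInMine['struArray']` share the (1, 3) shape, `struct_element(pyInMine['struArray'], (0, 1))['name']` would give the same result as `In [11]`.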
The hdf5storage package is a little nicer here: because of how NumPy structured arrays work, it lets you either index the structure and then grab the field, or vice versa:
In [12]: pyInHdf5['struArray'].shape
Out[12]: (1, 3)
In [13]: pyInHdf5['struArray'][0,1]['name']
Out[13]: array([['two']], dtype='<U3')
In [14]: pyInHdf5['struArray']['name'].shape
Out[14]: (1, 3)
In [15]: pyInHdf5['struArray']['name'][0,1]
Out[15]: array([['two']], dtype='<U3')
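Both orders work because a structured array stores one record per element: indexing an element returns a record you can query by field name, while indexing a field returns an ordinary array of that field's values with the structure array's shape. A hand-built sketch mirroring part of `struArray`; plain strings are used for brevity, whereas hdf5storage wraps each one in a 1×1 array as `Out[13]` shows:

```python
import numpy as np

# Hand-built structured array mirroring part of struArray.
dt = np.dtype([("name", "O"), ("id", "<f8", (1, 1))])
stru = np.zeros((1, 3), dtype=dt)
stru["name"][0, :] = ["one", "two", "three"]  # placeholder names except 'two'
stru["id"][..., 0, 0] = [[1.0, 2.0, 3.0]]

# Element first, then field: stru[0, 1] is a single record (np.void).
by_element = stru[0, 1]["name"]
# Field first, then element: stru["name"] is a (1, 3) object array.
by_field = stru["name"][0, 1]
print(by_element, by_field)  # two two
```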
Again, the two packages treat the final output a little differently, but in general both are quite good at reading v7.3 matfiles. One final note: with files of ~500 MB or more, I've found that hdf5storage hangs while loading, whereas my package does not (though it still takes ~1.5 minutes to complete the load).