HDF5 internal data organization and NumPy usage

https://stackoverflow.com/questions/4135293

30-09-2019

Question

As the PyTables documentation says, PyTables stores data using NumPy:

"It is built on top of the HDF5 library, the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code, makes it a fast yet extremely easy-to-use tool for interactively storing and retrieving very large amounts of data"

...

"PyTables uses these NumPy containers as in-memory buffers to push the I/O bandwith towards the platform limits."

So what's the mechanism? How does PyTables use NumPy? In the end, it generates plain HDF5 files that are accessible from other languages...

Solution

HDF5 is a C library. It stores numbers, including floats, in a platform-independent manner: every datatype recorded in a file specifies its own size and byte order (see the table titled "Examples of Native Datatypes and Corresponding C Types" in the HDF5 documentation; the User's Guide has more information).
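To make "platform independent" concrete, here is a minimal sketch that uses only NumPy, not HDF5 itself. It assumes the usual correspondence between the HDF5 predefined types `H5T_IEEE_F64LE` / `H5T_IEEE_F64BE` and the NumPy dtype strings `"<f8"` / `">f8"`. The point is that once size and byte order are part of the type description, any reader can decode the bytes regardless of its own CPU.

```python
import numpy as np

value = 1.5

# The same 64-bit IEEE float written with an explicit byte order.
# "<f8" is little-endian (comparable to H5T_IEEE_F64LE),
# ">f8" is big-endian (comparable to H5T_IEEE_F64BE).
little = np.array([value], dtype="<f8").tobytes()
big = np.array([value], dtype=">f8").tobytes()

# The two layouts hold the same 8 bytes in opposite order.
print(little.hex())
print(big.hex())

# Because the dtype states the byte order, decoding gives the same
# number on any machine.
decoded_little = float(np.frombuffer(little, dtype="<f8")[0])
decoded_big = float(np.frombuffer(big, dtype=">f8")[0])
print(decoded_little, decoded_big)
```

HDF5 keeps the equivalent description in the file next to the data, which is why a file written on one platform can be read on another.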

PyTables simply converts between HDF5 datatypes and NumPy datatypes, and it mixes Python code with native code to reduce I/O overhead.
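A small round trip shows what that looks like from the Python side. This is only a sketch, assuming PyTables 3.x (`tables.open_file`, `File.create_array`) and NumPy are installed; the file name `example.h5` and the node name `values` are made up for illustration.

```python
import os
import tempfile

import numpy as np
import tables

# An ordinary NumPy array is the in-memory container.
data = np.arange(6, dtype=np.float64).reshape(2, 3)

# Hypothetical file location, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# Writing: PyTables derives the HDF5 datatype and shape from the array.
with tables.open_file(path, mode="w") as h5:
    h5.create_array(h5.root, "values", data)

# Reading: the HDF5 dataset comes back as a NumPy array again.
with tables.open_file(path, mode="r") as h5:
    node = h5.root.values
    atom = node.atom      # PyTables' description of the element type
    loaded = node.read()  # a numpy.ndarray

print(type(loaded).__name__, loaded.dtype, loaded.shape)
print(atom)
```

The resulting file is plain HDF5, so generic tools such as `h5dump` or the HDF5 bindings of other languages should be able to open it; NumPy is involved only on the Python side, as the in-memory representation.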

Licensed under: CC-BY-SA with attribution