Question

I am trying to write an HDF5 file from C++. The file essentially contains a large timeseries matrix in the following format:

TimeStamp    Property1      Property2

I have managed to write the data successfully: I created a dataset (dset) and wrote to it with the H5Dwrite function.

Now my question is how to create a file header. In other words, suppose I want to write the following array to the file...

['TimeStamp', 'Property1', 'Property2']

...and attach it to the columns for ease of later use (I am planning to analyze the matrix in Python). How do I do that?

I tried to write a string array with H5Dwrite, but it failed. I guess it expected a consistent datatype, i.e. floats, which is the datatype of my data. Then I read about metadata, but I am a bit lost as to how to use it. Any help would be much appreciated.

A related side question: can the first row of a matrix be strings while the other rows contain doubles?


Solution

Clean solution(s)

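If you store your data as a 1D array of a compound datatype with members TimeStamp, Property1, Property2, etc., the field names are stored as metadata in the dataset's datatype, and the file should be easy to read in Python (h5py, for example, reads such a dataset as a NumPy structured array whose field names are the member names).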

There is another clean option, which I will only mention since I have never used it myself: HDF5's Table Interface (the high-level H5TB API). Read its documentation to see whether you would prefer it.

Direct answers to your question

Now the dirty options: you could add string attributes to your existing dataset. There are several ways to do that: a single string attribute with all the field names separated by semicolons, or one attribute per column. I don't recommend it, since it is terribly non-standard: anyone reading the file has to know your ad hoc convention.

A related side question: can the first row of a matrix be strings while the other rows contain doubles?

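No. All elements of an HDF5 dataset share a single datatype, so one row cannot hold strings while the others hold doubles.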

Example using a compound datatype

Assuming you have a struct defined like this:

struct Point { double timestamp, property1, property2; };

and a vector of Points:

std::vector<Point> points;

as well as a dataset dset and appropriate memory and file dataspaces, you can create a compound datatype like this:

H5::CompType type(sizeof(Point));
type.insertMember("TimeStamp", HOFFSET(Point, timestamp), H5::PredType::NATIVE_DOUBLE);
type.insertMember("Property1", HOFFSET(Point, property1), H5::PredType::NATIVE_DOUBLE);
type.insertMember("Property2", HOFFSET(Point, property2), H5::PredType::NATIVE_DOUBLE);

and write data to file like this:

dset.write(&points[0], type, mem_space, file_space);
Licensed under: CC-BY-SA with attribution