Question

I store (non-equidistant) time series as tables in HDF5 files using the H5TB API. The format looks like this:

time   channel1   channel2
0.0    x          x
1.0    x          x
2.0    x          x

There are also insertions of "detail data" like this:

time   channel1   channel2
0.0    x          x
1.0    x          x
1.2    x          x
1.4    x          x
1.6    x          x
1.8    x          x
2.0    x          x

Now I want to store the data in another data format, so I would like to "query" the HDF5 file like this:

select ch1 where time > 1.6 && time < 3.0

I thought of several ways to do this query:

  1. There is a built-in feature called B-tree index. Is it possible to use this for indexing the data?
  2. I do a binary search on the time channel and then read the channel values.
  3. I create an index myself (and update it whenever there is a detail insertion). What would be the best algorithm to use here?

The main motivation for an index would be to have fast query responses.

What would you suggest here?


Solution

I finally found another (obvious) solution myself. The easiest way is to open the HDF5 file, read only the time channel, and build an in-memory map from time to row index before reading the data channels. This step could even be optimized by reading the time channel with a sparse hyperslab.

Once the row indices for a particular time range are known, the data channels can be read for just those rows.
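As a rough illustration of the lookup step, here is a minimal sketch that assumes the time channel has already been read into a `std::vector<double>` and that the rows are stored in ascending time order (the question does not state this explicitly). The names `RowRange` and `rows_between` are made up for this example. It turns the query `time > 1.6 && time < 3.0` into a half-open range of row indices using a binary search:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Half-open range [first, last) of row indices (hypothetical helper type).
struct RowRange {
    std::size_t first;
    std::size_t last;
};

// Returns the rows whose time t satisfies lo < t < hi (both bounds exclusive).
// Assumption: `times` is sorted in ascending order.
RowRange rows_between(const std::vector<double>& times, double lo, double hi) {
    // First element strictly greater than lo.
    auto begin = std::upper_bound(times.begin(), times.end(), lo);
    // First element not less than hi, searched only from `begin` onwards,
    // so the result can never precede `begin`.
    auto end = std::lower_bound(begin, times.end(), hi);
    return {static_cast<std::size_t>(begin - times.begin()),
            static_cast<std::size_t>(end - times.begin())};
}
```

With the range in hand, the matching rows are contiguous, so they could probably be fetched in a single call, for example with `H5TBread_fields_name` using `start = first` and `nrecords = last - first`. An empty result shows up as `first == last`.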

Other tips

Assuming you're not asking how to parse the data out of an HDF5 file, but merely how to use the data once it is parsed...

Given class channel_data { ... };, a std::map<double, channel_data> should suit your needs, specifically std::map<>::lower_bound() and std::map<>::upper_bound().
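A minimal sketch of that idea follows. The fields of `channel_data` and the function `select_channel1` are assumptions made for illustration, since the answer only gives `class channel_data { ... };`. Because the example query uses strict comparisons, the lower end uses `upper_bound()` (first key greater than the bound) and the upper end uses `lower_bound()` (first key not less than the bound):

```cpp
#include <map>
#include <vector>

// Hypothetical payload type; the real layout is not given in the answer.
struct channel_data {
    double channel1;
    double channel2;
};

// Collects channel1 for all samples with lo < time < hi (both bounds exclusive).
std::vector<double> select_channel1(const std::map<double, channel_data>& series,
                                    double lo, double hi) {
    std::vector<double> result;
    if (!(lo < hi)) {
        return result;  // empty or inverted interval: nothing to select
    }
    auto first = series.upper_bound(lo);  // first key > lo
    auto last = series.lower_bound(hi);   // first key >= hi
    for (auto it = first; it != last; ++it) {
        result.push_back(it->second.channel1);
    }
    return result;
}
```

Both lookups are logarithmic in the number of samples, and a detail insertion is just another `insert()` into the map, so no separate index has to be maintained. The trade-off is that all channel values live in memory, which may not suit very large files.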

A popular approach to this problem appears to be bitmap indexing. There are also papers on doing this, but their authors do not appear to have published any code.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow