Recommendations for storing time series data

https://datascience.stackexchange.com/questions/6854

16-10-2019

Question

As part of my thesis I've done some experiments that have resulted in a reasonable amount of time-series data (motion-capture + eye movements). I have a way of storing and organizing all of this data, but it's made me wonder whether there are best practices out there for this sort of task.

I'll describe my setup in case it helps with recommendations. I have an experiment in which subjects use their vision and move their body to complete a task. Each task is one trial, and each subject performs multiple trials over the course of the experiment. During a trial I record motion-capture and eye-tracker data (~200 channels) at regularly sampled time points (~100 Hz). I store these in a CSV file (one file per trial), with one row per time point and one column per variable (e.g., left-fingertip-x, left-fingertip-y, left-fingertip-z, etc. for the mocap, and left-eye-x, left-eye-y for the eyes).
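A minimal sketch of the per-trial layout described above, using only the Python standard library. The `time_s` column, the exact 100 Hz rate, and the channel names in the example are illustrative assumptions, not the author's actual schema.

```python
import csv
import io

SAMPLE_RATE_HZ = 100  # assumed; the question says roughly 100 Hz


def write_trial(stream, channels, samples):
    """Write one trial: a header row, then one row per time point.

    channels: list of channel names (one column each)
    samples:  list of rows, each a list of floats aligned with `channels`
    """
    writer = csv.writer(stream)
    writer.writerow(["time_s"] + channels)
    for i, row in enumerate(samples):
        if len(row) != len(channels):
            raise ValueError(f"row {i} has {len(row)} values, expected {len(channels)}")
        # Derive the timestamp from the sample index and the fixed rate.
        writer.writerow([i / SAMPLE_RATE_HZ] + row)


def read_trial(stream):
    """Read a trial back into a dict mapping column name -> list of floats."""
    reader = csv.reader(stream)
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        for name, value in zip(header, row):
            columns[name].append(float(value))
    return columns


# Hypothetical three-channel trial with two time points.
buf = io.StringIO()
write_trial(buf, ["left-fingertip-x", "left-fingertip-y", "left-eye-x"],
            [[0.10, 0.20, 0.30], [0.11, 0.21, 0.31]])
buf.seek(0)
trial = read_trial(buf)
print(trial["time_s"])  # [0.0, 0.01]
```

For a real file you would pass `open(path, "w", newline="")` instead of the in-memory buffer, as the `csv` module documentation recommends.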

Associated with each trial is some metadata, such as the experimental condition (e.g., how fast a target in the trial is moving). I store these values in the CSV filename itself, using a "key=value" syntax.
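The "key=value" filename scheme can be parsed back into a dictionary with a few lines of Python. This is a sketch that assumes pairs are joined by underscores (e.g. `subject=03_trial=12_target-speed=fast.csv`); the separator choice and the example filename are hypothetical, since the question does not say how pairs are delimited.

```python
from pathlib import Path


def parse_trial_filename(path, pair_sep="_", kv_sep="="):
    """Parse 'key=value' pairs out of a trial's filename.

    Assumes pairs are joined by `pair_sep` and that neither keys nor values
    contain `pair_sep`; both separators are hypothetical choices.
    """
    stem = Path(path).stem  # drop the directory and the .csv extension
    metadata = {}
    for pair in stem.split(pair_sep):
        key, sep, value = pair.partition(kv_sep)
        if not sep:
            raise ValueError(f"malformed pair {pair!r} in {path!r}")
        metadata[key] = value
    return metadata


meta = parse_trial_filename("data/subject=03_trial=12_target-speed=fast.csv")
print(meta)  # {'subject': '03', 'trial': '12', 'target-speed': 'fast'}
```

Values come back as strings; converting them to numbers is left to the caller, because a value such as `03` may be an identifier rather than a quantity.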

While this works well enough for my purposes, it's really ad hoc! I'd like to know whether other people have solved problems like this and, if so, how.

Solution

There are two solutions that are worth looking at:

InfluxDB is an open-source database designed specifically for time series data. It includes many optimized time-related functions, and you can collect data at any interval and compute rollups/aggregations when reporting. The company behind it recently launched a query app called Chronograf. I have not used it, but if it's no good, you can also check out Grafana (which is very widely used and stable).
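To make the InfluxDB suggestion concrete, here is a sketch that converts one time point of a trial into InfluxDB line protocol, the plain-text format InfluxDB ingests (`measurement,tags fields timestamp`). Mapping the trial metadata to tags and the channels to float fields is one reasonable design, assumed here; the measurement name `mocap` and the tag names are hypothetical. The sketch only builds the strings and does not send them to a server.

```python
import math


def _escape(s, chars=",= "):
    """Backslash-escape characters that are special in line protocol keys and tag values."""
    for ch in chars:
        s = s.replace(ch, "\\" + ch)
    return s


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as '<measurement>,<tags> <fields> <timestamp>'.

    tags:   dict of str -> str (per-trial metadata, e.g. subject or condition)
    fields: dict of str -> float (the sampled channel values)
    timestamp_ns: integer Unix timestamp in nanoseconds (the default precision)
    """
    head = _escape(measurement, ", ")
    if tags:
        # Tags are sorted by key so the output is deterministic.
        head += "," + ",".join(f"{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items()))
    # Line protocol has no NaN/inf, so non-finite samples (e.g. blinks) are skipped.
    parts = [f"{_escape(k)}={float(v)!r}" for k, v in fields.items() if math.isfinite(v)]
    if not parts:
        raise ValueError("a point needs at least one finite field")
    return f"{head} {','.join(parts)} {int(timestamp_ns)}"


line = to_line_protocol(
    "mocap",
    {"subject": "03", "condition": "fast"},
    {"left_fingertip_x": 0.1, "left_eye_x": 0.25},
    1_571_184_000_000_000_000,
)
print(line)
# mocap,condition=fast,subject=03 left_fingertip_x=0.1,left_eye_x=0.25 1571184000000000000
```

At 100 Hz, consecutive samples within a trial would be 10,000,000 ns apart. Putting the condition in tags rather than fields lets InfluxDB filter and group by it efficiently.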

The alternative strategy you may want to pursue is an Elasticsearch index. Elasticsearch is great at running aggregations and other mathematical functions on data. Many people use it to store server log data and then query that data with Kibana.
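As a sketch of the Elasticsearch route, the code below builds two request bodies: a newline-delimited JSON (NDJSON) body for the `_bulk` endpoint, with one action line before each sample document, and a search body that averages one channel in 1-second `date_histogram` buckets. It assumes Elasticsearch 7.x or later (for `fixed_interval`), a date-mapped `@timestamp` field, and hypothetical index and field names; sending the requests is not shown.

```python
import json


def bulk_body(index, docs):
    """Build an NDJSON body for Elasticsearch's _bulk endpoint.

    Each document is preceded by an action line; the body must end with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def per_second_average(channel, timestamp_field="@timestamp"):
    """Search body: average of one channel in 1-second buckets (no raw hits returned)."""
    return {
        "size": 0,
        "aggs": {
            "per_second": {
                "date_histogram": {"field": timestamp_field, "fixed_interval": "1s"},
                "aggs": {"avg_value": {"avg": {"field": channel}}},
            }
        },
    }


# Two hypothetical samples, 10 ms apart, carrying the trial metadata on every document.
docs = [
    {"@timestamp": "2019-10-16T00:00:00.000Z", "subject": "03", "condition": "fast", "left_eye_x": 0.25},
    {"@timestamp": "2019-10-16T00:00:00.010Z", "subject": "03", "condition": "fast", "left_eye_x": 0.26},
]
body = bulk_body("mocap-trials", docs)
print(body.count("\n"))  # 4
query = per_second_average("left_eye_x")
```

Denormalizing the metadata onto every sample document costs storage but lets a single aggregation filter or split by condition without a join.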

OTHER TIPS

As part of the ELK stack (Elasticsearch, Logstash, Kibana), you will find a pretty nice open-source time series analysis tool called Timelion. Once you get the hang of how it works, it is an excellent framework.

Licensed under: CC-BY-SA with attribution
