Question

I have a time-series pandas DataFrame that dynamically gains a new column every minute, as well as a new row:

Initial:

timestamp                100     200     300
2020-11-01 12:00:00       4       3       5

Next minute:

timestamp                100     200     300   500
2020-11-01 12:00:00       4       3       5     0
2020-11-01 12:01:00      14       3       5     4

The DataFrame keeps updating like this every minute.
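For concreteness, here is a minimal pandas sketch of that growth pattern (the values are the ones from the tables above; the exact update mechanism is an assumption):

```python
import pandas as pd

# Minute 1: one row, columns 100 / 200 / 300
df = pd.DataFrame(
    {100: [4], 200: [3], 300: [5]},
    index=pd.to_datetime(["2020-11-01 12:00:00"]),
)
df.index.name = "timestamp"

# Minute 2: a new column (500) appears along with a new row;
# earlier timestamps get the default value 0 for the new column
new_row = pd.DataFrame(
    {100: [14], 200: [3], 300: [5], 500: [4]},
    index=pd.to_datetime(["2020-11-01 12:01:00"]),
)
df = pd.concat([df, new_row]).fillna(0)
```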

So, ideally, I want to design a database solution that supports such a dynamic column structure. The number of columns could grow to 20-30k+, and since this is one-minute time-series data, it will accumulate 500k+ rows per year.

I've read that relational databases have a limit on the number of columns, so that might not work here. Also, since I am setting the data for new columns and assigning a default value (0) to previous timestamps, I lose out on the DEFAULT parameter that MySQL offers.

Eventually, I will be querying data over ranges of 1 day or 1 month to get the columns and their values.

Please suggest a suitable database solution for this type of dynamic row and column data.


Solution

Usually a dynamic data problem like this can be solved by storing the dynamic portion of the schema in its own table, transposed as rows.

For example, you can have an Intervals table with one column called Interval and another called Value. Interval would store 100, 200, 300, and so on, with one row per value recorded for that interval.

You could store the Timestamp as a column in this table too, but my recommendation would be to normalize Timestamp into its own table, with a TimestampId foreign key in your Intervals table.
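As a sketch of what that normalized layout might look like (the Intervals, Interval, Value, and TimestampId names come from the description above; the Timestamps table name and the use of SQLite are assumptions for brevity):

```python
import sqlite3

conn = sqlite3.connect("timeseries.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS Timestamps (
    TimestampId INTEGER PRIMARY KEY,
    Timestamp   TEXT NOT NULL UNIQUE   -- e.g. '2020-11-01 12:00:00'
);

CREATE TABLE IF NOT EXISTS Intervals (
    TimestampId INTEGER NOT NULL REFERENCES Timestamps(TimestampId),
    Interval    INTEGER NOT NULL,      -- 100, 200, 300, 500, ...
    Value       REAL    NOT NULL DEFAULT 0,
    PRIMARY KEY (TimestampId, Interval)
);
""")
```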

Implementing your schema this way means you don't have to worry about how many dynamic Intervals are created, since it's a row-based, generic solution.
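To connect this back to the pandas side, one possible approach (a sketch, assuming the conn and tables from the snippet above) is to melt each wide frame into long (timestamp, interval, value) rows before inserting, and then query a date range directly:

```python
# Reshape the wide DataFrame into (timestamp, interval, value) rows
long_df = (
    df.rename_axis("timestamp")
      .reset_index()
      .melt(id_vars="timestamp", var_name="interval", value_name="value")
)

# Insert, one timestamp at a time
for ts, group in long_df.groupby("timestamp"):
    conn.execute(
        "INSERT OR IGNORE INTO Timestamps (Timestamp) VALUES (?)", (str(ts),)
    )
    (ts_id,) = conn.execute(
        "SELECT TimestampId FROM Timestamps WHERE Timestamp = ?", (str(ts),)
    ).fetchone()
    conn.executemany(
        "INSERT OR REPLACE INTO Intervals (TimestampId, Interval, Value) "
        "VALUES (?, ?, ?)",
        [(ts_id, int(i), float(v)) for i, v in zip(group["interval"], group["value"])],
    )
conn.commit()

# Query one day's worth of data
rows = conn.execute("""
    SELECT t.Timestamp, i.Interval, i.Value
    FROM Intervals i
    JOIN Timestamps t USING (TimestampId)
    WHERE t.Timestamp >= '2020-11-01' AND t.Timestamp < '2020-11-02'
""").fetchall()
```

Note that new intervals require no schema change at all here: a new interval simply shows up as rows with a new Interval value, and a 1-day or 1-month query is a plain range filter on Timestamp.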

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange