Question

I have multiple datasets of 1 second time resolution real data. This data will often have gaps in the time-series where the instrument dropped data, or when the instrument was turned off, resulting in a patchy (albeit still very useful) dataset. The resulting data might look like the following

Timestamp [timestamp] : datastream1 [double precision] : datastream2 [double precision] : etc
2011-01-01 00:00:01     153.256     1255.325 
2011-01-01 00:00:02     152.954     1254.288 
2011-01-01 00:00:03     151.738     1248.951 
2011-01-01 00:00:04     150.015     1249.185 

2011-01-01 00:10:08     179.132     1328.115
2011-01-01 00:10:09     178.051     1323.125
2011-01-01 00:10:10     180.870     1336.983

2011-01-04 09:19:02     152.198     1462.814
2011-01-04 09:19:03     158.014     1458.122
2011-01-04 09:19:04     156.070     1464.174

Please note: this data are generally continuous but will have random gaps which must be dealt with.

I need to write code to take the average and stdev of a given time interval, "timeInt", that is able to deal with these gaps. For example, if I wanted a 10 min average of data, my required output would be:

Timestamp_10min   :   avg_data1   :   med_data1   :   count_data1

where avg_data1 would be the average of all the data points within a given 10 minute period, and count_data1 would be the number of points used in the calculation of that average (i.e. 600 if there was no missing data, 300 if every second point is missing, etc etc).

This code needs to work with any desired input interval (i.e. x minutes, y days, z weeks, months, years, etc).

Currently I am only able output minute averages using the following code.

CREATE OR REPLACE VIEW "DATATABLE_MIN" AS
  SELECT MIN("DATATABLE"."Timestamp") AS "Timestamp_min",
        avg("DATATABLE"."datastream1") AS "datastream1_avg_min",
        stddev("DATATABLE"."datastream1") AS "datastream1_stdev_min",
        count("DATATABLE"."datastream1") AS "datastream1_avg_min"
      FROM "DATATABLE"
  GROUP BY to_char("DATATABLE"."Timestamp",'YYYY-MM-DD HH24:MI'::text);

Thanks in advance for the help!

Was it helpful?

Solution

To group by 10 minutes, you can do this by using the "epoch":

  SELECT MIN(dt."Timestamp") AS "Timestamp_min",
         avg(dt."datastream1") AS "datastream1_avg_min",
         stddev(dt."datastream1") AS "datastream1_stdev_min",
         count(dt."datastream1") AS "datastream1_avg_min"
  FROM "DATATABLE" dt
  GROUP BY trunc(extract(epoch from dt."TimeStamp") / (60*10));

This is the number of seconds since a fixed time in the past. If you divide it by 600, you get the number of 10 minute intervals -- what you need for the aggregation.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top