Question

I appreciate there are similar questions out there, but I can't find one that answers my query. I have a data set in which I want to average one column over 5-minute intervals based on the timestamp, with the intervals aligned to the start of the hour. For example, the average reported at 10:00 should cover the preceding 5 minutes, 09:56-10:00.

Here is an example of my data set:

data <- structure(list(datetime = c("11/07/2013 19:20", "11/07/2013 19:21", 
"11/07/2013 19:22", "11/07/2013 19:23", "11/07/2013 19:24", "11/07/2013 19:25", 
"11/07/2013 19:26", "11/07/2013 19:27", "11/07/2013 19:28", "11/07/2013 19:29", 
"11/07/2013 19:30", "11/07/2013 19:31", "11/07/2013 19:32", "11/07/2013 19:33", 
"11/07/2013 19:34", "11/07/2013 19:35", "11/07/2013 19:36", "11/07/2013 19:37", 
"11/07/2013 19:38", "11/07/2013 19:39", "11/07/2013 19:40", "11/07/2013 19:41", 
"11/07/2013 19:42", "11/07/2013 19:43", "11/07/2013 19:44", "11/07/2013 19:45"
), met = c(-24.24081371, -24.4280008, -24.35142264, -24.84884114, 
-25.06214408, -25.46749039, -25.44670288, -25.86062294, -26.30899817, 
-26.57565791, -26.6866101, -27.03829228, -27.34621325, -27.91269122, 
-28.60861612, -29.16745075, -28.81285096, -29.89737508, -30.26500716, 
-30.08502411, -31.05084494, -31.21356991, -31.05715444, -32.29645243, 
-32.76946492, -32.69307397)), .Names = c("datetime", "met"),
class = "data.frame", row.names = c(NA, -26L))

I have tried the code below, but I haven't been able to get it working the way I want.

> data$datetime <- as.POSIXct(data$datetime, format="%d/%m/%Y %H:%M")
> groups <- cut(data$datetime, breaks="5 min")
> by(data$met, groups, mean)
groups: 2013-07-11 19:20:00
[1] -24.58624
------------------------------------------------------------------------ 
groups: 2013-07-11 19:25:00
[1] -25.93189
------------------------------------------------------------------------ 
groups: 2013-07-11 19:30:00
[1] -27.51848
------------------------------------------------------------------------ 
groups: 2013-07-11 19:35:00
[1] -29.64554
------------------------------------------------------------------------ 
groups: 2013-07-11 19:40:00
[1] -31.6775
------------------------------------------------------------------------ 
groups: 2013-07-11 19:45:00
[1] -32.69307

These are the correct averages, but each one is labelled with the first timestamp of its 5-minute period rather than the last, so a value that R labels 12:01 actually belongs at 12:05 (the period 12:01-12:05). I also can't get the output into a format like 12/07/2013 12:05 -19.91691.

Solution

The best way to work with time series is to use an existing library of time-series handling routines, or to implement one, so that aggregations like this are available in general form. I would not rewrite this logic case by case. In an earlier role I led the implementation of such a library, but it is proprietary, so here is only a hint:

  • use split to split data$met at the last/first minute of every hour (indices of those rows can be easily obtained from timestamps with basic R knowledge)
  • use sapply across the results with an arbitrary aggregation function, e.g. averaging the last 5 values
  • put results into a timeseries with the same indices you used for split
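
One way to read these steps for the 5-minute case in the question is sketched below. It is only an illustration under a few assumptions: the timestamps are day/month/year, UTC is used to sidestep daylight-saving jumps, and the split points are 5-minute boundaries rather than hourly ones. The names period_end, ends and result are made up for this example. Each timestamp is rounded up to the next multiple of 5 minutes, so 19:21-19:25 all fall under the label 19:25.

```r
# A shortened copy of the question's data: one row per minute, dd/mm/yyyy.
data <- data.frame(
  datetime = sprintf("11/07/2013 19:%02d", 20:31),
  met = c(-24.24081371, -24.4280008, -24.35142264, -24.84884114,
          -25.06214408, -25.46749039, -25.44670288, -25.86062294,
          -26.30899817, -26.57565791, -26.6866101, -27.03829228),
  stringsAsFactors = FALSE
)

# Parse as day/month/year. tz = "UTC" is an assumption, not from the question.
data$datetime <- as.POSIXct(data$datetime, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Round every timestamp up to the next multiple of 300 seconds, so that
# 19:21..19:25 map to 19:25 and an exact multiple such as 19:20 maps to itself.
period_end <- ceiling(as.numeric(data$datetime) / 300) * 300
ends <- sort(unique(period_end))

# Step 1: split the values at those boundaries.
pieces <- split(data$met, factor(period_end, levels = ends))

# Step 2: apply an aggregation function to every piece.
means <- sapply(pieces, mean)

# Step 3: put the results back on a time index, labelled by period end.
result <- data.frame(
  datetime = as.POSIXct(ends, origin = "1970-01-01", tz = "UTC"),
  met = unname(means)
)
result$label <- format(result$datetime, "%d/%m/%Y %H:%M")
print(result[, c("label", "met")])
```

The first and last rows of result are averages over incomplete periods (only 19:20 and only 19:31 are present in this shortened sample), so in practice you may want to drop any period with fewer than 5 values.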

Having said that, you would really be better off writing a time-series handling library that provides general aggregation routines in C.

Licensed under: CC-BY-SA with attribution