How to aggregate data from 5 minutes to 30 minutes while keeping date intervals and other values

StackOverflow https://stackoverflow.com/questions/20479945

  •  30-08-2022

Question

I have the following data:

value <- c(1.869, 1.855, 1.855, 1.855, 1.855, 1.855, 1.855, 1.848, 1.848, 1.848, 1.848, 1.848, 1.848, 1.849)
date <- c("2013-08-28 08:00:00 UTC", "2013-08-28 08:05:00 UTC", "2013-08-28 08:10:00 UTC", "2013-08-28 08:15:00 UTC", "2013-08-28 08:20:00 UTC", "2013-08-28 08:25:00 UTC", "2013-08-28 08:30:00 UTC", "2013-08-28 08:35:00 UTC", "2013-08-28 08:40:00 UTC", "2013-08-28 08:45:00 UTC", "2013-08-28 08:50:00 UTC", "2013-08-28 08:55:00 UTC", "2013-08-28 09:00:00 UTC", "2013-08-28 09:05:00 UTC")
indicator <- c(1,0,0,1,0,0,0,0,0,0,0,0,0,1)

data <- data.frame(date=date,value=value, indicator=indicator)

I want to do two things. First, I want to aggregate (sum) it to the 30-minute level, with intervals ending at :00 and :30. For example, the first value in this data would not be included in the calculations, but 8:05 to 8:30 would be aggregated to 8:30, 8:35 to 9:00 to 9:00, and so on. Second, I would like to aggregate the indicator value: if a 1 is present in the interval, I'd like the result to be 1 (I guess sum would work as well, since the result would be non-zero).

I've tried rollapply from the zoo package (which works, but I have to manually make sure that the data starts at 8:05), but I would like to keep the date and aggregate the indicator as well:

aggdata <- rollapply(data=data$value,width=6,FUN=sum,by=6)

Data that does not cover a full 30-minute interval is useless to me, so I'd rather not include it. My desired output is:

date                       value  indicator
"2013-08-28 08:00:00 UTC"  1.869  1
"2013-08-28 08:30:00 UTC"  11.13  1
"2013-08-28 09:00:00 UTC"  11.088 0 
"2013-08-28 09:05:00 UTC"  1.849  1

or better yet:

date                       value  indicator
"2013-08-28 08:00:00 UTC"  NA     NA
"2013-08-28 08:30:00 UTC"  11.13  1
"2013-08-28 09:00:00 UTC"  11.088 0 
"2013-08-28 09:05:00 UTC"  NA     NA

or even better:

date                       value  indicator
"2013-08-28 08:30:00 UTC"  11.13  1
"2013-08-28 09:00:00 UTC"  11.088 0 

Solution 3

> library(zoo)
> z <- read.zoo(data, FUN = identity)
> zr <- rollapplyr(z[-1, ], 6, sum, by = 6)
> zr
                         value indicator
2013-08-28 08:30:00 UTC 11.130         1
2013-08-28 09:00:00 UTC 11.088         0

Although it may be better to just leave it in zoo, to convert it back to a data frame use fortify.zoo:

library(ggplot2)
fortify(zr)
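
For reference, here is a minimal self-contained sketch of the same idea that does not need ggplot2, since fortify.zoo is exported by zoo itself. It assumes the dates are parsed to POSIXct in UTC first (the code above keeps them as strings), and the names df and out are only illustrative:

```r
library(zoo)

# Rebuild the question's data with a real date-time index (UTC is an assumption)
df <- data.frame(
  date = seq(as.POSIXct("2013-08-28 08:00:00", tz = "UTC"),
             by = "5 min", length.out = 14),
  value = c(1.869, rep(1.855, 6), rep(1.848, 6), 1.849),
  indicator = c(1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

z <- zoo(as.matrix(df[, c("value", "indicator")]), order.by = df$date)

# Drop the 08:00 row so the windows cover 08:05-08:30, 08:35-09:00, ...
# Right-aligned windows of 6 rows, advancing 6 rows at a time
zr <- rollapplyr(z[-1, ], width = 6, FUN = sum, by = 6)

# Back to a data frame; the index becomes a column named "Index"
out <- fortify.zoo(zr)

# Collapse the summed indicator to 0/1, as asked in the question
out$indicator <- as.integer(out$indicator > 0)
out
```

As in the answer above, this still relies on removing the first row by hand so that the windows line up with :00 and :30.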

OTHER TIPS

This also seems to work:

data$date <- as.POSIXct(as.character(data$date))

interval <- seq(min(data$date), max(data$date), "30 mins")

intervals <- c(data$date[1], interval + 5*60)

res <- na.omit(aggregate(list(value = data$value, indicator = data$indicator), 
                                list(date = findInterval(data$date, intervals)), 
                                      function(x) if(length(x) == 6) sum(x) else NA))

res$date <- interval[res$date]

res
#                 date  value indicator
#2 2013-08-28 08:30:00 11.130         1
#3 2013-08-28 09:00:00 11.088         0

That should do the job:

## convert from string to date (POSIX)
dt <- strptime(data$date,format="%Y-%m-%d %H:%M:%S")
## create bins to collect the right periods
##  1) subtract the modulo to 30min (-> 30 min bins)
##  2) add 30 if this modulo is not 0 (-> they end at :00 or :30)
bins <- strftime(as.POSIXct(dt+60*(-(dt$min %% 30)
                                   + ifelse(dt$min %% 30,30,0)),
                            origin="1970-01-01"),'%Y-%m-%d %H:%M')
## use these bins
data.frame(value=tapply(data$value,bins,sum),
           indicator=tapply(data$indicator,bins,
             function(x) ifelse(sum(x),1,0)))
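
Note that the tapply approach above also returns the partially filled bins (08:00 and 09:30 hold one observation each). A rough base-R sketch of one way to drop them, assuming 5-minute data so that a complete bin has exactly six rows; the names shift, full and keep are only illustrative:

```r
# Rebuild the question's data (UTC is an assumption)
value <- c(1.869, rep(1.855, 6), rep(1.848, 6), 1.849)
indicator <- c(1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
date <- format(seq(as.POSIXct("2013-08-28 08:00:00", tz = "UTC"),
                   by = "5 min", length.out = 14), "%Y-%m-%d %H:%M:%S")

dt <- as.POSIXlt(date, tz = "UTC", format = "%Y-%m-%d %H:%M:%S")

# Minutes to add so each timestamp moves up to the next :00 or :30 boundary
shift <- (30 - dt$min %% 30) %% 30
bins <- format(as.POSIXct(dt) + 60 * shift, "%Y-%m-%d %H:%M")

# Keep only the bins that contain all six 5-minute observations
full <- names(which(table(bins) == 6))
keep <- bins %in% full

res <- data.frame(
  value = tapply(value[keep], bins[keep], sum),
  indicator = tapply(indicator[keep], bins[keep],
                     function(x) as.integer(any(x != 0)))
)
res
```

The bin labels end up as row names, as in the original code.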

To aggregate 8:05 to 8:30 and report that as 8:30 (i.e., report times at the end of the aggregation intervals), and ignore any 30-minute intervals that do not have 6 observations, use the following:

data$date <- as.POSIXct(data$date)
data$date.30min <- as.POSIXct(ceiling(as.numeric(data$date) / (30 * 60)) *
  (30 * 60), origin='1970-01-01')
sumif6 <- function(x) {
  if(length(x) == 6) sum(x) else NA
}
res30 <- na.omit(aggregate(cbind(value, indicator) ~ date.30min, data, sumif6))
res30
#           date.30min  value indicator
#2 2013-08-28 08:30:00 11.130         1
#3 2013-08-28 09:00:00 11.088         0

If you need to aggregate 8:00 to 8:25 and report that as 8:00 (i.e., report times at the beginning of the aggregation intervals), simply use floor() instead of ceiling():

data$date.30min <- as.POSIXct(floor(as.numeric(data$date) / (30 * 60)) *
  (30 * 60), origin='1970-01-01')

If you need to aggregate by 15 minutes instead of 30, simply replace the 30s with 15s, and create a new sumif3 function:

data$date.15min <- as.POSIXct(floor(as.numeric(data$date) / (15 * 60)) *
  (15 * 60), origin='1970-01-01')
sumif3 <- function(x) {
  if(length(x) == 3) sum(x) else NA
}
res15 <- na.omit(aggregate(cbind(value, indicator) ~ date.15min, data, sumif3))
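
Rather than writing a new sumifN function for every window length, the pattern can be wrapped in one small helper. This is only a sketch: aggregate_complete and its arguments are hypothetical names, it assumes a POSIXct date column in UTC and evenly spaced data every step minutes, and it labels windows by their end (ceiling), as in the first example.

```r
# Hypothetical helper: sum value and indicator over windows of `minutes`,
# labelled by window end, keeping only windows with minutes / step rows.
aggregate_complete <- function(df, minutes, step = 5) {
  width <- minutes * 60                                # window length in seconds
  end <- ceiling(as.numeric(df$date) / width) * width  # window end, seconds since epoch
  n <- ave(end, end, FUN = length)                     # rows in each row's window
  keep <- n == minutes / step                          # complete windows only
  out <- aggregate(df[keep, c("value", "indicator")],
                   list(date = end[keep]), sum)
  out$date <- as.POSIXct(out$date, origin = "1970-01-01", tz = "UTC")
  out$indicator <- as.integer(out$indicator > 0)       # 1 if any 1 in the window
  out
}

# The question's data, with dates parsed as UTC (an assumption)
df <- data.frame(
  date = seq(as.POSIXct("2013-08-28 08:00:00", tz = "UTC"),
             by = "5 min", length.out = 14),
  value = c(1.869, rep(1.855, 6), rep(1.848, 6), 1.849),
  indicator = c(1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

res30 <- aggregate_complete(df, 30)
res15 <- aggregate_complete(df, 15)
```

If no window is complete, aggregate() stops with an error, so a real version would want to check for that case first.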
Licensed under: CC-BY-SA with attribution