How to subsample a data frame based on a datetime column in R

Question 1

It is hard to guess what structure your data has. Is it guaranteed that there is a value at exactly the first timestamp plus x times 60 minutes? What happens if the value cannot be found? What happens if there are two values at that time? Do you need approximate matching, say, 09:10 counted as 09:09?

One idea to get you started is the following:

# I will call your dataframe `d`. 
# Transform datetime to a POSIXct object, R's datatype for timestamps
d$datetime <- as.POSIXct(as.character(d$datetime), format='%d/%m/%Y %H:%M')
# Extract the minutes
d$minute <- as.numeric(format(d$datetime, '%M'))
# And select by identical minute.
subset(d, minute == d$minute[1])
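
To make the open questions above concrete, here is a minimal sketch on made-up data. The sample values, the `tz = "UTC"` setting, and the names `hourly`, `expected` and `missing_marks` are assumptions for illustration, not part of your data. It applies the same minute filter, then handles duplicates by keeping the first row per timestamp and reports hourly marks that have no row.

```r
# Hypothetical sample data in the same dd/mm/yyyy HH:MM layout
d <- data.frame(
  datetime = c("30/03/2011 05:09", "30/03/2011 05:39", "30/03/2011 06:09",
               "30/03/2011 06:09", "30/03/2011 08:09"),
  a_count  = c(66, 50, 36, 37, 45),
  stringsAsFactors = FALSE
)
# tz = "UTC" is an assumption; it avoids daylight-saving surprises
d$datetime <- as.POSIXct(d$datetime, format = "%d/%m/%Y %H:%M", tz = "UTC")
d$minute   <- as.numeric(format(d$datetime, "%M"))

# Rows that share the minute of the first row
hourly <- subset(d, minute == d$minute[1])

# Two values at the same time: keep only the first row per timestamp
hourly <- hourly[!duplicated(hourly$datetime), ]

# Value not found: list the hourly marks that have no row at all
expected      <- seq(from = d$datetime[1], to = max(d$datetime), by = "hour")
found         <- as.numeric(expected) %in% as.numeric(hourly$datetime)
missing_marks <- expected[!found]
```

Whether keeping the first duplicate is right depends on your data; averaging the duplicates may be more appropriate.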

Question 2

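> df$datetime <- as.POSIXct(strptime(df$datetime, format = "%d/%m/%Y %H:%M"))
> # Minutes elapsed since the first row; set the units explicitly, because diff() chooses them automatically
> df$dif <- c(0, cumsum(as.numeric(diff(df$datetime), units = "mins")))
> # Keep the rows that lie a whole number of hours after the first row
> df[df$dif %% 60 == 0, ]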

               datetime a_count b_count dif
  2011-03-30 05:09:00      66  166.49   0
  2011-03-30 06:09:00      36  169.96  60
  2011-03-30 07:09:00      24  171.94 120
  2011-03-30 08:09:00      45  174.99 180

I have the same questions as Thilo, but here's another solution.
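
If approximate matching turns out to be needed (the 09:10 versus 09:09 case raised above), one possible approach is to build the hourly target times and pick the nearest observation within a tolerance. This is only a sketch on made-up data: the sample values, the two-minute tolerance `tol`, the `tz = "UTC"` setting and the names `targets`, `nearest`, `gap` and `result` are all assumptions.

```r
# Hypothetical sample data: readings that drift slightly off the hourly mark
df <- data.frame(
  datetime = c("30/03/2011 05:09", "30/03/2011 05:40", "30/03/2011 06:10",
               "30/03/2011 07:08", "30/03/2011 08:30"),
  a_count  = c(66, 50, 36, 24, 45),
  stringsAsFactors = FALSE
)
df$datetime <- as.POSIXct(df$datetime, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Hourly target times, starting at the first observation
targets <- seq(from = df$datetime[1], to = max(df$datetime), by = "hour")

# Tolerance in minutes (an assumed value; choose what suits the data)
tol <- 2

# For each target, the index of the closest observation
nearest <- sapply(seq_along(targets), function(i) {
  which.min(abs(as.numeric(difftime(df$datetime, targets[i], units = "mins"))))
})

# Distance in minutes between each target and its closest observation
gap <- abs(as.numeric(difftime(df$datetime[nearest], targets, units = "mins")))

# Keep matches within the tolerance, dropping any row matched twice
result <- df[unique(nearest[gap <= tol]), ]
```

Targets with no observation inside the tolerance are silently skipped here; you may prefer to report them instead.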

Question 3

You can also use the lubridate package to parse your times, which may be a bit more intuitive and easier to remember.

You can also add variables based on the hour, and then summarize however you like with plyr.

In the example below I took the sum and mean of a_count. You may need to adapt this to your purpose.

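library(plyr)
library(lubridate)

# Parse the timestamps, then pull out the hour and minute as separate columns
df2 <- mutate(df, dt = dmy_hm(as.character(datetime)), hour = hour(dt), minute = minute(dt))
# One row per hour of the day, with the mean and sum of a_count
hourly_summary <- ddply(df2, .(hour), summarize, a_mean = mean(a_count), a_sum = sum(a_count))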
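
If you would rather not load extra packages, roughly the same hourly summary can be sketched in base R with `aggregate()`. The sample data, the `tz = "UTC"` setting and the names `dt`, `a_mean`, `a_sum` and `hourly` below are assumptions for illustration only.

```r
# Hypothetical sample data in the same dd/mm/yyyy HH:MM layout
df <- data.frame(
  datetime = c("30/03/2011 05:09", "30/03/2011 05:39", "30/03/2011 06:09",
               "30/03/2011 06:39"),
  a_count  = c(66, 50, 36, 24),
  stringsAsFactors = FALSE
)
dt <- as.POSIXct(df$datetime, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Hour of the day as a number
df$hour <- as.numeric(format(dt, "%H"))

# Mean and sum of a_count per hour, computed separately and then joined
a_mean <- aggregate(a_count ~ hour, data = df, FUN = mean)
a_sum  <- aggregate(a_count ~ hour, data = df, FUN = sum)
hourly <- merge(a_mean, a_sum, by = "hour", suffixes = c("_mean", "_sum"))
```

Note that grouping by hour alone pools the same hour across different days; if the data spans several days, you would probably want to group by date as well.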