Question

Hi all R efficiency gurus (and people with a similar question to me),

This is an efficiency question. I have some very large data sets. One data.frame contains data from one instrument, with POSIX date-time stamps and values recorded at a very high frequency. Another data.frame contains data from a second instrument, with a column of date-time values at a much lower sampling frequency.

I want to compute summary values (mean, standard deviation, median) of the high-frequency data over each time period of the low-frequency data.frame. The function below works, but it is very slow when you have millions of data points:

st <- strptime("22/09/2013 12:00:00", "%d/%m/%Y %H:%M:%S")
st.vec <- st + runif(10,0, 60*60*24)
en.vec <- st.vec + 10*60
tm.hfreq <- strptime("22/09/2013 12:00:00", "%d/%m/%Y %H:%M:%S") + runif(400, 0, 60*60*24)
vals.hfreq <- runif(400, 0, 12000)

intervalstats <- function(strt, fin, vals, tms){
  mns <- NULL
  mds <- NULL
  sds <- NULL
  for (i in seq(1,length(fin))){
    mns <- append(mns,mean(vals[(tms > strt[i])&(tms < fin[i])]))
    sds <- append(sds,sd(vals[(tms > strt[i])&(tms < fin[i])]))
    mds <- append(mds,median(vals[(tms > strt[i])&(tms < fin[i])]))
  }
  res <- cbind(mns, sds, mds)
  res
}

intervalstats(st.vec, en.vec, vals.hfreq, tm.hfreq)

Does anyone have a suggestion for a more efficient, faster approach?

Solution

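You could use apply() over the rows of the low-frequency start/end times, computing the summaries for one interval per row. I did need to convert the date-times with as.numeric first, since apply() works on a matrix and the comparisons would not behave properly otherwise. Something like: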

lofreq <- data.frame(st.vec,en.vec)
lofreq <- sapply(lofreq, as.numeric)
hifreq <- data.frame(tm.hfreq=as.numeric(tm.hfreq),vals.hfreq)

t(apply(
  lofreq,
  1,
  function(x) {
    out <- hifreq$vals.hfreq[hifreq$tm.hfreq > x[1] & hifreq$tm.hfreq < x[2]]
    c(mns=mean(out), sds=sd(out), mds=median(out))
  }
))

#           mns       sds      mds
# [1,] 8610.664 3179.3055 9392.312
# [2,] 9398.725  844.6824 9039.992
# [3,] 6159.502 3900.0839 6159.502
# [4,] 6428.173 5802.1844 6428.173
# [5,] 5446.384 4770.9478 6783.228
# [6,] 6309.637 2017.6561 6503.751
# [7,] 6312.746 2354.9198 5553.370
# [8,] 4461.549        NA 4461.549
# [9,] 4486.433 6263.8853 4486.433
#[10,] 7279.241 1520.4536 7279.241
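
If that is still too slow with millions of points, note that the apply() version compares every high-frequency time stamp against every interval. One possible alternative, which is my own sketch and not part of the answer above, is to sort the high-frequency times once and find each window's position with findInterval(), which uses a binary search. The function name interval_stats_sorted is made up for illustration, and the left.open argument needs a reasonably recent R (3.3.0 or later, as far as I know).

```r
# Hypothetical helper: same strict (start, end) windows as the question,
# but each window is located by binary search on pre-sorted time stamps.
interval_stats_sorted <- function(strt, fin, vals, tms) {
  strt <- as.numeric(strt)
  fin  <- as.numeric(fin)
  tms  <- as.numeric(tms)

  # Sort the high-frequency data once by time.
  ord  <- order(tms)
  tms  <- tms[ord]
  vals <- vals[ord]

  # findInterval(x, vec) counts the elements of vec that are <= x,
  # so adding 1 gives the first index with tms > strt.
  lo <- findInterval(strt, tms) + 1L
  # With left.open = TRUE it counts the elements that are < x,
  # which is the last index with tms < fin.
  hi <- findInterval(fin, tms, left.open = TRUE)

  res <- vapply(seq_along(lo), function(i) {
    # An empty window gives NaN/NA, as in the original function.
    w <- if (hi[i] >= lo[i]) vals[lo[i]:hi[i]] else numeric(0)
    c(mns = mean(w), sds = sd(w), mds = median(w))
  }, numeric(3))

  t(res)  # one row per interval, columns mns, sds, mds
}

# Usage with data shaped like the question's example (values are random).
st <- as.POSIXct("2013-09-22 12:00:00", tz = "UTC")
st.vec <- st + runif(10, 0, 60*60*24)
en.vec <- st.vec + 10*60
tm.hfreq <- st + runif(400, 0, 60*60*24)
vals.hfreq <- runif(400, 0, 12000)

interval_stats_sorted(st.vec, en.vec, vals.hfreq, tm.hfreq)
```

Each interval then only touches the values that fall inside it, rather than scanning the whole high-frequency vector. Whether this is actually faster on your data is worth checking with system.time() before relying on it.
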
Licensed under: CC-BY-SA with attribution