Summarising a POSIX (date/time) referenced vector in another date/time referenced vector
21-12-2019
Question
Hi all R efficiency gurus (and people with a similar question to me),
This is an efficiency question. I have some very large data sets. One data.frame contains data from one instrument, with POSIX date/time stamps and values sampled at a very high frequency. Another data.frame contains data from another instrument, with a column of date/time values sampled at a much lower frequency.
I wish to assign summary values of the high-frequency data.frame to the time periods of the low-frequency data.frame. This function works, but it is very slow when you have millions of data points:
st <- strptime("22/09/2013 12:00:00", "%d/%m/%Y %H:%M:%S")
# Start and end times of the low-frequency intervals (each 10 minutes long)
st.vec <- st + runif(10, 0, 60*60*24)
en.vec <- st.vec + 10*60
# High-frequency timestamps and values
tm.hfreq <- strptime("22/09/2013 12:00:00", "%d/%m/%Y %H:%M:%S") + runif(400, 0, 60*60*24)
vals.hfreq <- runif(400, 0, 12000)

# For each interval (strt[i], fin[i]), summarise the values whose
# timestamps fall strictly inside it
intervalstats <- function(strt, fin, vals, tms){
  n <- length(fin)
  mns <- numeric(n)
  sds <- numeric(n)
  mds <- numeric(n)
  for (i in seq_len(n)){
    # This comparison scans every timestamp on every iteration
    x <- vals[(tms > strt[i]) & (tms < fin[i])]
    mns[i] <- mean(x)
    sds[i] <- sd(x)
    mds[i] <- median(x)
  }
  cbind(mns, sds, mds)
}
intervalstats(st.vec, en.vec, vals.hfreq, tm.hfreq)
Does anyone have a suggestion for a more efficient, faster approach?
Solution
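You could use an apply method that works across each row of the low-frequency data. I did need to convert the dates with as.numeric first, though: apply() turns a data.frame into a matrix, and without the conversion the date/time columns would become character strings. Something like: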
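To see why the conversion matters, here is a small check using only base R. The data frame lofreq_df and its two rows are made up for illustration; the point is that as.matrix() (which apply() calls on a data.frame) formats date/time columns as text, while sapply(..., as.numeric) gives plain seconds since the epoch.

```r
# Two made-up intervals with POSIXct start and end times (UTC)
st <- as.POSIXct("2013-09-22 12:00:00", tz = "UTC")
lofreq_df <- data.frame(st.vec = st + c(0, 3600), en.vec = st + c(600, 4200))

m_raw <- as.matrix(lofreq_df)          # date-time columns become formatted text
m_num <- sapply(lofreq_df, as.numeric) # seconds since 1970-01-01 UTC

typeof(m_raw)  # "character"
typeof(m_num)  # "double"

# Inside apply(), each row of the unconverted frame arrives as character
apply(lofreq_df, 1, is.character)
```

Comparing a number against such a character row would fall back to string comparison, which is why the answer converts both sides to numeric first.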
# Low-frequency interval bounds as plain numbers (seconds since the epoch)
lofreq <- data.frame(st.vec, en.vec)
lofreq <- sapply(lofreq, as.numeric)
hifreq <- data.frame(tm.hfreq = as.numeric(tm.hfreq), vals.hfreq)
t(apply(
  lofreq,
  1,
  function(x) {
    # x[1] is the interval start, x[2] the interval end
    out <- hifreq$vals.hfreq[hifreq$tm.hfreq > x[1] & hifreq$tm.hfreq < x[2]]
    c(mns = mean(out), sds = sd(out), mds = median(out))
  }
))
# mns sds mds
# [1,] 8610.664 3179.3055 9392.312
# [2,] 9398.725 844.6824 9039.992
# [3,] 6159.502 3900.0839 6159.502
# [4,] 6428.173 5802.1844 6428.173
# [5,] 5446.384 4770.9478 6783.228
# [6,] 6309.637 2017.6561 6503.751
# [7,] 6312.746 2354.9198 5553.370
# [8,] 4461.549 NA 4461.549
# [9,] 4486.433 6263.8853 4486.433
#[10,] 7279.241 1520.4536 7279.241
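The apply version still compares every high-frequency timestamp against every interval. If that is too slow for millions of points, one base-R alternative (not part of the answer above) is to sort the high-frequency data once and locate each window's boundaries with findInterval(), so each window only touches the points inside it. This is a minimal sketch, assuming the times are POSIXct or anything else that as.numeric() turns into seconds since the epoch; interval_stats_sorted is a hypothetical name, and the example regenerates data like the question's with set.seed() so the result can be checked against the loop.

```r
# Sketch (not from the original answer): window summaries via sorted
# timestamps and findInterval(). interval_stats_sorted is a hypothetical name.
interval_stats_sorted <- function(strt, fin, vals, tms) {
  # Convert to seconds since the epoch and sort the high-frequency data once
  tms <- as.numeric(tms)
  ord <- order(tms)
  tms <- tms[ord]
  vals <- vals[ord]

  # lo: first index with tms > strt (number of points <= strt, plus one)
  lo <- findInterval(as.numeric(strt), tms) + 1L
  # hi: last index with tms < fin (number of points strictly below fin)
  hi <- findInterval(as.numeric(fin), tms, left.open = TRUE)

  summarise_window <- function(a, b) {
    # seq_len() yields integer(0) for an empty window, so x is empty too
    x <- vals[a - 1L + seq_len(max(0L, b - a + 1L))]
    c(mns = mean(x), sds = sd(x), mds = median(x))
  }
  t(mapply(summarise_window, lo, hi))
}

# Example data shaped like the question's (seeded for reproducibility)
set.seed(1)
st <- as.POSIXct("2013-09-22 12:00:00", tz = "UTC")
st.vec <- st + runif(10, 0, 60*60*24)
en.vec <- st.vec + 10*60
tm.hfreq <- st + runif(400, 0, 60*60*24)
vals.hfreq <- runif(400, 0, 12000)

res <- interval_stats_sorted(st.vec, en.vec, vals.hfreq, tm.hfreq)
res
```

Sorting costs O(n log n) once; after that each window needs two binary searches plus work proportional to the number of points inside it, rather than a full pass over all n timestamps. Empty windows give NaN/NA, as in the original loop, and overlapping windows are handled because each one is looked up independently.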