Select records according to the difference between records in R

https://stackoverflow.com/questions/21515491

06-10-2022

Question

I hope someone can suggest an approach to this problem, because I really don't know how to proceed. My data look like this:

data<-data.frame(site=c(rep("A",3),rep("B",3),rep("C",3)),time=c(100,180,245,5,55,130,70,120,160))

where time is in minutes. For each site, I want to select only the records for which the difference is more than 60, so the output should look like this:

out<-data[c(1:4,6,7,9),]

What I have tried so far: to get the differences, I use this:

difference<-stack(tapply(data$time,data$site,diff))

but then I have no idea how to pick out the records that satisfy my condition. If a similar question already exists, I apologize; I searched for a while without finding one.

To make things clear, since my definition of "difference" was ambiguous: I need to select all the records (for each site) that are separated by at least 60 minutes, not only those that are strictly consecutive in time. Specifically:

> out
  site time
1    A  100  # included because the difference between records 2 and 1 is > 60
2    A  180  # included because the difference between records 3 and 2 is > 60
3    A  245  # included because it is more than 60 minutes after record 2
4    B    5  # included because the difference between records 6 and 4 is > 60
6    B  130  # included because it is more than 60 minutes after record 4
7    C   70  # included because the difference between records 9 and 7 is > 60
9    C  160  # included because it is more than 60 minutes after record 7

To solve the problem, it may be useful to work from the computed differences, something like this:

> difference
  values ind
1     80   A  # include records 1 and 2
2     65   A  # include records 2 and 3
3     50   B  # include only record 4
4     75   B  # include record 6 because it is (50 + 75) > 60 min after record 4
5     50   C  # include only record 7
6     40   C  # include record 9 because it is (50 + 40) > 60 min after record 7

Thanks for the help.

Solution

data[ave(data$time, data$site, FUN = function(x){c(61, diff(x)) > 60}) == 1, ]

#   site time
# 1    A  100
# 2    A  180
# 3    A  245
# 4    B    5
# 6    B  130
# 7    C   70
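
To see why this version keeps records 1, 2, 3, 4, 6 and 7 only, it can help to look at the intermediate flags. The following is a minimal sketch that re-creates the question's data; the name `flags` is introduced here purely for illustration. Because `ave()` writes the logical results back into a numeric vector, they come out as 0/1, which is why the one-liner above compares against 1.

```r
# Re-create the data from the question
data <- data.frame(site = c(rep("A", 3), rep("B", 3), rep("C", 3)),
                   time = c(100, 180, 245, 5, 55, 130, 70, 120, 160))

# Within each site, flag the first record (the leading 61 always passes)
# and every record more than 60 minutes after the one just before it.
flags <- ave(data$time, data$site,
             FUN = function(x) c(61, diff(x)) > 60)

print(flags)
# [1] 1 1 1 1 0 1 1 0 0
```

Only consecutive gaps are compared here, so record 9 (40 minutes after record 8) is flagged 0 even though it is 90 minutes after record 7.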

Edit following updated question:

keep <- as.logical(ave(data$time, data$site, FUN = function(x){
  c(TRUE, cumsum(diff(x)) > 60)
}))

data[keep, ]

#   site time
# 1    A  100
# 2    A  180
# 3    A  245
# 4    B    5
# 6    B  130
# 7    C   70
# 9    C  160
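
Note that `cumsum(diff(x))` measures each record's distance from the first record of its site, not from the most recently kept one. That gives the requested output for this data, but the two readings can diverge on other inputs. If the intent is "more than 60 minutes after the last kept record", one possible sketch is a small greedy loop. The helper name `keep_spaced` and its `gap` argument are hypothetical, and the times are assumed to be sorted within each site.

```r
# Hypothetical helper: keep a record only if it is more than `gap`
# minutes after the most recently kept record (x assumed sorted).
keep_spaced <- function(x, gap = 60) {
  keep <- logical(length(x))
  last <- -Inf                 # so the first record is always kept
  for (i in seq_along(x)) {
    if (x[i] - last > gap) {
      keep[i] <- TRUE
      last <- x[i]             # restart the gap from this record
    }
  }
  keep
}

data <- data.frame(site = c(rep("A", 3), rep("B", 3), rep("C", 3)),
                   time = c(100, 180, 245, 5, 55, 130, 70, 120, 160))

# Apply the helper per site; ave() returns 0/1, so convert back to logical
keep <- as.logical(ave(data$time, data$site, FUN = keep_spaced))
out <- data[keep, ]   # rows 1, 2, 3, 4, 6, 7, 9 for the question's data

# A case where the two approaches differ: 130 is only 30 minutes after 100
greedy   <- keep_spaced(c(0, 50, 100, 130))
from_1st <- c(TRUE, cumsum(diff(c(0, 50, 100, 130))) > 60)
```

In the last example, the greedy version keeps 0 and 100, while the `cumsum` version also keeps 130.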

OTHER TIPS

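# Calculate the differences within each site (NA for the first record)
data$diff <- unlist(by(data$time, data$site, function(x) c(NA, diff(x))))
# Subset the data: keep first records and gaps of more than 60 minutes
data[is.na(data$diff) | data$diff > 60, ]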

Using plyr:

library(plyr)
ddply(data, .(site), function(x) x[c(TRUE, diff(x$time) > 60), ])

Licensed under: CC-BY-SA with attribution
