Here is a one-line solution using the ddply function from the plyr package, with the lubridate package to parse the dates.
Code:
library(plyr)
library(lubridate)
# For each id, subtract the first 'A' timestamp from the first 'R' timestamp
new_df <- ddply(.data=df, .variables=c('id'), summarize,
                days=round(ymd_hms(t[match('R',e)])-ymd_hms(t[match('A',e)]),1))
new_df
Output:
   id      days
1 086 10.9 days
2 115   NA days
3 522   NA days
4 524  2.3 days
5 638  3.2 days
6 836  1.8 days
Note that there are 2 warnings because ids 115 and 522 do not have a matching A or R value in the e variable, which is also why their days come out as NA.
If you want the date difference as a plain decimal value rather than a time difference, you can wrap it in the as.double function.
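A minimal sketch of that variant, assuming a small made-up data frame with the same id, t, and e columns (the values below are hypothetical, not the data from the question). The only change from the call above is wrapping the difference in as.double with units="days", so the result is a plain numeric column instead of a difftime:

```r
library(plyr)
library(lubridate)

# Hypothetical toy data; only the column names follow the answer above
df <- data.frame(
  id = c("001", "001", "002", "002"),
  t  = c("2012-01-01 08:00:00", "2012-01-03 20:00:00",
         "2012-02-01 00:00:00", "2012-02-02 12:00:00"),
  e  = c("A", "R", "A", "R"),
  stringsAsFactors = FALSE
)

# as.double(..., units = "days") drops the difftime class and fixes the unit
new_df <- ddply(df, "id", summarize,
  days = round(as.double(ymd_hms(t[match("R", e)]) - ymd_hms(t[match("A", e)]),
                         units = "days"), 1))
new_df
```

Fixing the unit explicitly matters because a difftime picks its unit automatically, so a gap shorter than a day would otherwise be reported in hours.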
Basically, I am using the match function to find the first occurrence of A and R, parsing the date variable with the ymd_hms function from the lubridate package, and then finding the difference of the two dates. I round it to 1 decimal place, and then convert it into a double for display.
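To make the role of match concrete, here is a small base R sketch on made-up vectors (the names events and times are hypothetical stand-ins for the e and t columns of a single id). It shows that match returns only the first position, and returns NA when the value is absent, which is what turns into an NA result for ids that lack an event:

```r
# Hypothetical events and timestamps for one id
events <- c("A", "R", "A", "R")
times  <- c("2012-01-01 08:00:00", "2012-01-03 20:00:00",
            "2012-01-05 00:00:00", "2012-01-06 00:00:00")

first_r <- match("R", events)   # 2: only the first 'R' is found
first_a <- match("A", events)   # 1: only the first 'A' is found
times[first_r]                  # timestamp of the first 'R'

# An id with no 'R' event: match gives NA, and indexing with NA gives NA
events_no_r <- c("A", "A")
missing_idx <- match("R", events_no_r)
times[missing_idx]
```

Because only the first A and first R are used, later A/R pairs for the same id are ignored by this approach.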
EDIT
After reading the OP's comments, here is a rather ugly way to get the desired result. Forgive me, it is early in the morning; it may not be elegant or efficient, but it seems to produce the desired result.
Code:
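# Label each row with a running group number that increments after every `group` event
grouper <- function(var, group) {
  num <- 1
  res <- numeric(length(var))
  for (i in seq_along(var)) {
    res[i] <- num
    # A matching event closes the current group; the next row starts a new one
    if (var[i] == group) {
      num <- num + 1
    }
  }
  return(res)
}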
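df2 <- df
# Assumes df is already sorted by id, because ddply returns its rows ordered by id
df2$group <- ddply(.data=df, .variables='id', summarize, group=grouper(e,'R'))$group
# Same difference as before, now computed per id and per 'R' group
df3 <- ddply(.data=df2, .variables=c('id','group'), summarize,
             days=round(ymd_hms(t[match('R',e)])-ymd_hms(t[match('A',e)]),1))
# Drop incomplete groups and the helper group column
df3[complete.cases(df3),-2]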
Output:
    id      days
1  086 10.9 days
6  524  2.3 days
7  524  2.5 days
9  638  3.2 days
10 638  9.6 days
12 836  1.8 days
13 836  4.8 days
14 836 11.3 days
16 836  1.7 days
The idea is to add another column that groups the rows by the occurrence of an 'R' event, so that I can subset the data set by both ID and 'R' event. It is kind of hacky, and I am sure there are more elegant ways to do it.
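As one possible tidier alternative (a sketch, not what the code above does internally), the same grouping column can be built without an explicit loop by taking a cumulative sum of the 'R' flags, shifted by one row so that each 'R' stays in the group it closes. The vectors id and e below are made-up examples, and ave is used to restart the count for each id:

```r
# Hypothetical ids and events, already ordered by id and time
id <- c("a", "a", "a", "a", "b", "b")
e  <- c("A", "R", "A", "R", "A", "R")

# Shifted cumulative count of 'R' events within each id, starting at 1
group <- ave(as.numeric(e == "R"), id,
             FUN = function(x) cumsum(c(0, head(x, -1))) + 1)
group
```

The shift (prepending 0 and dropping the last flag) is what makes the group number increase on the row after an 'R' rather than on the 'R' row itself.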
Now, I'm off to get some coffee.