Question

I hope you are doing very well. I would like to know how to calculate the cumulative sum of a data set with certain conditions. A simplified version of my data set would look like:

t   id  
A   22
A   22
R   22
A   41
A   98
A   98
A   98
R   98
A   46
A   46
R   46
A   46
A   46
A   46
R   46
A   46
A   12
R   54
A   66
R   13 
A   13
A   13
A   13
A   13
R   13
A   13

I would like to create a new data set where, for each value of "id", I have the cumulative number of times that id has appeared, with the count restarting whenever t = R, e.g.:

t   id  count
A   22  1
A   22  2
R   22  0
A   41  1
A   98  1
A   98  2
A   98  3
R   98  0
A   46  1
A   46  2
R   46  0
A   46  1
A   46  2
A   46  3
R   46  0
A   46  1
A   12  1
R   54  0
A   66  1
R   13  0
A   13  1
A   13  2
A   13  3
A   13  4
R   13  0
A   13  1

Any ideas as to how to do this? Thanks in advance.


Solution

Using rle:

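# Paste each row into one key, then number the rows within each run of identical keys
out <- transform(df, count = sequence(rle(do.call(paste, df))$lengths))
# Rows with t == "R" mark a reset, so their count is 0
out$count[out$t == "R"] <- 0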

If your data.frame has more than these two columns and you want to check only these two, replace the df inside do.call(paste, df) with df[, 1:2] or df[, c("t", "id")].
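
To try this on the sample data from the question, here is a minimal reproducible sketch. How df is built (character t, numeric id, stringsAsFactors = FALSE) is an assumption about how the data is stored; the computation itself is the rle approach above, restricted to the two key columns.

```r
# Sample data from the question (assumed types: character t, numeric id)
df <- data.frame(
  t  = c("A","A","R","A","A","A","A","R","A","A","R","A","A","A","R","A",
         "A","R","A","R","A","A","A","A","R","A"),
  id = c(22,22,22,41,98,98,98,98,46,46,46,46,46,46,46,46,
         12,54,66,13,13,13,13,13,13,13),
  stringsAsFactors = FALSE
)

# One key per row, built only from the two columns that define a run
key <- do.call(paste, df[, c("t", "id")])

# Number the rows within each run of identical keys, then zero out the resets
out <- transform(df, count = sequence(rle(key)$lengths))
out$count[out$t == "R"] <- 0

out$count
# 1 2 0 1 1 2 3 0 1 2 0 1 2 3 0 1 1 0 1 0 1 2 3 4 0 1
```

The printed counts match the desired output listed in the question.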

If you find do.call(paste, df) dangerous (as @flodel comments), then you can replace that with:

as.character(interaction(df))

I personally don't find anything dangerous or clumsy in this setup, as long as you use the right separator (meaning you know your data well). If you do, the second solution may help.
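
To make the remark about the separator concrete, here is a small sketch of one way pasted keys can go wrong. The data frame df2 and its values are hypothetical and are not from the question, whose letters and numbers do not run into this problem.

```r
# Hypothetical data in which a value contains the default separator (a space)
df2 <- data.frame(a = c("x y", "x"), b = c("z", "y z"),
                  stringsAsFactors = FALSE)

# With the default sep = " ", two different rows collapse to the same key
k1 <- do.call(paste, df2)
k1[1] == k1[2]   # TRUE: both keys are "x y z"

# Passing a separator that cannot occur in the data keeps the keys distinct
k2 <- do.call(paste, c(df2, sep = "\r"))
k2[1] == k2[2]   # FALSE
```

Which separator is safe depends entirely on the data, which is why knowing your data matters here.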


Update:

For those who don't like using do.call(paste, df) or as.character(interaction(df)) (please see the comment exchanges between me, @flodel and @HongOoi), here's another base solution:

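# Row positions of the resets
idx <- which(df$t == "R")
if (length(idx) > 0) {
    # Block sizes: each block ends at an "R" row; the last block holds the rows after the final "R"
    ww <- c(min(idx), diff(idx), nrow(df) - max(idx))
    # Within each block, number the consecutive repeats of id
    df <- transform(df, count = ave(id, rep(seq_along(ww), ww),
                   FUN = function(y) sequence(rle(y)$lengths)))
    df$count[idx] <- 0
} else {
    # No "R" rows: the count must still restart whenever id changes
    df$count <- sequence(rle(df$id)$lengths)
}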
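
As a further variation, the same result can be sketched with cumsum instead of rle. This is not one of the approaches from the answer above, just an alternative under the same assumptions (character t, atomic id); the names new_run and run are made up for illustration, and the small data frame is a shortened sample in the shape of the question's data.

```r
# Small sample in the same shape as the question's data
df <- data.frame(t  = c("A", "A", "R", "A", "A", "R", "A"),
                 id = c(22, 22, 22, 41, 46, 46, 46),
                 stringsAsFactors = FALSE)

n <- nrow(df)

# A new run starts when id differs from the previous row or the row is an "R"
new_run <- c(TRUE, df$id[-1] != df$id[-n]) | df$t == "R"
run <- cumsum(new_run)

# Within each run, count the non-"R" rows; an "R" row itself contributes 0
df$count <- ave(as.integer(df$t != "R"), run, FUN = cumsum)

df$count
# 1 2 0 1 1 0 1
```

Because the "R" rows are handled by the grouping itself, this version needs no separate branch for data without any "R" rows.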
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow