cumsum starting over after NA

https://stackoverflow.com/questions/21628105

r
cumsum

08-10-2022

Question

I have a data frame with multiple columns. For one column I would like to calculate the cumulative sum, but I am having trouble with missing values.

#sample data
test <- c(-1.2, 4.6, -8.3, 5, 8, 1, -2, NA, NA, NA, -3, 5.1, 1.9)
test <- as.data.frame(test)

# This returns NA for every element from the first NA onward
sum_test <- lapply(test, FUN=cumsum)

sum_test
$test
 [1] -1.2  3.4 -4.9  0.1  8.1  9.1  7.1   NA   NA   NA   NA   NA   NA

# This resumes after the last NA, but keeps adding to the pre-NA total
sum_test <- lapply(test, function(x) ave(x, is.na(x), FUN=cumsum))

sum_test
$test
 [1] -1.2  3.4 -4.9  0.1  8.1  9.1  7.1   NA   NA   NA  4.1  9.2 11.1

However, what I would like is for cumsum to start over after the NAs:

-1.2  3.4 -4.9  0.1  8.1  9.1  7.1   NA   NA   NA -3   2.1   4

Can this be done?

Solution 2

This should do the trick:

test <- c(-1.2, 4.6, -8.3, 5, 8, 1, -2, NA, NA, NA, -3, 5.1, 1.9)
tmp <- rle(is.na(test))                        # runs of NA / non-NA values
ind <- rep(seq_along(tmp$values), tmp$lengths) # run index for each element
as.vector(unlist(tapply(test, ind, cumsum)))   # cumsum within each run
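To see why this works, it may help to print the intermediate values. The sketch below repeats the same steps with a different variable name (runs instead of tmp) and, as a variation that is not part of the answer above, uses ave instead of tapply/unlist so that each result stays at the position of its input element:

```r
test <- c(-1.2, 4.6, -8.3, 5, 8, 1, -2, NA, NA, NA, -3, 5.1, 1.9)

# Run-length encode the NA pattern: 7 non-NA, 3 NA, 3 non-NA
runs <- rle(is.na(test))
runs$lengths  # 7 3 3
runs$values   # FALSE TRUE FALSE

# Give every element the index of the run it belongs to
ind <- rep(seq_along(runs$values), runs$lengths)
ind           # 1 1 1 1 1 1 1 2 2 2 3 3 3

# cumsum within each run; the all-NA run simply stays NA
res_rle <- ave(test, ind, FUN = cumsum)
res_rle
```

The tapply/unlist version gives the same values here because the runs are contiguous and already in order.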

OTHER TIPS

Here g defines a grouping variable that increases by one immediately after each NA, and then we apply cumsum separately over each group:

test <- c(-1.2, 4.6, -8.3, 5, 8, 1, -2, NA, NA, NA, -3, 5.1, 1.9)
g <- cumsum(is.na(head(c(0, test), -1)))
ave(test, g, FUN = cumsum)

which gives:

[1] -1.2  3.4 -4.9  0.1  8.1  9.1  7.1   NA   NA   NA -3.0  2.1  4.0
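As a rough illustration of what g looks like for this input, the following sketch prints the lagged vector, the group ids, and the resulting groups. The name lagged is introduced here only for readability:

```r
test <- c(-1.2, 4.6, -8.3, 5, 8, 1, -2, NA, NA, NA, -3, 5.1, 1.9)

# test shifted right by one position, padded with 0 at the front
lagged <- head(c(0, test), -1)

# The group id goes up by one on the element that follows each NA
g <- cumsum(is.na(lagged))
g  # 0 0 0 0 0 0 0 0 1 2 3 3 3

# Each NA ends up last in its group, so it never contaminates later sums
groups <- split(test, g)
groups
```

Note that the first NA stays in group 0 together with the leading values. That is harmless, because cumsum only returns NA from the NA onward and the NA is the last element of that group.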

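ADDED: Note that head(c(0, test), -1) just lags test by one position, so dplyr's lag function could be used to shorten this slightly. dplyr has to be loaded for this, because base R's stats::lag does not shift the values of a plain vector: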

library(dplyr)
ave(test, cumsum(is.na(lag(test))), FUN = cumsum)
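Since the question is about a column of a data frame, the grouping idea can be wrapped in a small helper and assigned to a new column. This is only a sketch in base R: the function name cumsum_reset_na and the column name sum_test are made up for this example, and the helper assumes a non-empty numeric vector:

```r
# Hypothetical helper (name chosen for this example): cumulative sum
# that starts over after every NA. Assumes x has at least one element.
cumsum_reset_na <- function(x) {
  # Group id increases by one on the element right after each NA
  g <- cumsum(is.na(head(c(0, x), -1)))
  ave(x, g, FUN = cumsum)
}

df <- data.frame(test = c(-1.2, 4.6, -8.3, 5, 8, 1, -2, NA, NA, NA, -3, 5.1, 1.9))

# Store the result next to the original column
df$sum_test <- cumsum_reset_na(df$test)
df
```

To process several numeric columns at once, the same helper could be passed to lapply, as in the question's own attempts.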

Licensed under: CC-BY-SA with attribution
