Question

I have a matrix like

A= [ 1 2 4
     2 3 1
     3 1 2 ]

and I would like to calculate its cumulative sum by row and by column, that is, I want the result to be

B = [ 1  3  7 
      3  8  13 
      6  12 19 ]

Any ideas of how to make this in R in a fast way? (Possibly using the function cumsum) (I have huge matrices)

Thanks!

Was it helpful?

Solution

A one-liner:

t(apply(apply(A, 2, cumsum)), 1, cumsum))

The underlying observation is that you can first compute the cumulative sums over the columns and then the cumulative sum of this matrix over the rows.

Note: When doing the rows, you have to transpose the resulting matrix.

Your example:

> apply(A, 2, cumsum)
     [,1] [,2] [,3]
[1,]    1    2    4
[2,]    3    5    5
[3,]    6    6    7

> t(apply(apply(A, 2, cumsum), 1, cumsum))
     [,1] [,2] [,3]
[1,]    1    3    7
[2,]    3    8   13
[3,]    6   12   19

About performance: I have now idea how good this approach scales to big matrices. Complexity-wise, this should be close to optimal. Usually, apply is not that bad in performance as well.


Edit

Now I was getting curious - what approach is the better one? A short benchmark:

> A <- matrix(runif(1000*1000, 1, 500), 1000)
> 
> system.time(
+   B <- t(apply(apply(A, 2, cumsum), 1, cumsum))
+ )
       User      System     elapsed 
      0.082       0.011       0.093 
> 
> system.time(
+   C <- lower.tri(diag(nrow(A)), diag = TRUE) %*% A %*% upper.tri(diag(ncol(A)), diag = TRUE)
+ )
       User      System     elapsed 
      1.519       0.016       1.530 

Thus: Apply outperforms matrix multiplication by a factor of 15. (Just for comparision: MATLAB needed 0.10719 seconds.) The results do not really surprise, as the apply-version can be done in O(n^2), while the matrix multiplication will need approx. O(n^2.7) computations. Thus, all optimizations that matrix multiplication offers should be lost if n is big enough.

OTHER TIPS

Here is a more efficient implementation using the matrixStats package and a larger example matrix:

library(matrixStats)
A <- matrix(runif(10000*10000, 1, 500), 10000)

# Thilo's answer
system.time(B <- t(apply(apply(A, 2, cumsum), 1, cumsum)))
user  system elapsed 
3.684   0.504   4.201

# using matrixStats
system.time(C <- colCumsums(rowCumsums(A)))
user  system elapsed 
0.164   0.068   0.233 

all.equal(B, C)
[1] TRUE

My solution: Function cumsum_row() (see below) takes matrix M and returns matrix of cumulative sums of M's rows. Function cumsum_col() does same thing for columns.

cumsum_row <- function(M) {
  M2 <- c()
  for (i in 1:nrow(M))
    M2 <- rbind(M2, cumsum(M[i,]))
  return (M2)
}

cumsum_col <- function(M) {
  return (t(cumsum_row(t(M))))
}

Example:

  > M <- matrix(rep(1, 9), nrow=3)
  > M
         [,1] [,2] [,3]
    [1,]    1    1    1
    [2,]    1    1    1
    [3,]    1    1    1

  > cumsum_row(M)
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    1    2    3
    [3,]    1    2    3
  
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top