Question

I have a matrix of predictions. Each row is a prediction for an individual and each column is the prediction from a specific model. I'd like to transform this so the first column is the prediction from the 1st model, the 2nd column is the average of the predictions of the 1st and 2nd models, etc.

So the transformed matrix would hold, row by row, the cumulative (running) average across the columns of the original matrix.

I have a sense cumsum can be used with an apply function to achieve this, but am not sure how to arrive at an elegant result (for use with large matrices).

Thanks!

Solution

Try this:

# Initialize a testing matrix
(m <- matrix(1:12, 3, 4))

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

# Calculate cumulative average by column for each row
t(apply(m, 1, cumsum) / seq(ncol(m)))

     [,1] [,2] [,3] [,4]
[1,]    1  2.5    4  5.5
[2,]    2  3.5    5  6.5
[3,]    3  4.5    6  7.5

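This takes the cumulative sum of each row, then divides each entry by its column index. Note that apply(m, 1, cumsum) returns its result transposed (one column per row of m), so the vector seq(ncol(m)) is recycled down each column and lines up with the original column positions; t() then restores the original orientation.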
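
To see why the division lines up, the one-liner can be unpacked step by step. This is only a sketch on the same test matrix; the intermediate names cs, avg and result are introduced here for illustration and are not part of the original answer.

```r
m <- matrix(1:12, 3, 4)

# apply() over rows returns the result transposed:
# cs is 4 x 3, and column i holds the cumulative sums of row i of m
cs <- apply(m, 1, cumsum)

# The length-4 vector 1:4 is recycled down each column of cs,
# so the k-th cumulative sum is divided by k
avg <- cs / seq_len(ncol(m))

# Transpose back to the original 3 x 4 orientation
result <- t(avg)

# Sanity check: column k should equal the mean of the first k columns of m
check <- sapply(seq_len(ncol(m)), function(k) rowMeans(m[, 1:k, drop = FALSE]))
all.equal(result, check)
```

The sanity check recomputes every prefix mean directly, which is slower but makes the intent explicit.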

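Edit: If you're doing something similar with data frames, this approach using the data.table and reshape2 packages could be useful. It melts the data to long format and divides by as.numeric(variable), which is the column's position among the factor levels V1, V2, ...: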

library(data.table)
dt <- data.table(m)
# Add row number to melt by
dt[, row := seq(nrow(dt))]

library(reshape2)
dt.molten <- data.table(melt(dt, "row"))
# Row-level format
dt.molten[, cumsum(value) / as.numeric(variable), "row"]

    row  V1
 1:   1 1.0
 2:   1 2.5
 3:   1 4.0
 4:   1 5.5
 5:   2 2.0
 6:   2 3.5
 7:   2 5.0
 8:   2 6.5
 9:   3 3.0
10:   3 4.5
11:   3 6.0
12:   3 7.5
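
If a wide result is wanted for a data frame, one possible base R alternative (not the data.table approach above) is to convert to a matrix, reuse the cumsum idea, and convert back. This is a sketch that assumes every column is numeric; the data frame df and its column names are made up for illustration.

```r
# Hypothetical data frame of predictions: one column per model
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# Same cumulative-average trick as for a matrix
cum_avg <- t(apply(as.matrix(df), 1, cumsum) / seq_len(ncol(df)))

# Back to a data frame, keeping the original column names
cum_avg_df <- as.data.frame(cum_avg)
names(cum_avg_df) <- names(df)
cum_avg_df
```

Here column b holds the average of a and b, and column c the average of all three.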

OTHER TIPS

Using cumsum with apply, as the question suggests:

mat <- matrix(1:24,ncol=6)
mat
#     [,1] [,2] [,3] [,4] [,5] [,6]
#[1,]    1    5    9   13   17   21
#[2,]    2    6   10   14   18   22
#[3,]    3    7   11   15   19   23
#[4,]    4    8   12   16   20   24

t(apply(mat,1,cumsum)/(seq_len(ncol(mat))))
#     [,1] [,2] [,3] [,4] [,5] [,6]
#[1,]    1    3    5    7    9   11
#[2,]    2    4    6    8   10   12
#[3,]    3    5    7    9   11   13
#[4,]    4    6    8   10   12   14
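
Since the question mentions large matrices, another option is to loop over columns instead of calling apply over rows. Each step is then vectorized across all rows. This is a sketch, not a benchmarked recommendation: cum_mean_rows is a hypothetical helper name, and whether it is actually faster depends on the matrix shape, so it is worth timing on your own data.

```r
# Hypothetical helper: running mean across columns, computed column by column
cum_mean_rows <- function(m) {
  out <- matrix(NA_real_, nrow = nrow(m), ncol = ncol(m), dimnames = dimnames(m))
  running <- numeric(nrow(m))      # running row sums, one entry per row
  for (j in seq_len(ncol(m))) {
    running <- running + m[, j]    # add column j to every row's running sum
    out[, j] <- running / j        # mean of the first j columns
  }
  out
}

mat <- matrix(1:24, ncol = 6)
res <- cum_mean_rows(mat)

# Should agree with the apply/cumsum version
all.equal(res, t(apply(mat, 1, cumsum) / seq_len(ncol(mat))))
```

The loop runs once per column (one per model), so with many individuals and comparatively few models the number of R-level iterations stays small.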
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow