Computing summary statistics over samples from a list of matrices

https://stackoverflow.com/questions/21211248

29-09-2022

Question

I have a list of matrices with identical dimensions, for example:

mat.list=rep(list(matrix(rnorm(n=12,mean=1,sd=1), nrow = 3, ncol=4)),3)

I'd like to repeatedly sample one random column from each matrix in the list. For example, in a given sample the column indices to be drawn are:

set.seed(10) #for reproducibility
idx.vec = sample(1:ncol(mat.list[[1]]),length(mat.list))

The following call then returns a matrix of the sampled columns, one column per matrix in the list:

sample.mat = mapply('[', mat.list, TRUE, idx.vec)
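To make the indexing explicit: the mapply call takes column idx.vec[i] from mat.list[[i]] and binds the results side by side. A minimal sketch on made-up data (toy.list and toy.idx are illustrative names, not part of the original post):

```r
# Toy data: three 3x4 matrices with easily recognisable entries
toy.list <- list(matrix(1:12, nrow = 3),
                 matrix(101:112, nrow = 3),
                 matrix(201:212, nrow = 3))
toy.idx <- c(2, 4, 1)  # column to take from each matrix

# Same idiom as above: '['(m, TRUE, j) is just m[, j]
via.mapply <- mapply('[', toy.list, TRUE, toy.idx)

# Explicit equivalent: pick one column per matrix and bind them together
via.loop <- sapply(seq_along(toy.list),
                   function(i) toy.list[[i]][, toy.idx[i]])

print(via.mapply)
```

Both via.mapply and via.loop should be 3x3 matrices whose columns are column 2 of the first matrix, column 4 of the second and column 1 of the third.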

For each such sample matrix I'd like to compute the mean and variance of each row. The result would therefore be two matrices, one holding the means and one holding the variances, each with as many rows as the matrices in the list and as many columns as there are samples.

What would be the most efficient (time and space) way to do this?

Solution

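I would use replicate, with rowMeans for the means and rowSds from the matrixStats package for the standard deviations (rowVars from the same package gives the variances instead):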

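library(matrixStats)  # provides rowSds

ll <- length(mat.list)       # number of matrices in the list
nn <- ncol(mat.list[[1]])    # number of columns per matrix

# Draw 3 samples; each yields the row means and row standard deviations
replicate(3, {
   idx.vec <- sample(seq_len(nn), ll)                   # one column index per matrix
   sample.mat <- mapply('[', mat.list, TRUE, idx.vec)   # matrix of the sampled columns
   list(mm = rowMeans(sample.mat), sd = rowSds(sample.mat))
}, simplify = FALSE)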
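If you want the two matrices described in the question (rows by samples) rather than a list, one option is to let replicate simplify its result. The following is only a sketch under a few assumptions: n.samples, mean.mat and var.mat are names introduced here for illustration, the example data are regenerated so the snippet runs on its own, and the variance is computed with base R instead of matrixStats to avoid the extra dependency.

```r
set.seed(10)  # for reproducibility

# Example data: three independent 3x4 matrices
mat.list <- replicate(3, matrix(rnorm(12, mean = 1, sd = 1), nrow = 3, ncol = 4),
                      simplify = FALSE)

n.samples <- 5                 # hypothetical number of samples
ll <- length(mat.list)         # number of matrices
nn <- ncol(mat.list[[1]])      # columns per matrix
nr <- nrow(mat.list[[1]])      # rows per matrix

# Each iteration returns c(row means, row variances); replicate() binds
# these vectors as columns, giving a (2 * nr) x n.samples matrix
res <- replicate(n.samples, {
  # As in the code above: drawn without replacement, so this needs nn >= ll
  idx.vec <- sample(seq_len(nn), ll)
  sample.mat <- mapply('[', mat.list, TRUE, idx.vec)
  mm <- rowMeans(sample.mat)
  # Unbiased row variance; mm is recycled down the columns, so row i is centred by mm[i]
  vv <- rowSums((sample.mat - mm)^2) / (ncol(sample.mat) - 1)
  c(mm, vv)
})

mean.mat <- res[seq_len(nr), , drop = FALSE]       # nr x n.samples matrix of means
var.mat  <- res[nr + seq_len(nr), , drop = FALSE]  # nr x n.samples matrix of variances
```

Returning a plain numeric vector from each iteration keeps the result in a single matrix, which is usually lighter on memory than a list of lists when the number of samples is large.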

Licensed under: CC-BY-SA with attribution
