Question

Sorry, people, I can't see the forest for the trees. I searched a lot but couldn't find a solution. I want to compute a statistic, e.g. the mean, for every unit (row) over subsets of variables (columns) of a matrix (or potentially a data frame) in R, much like rowMeans on a subset of columns. I would like to select the columns using a grouping vector, as in tapply; it is called a1 in the example below.

> set.seed(23958)
> (dat <- matrix(sample(0:3, 10, replace = TRUE), ncol = 5))
     [,1] [,2] [,3] [,4] [,5]
[1,]    2    3    0    2    1
[2,]    2    1    1    2    1

> set.seed(6112)
> (a1 <- sample(1:2, 5, replace = TRUE))
[1] 1 1 2 2 1

The result for this example should look like the output below, but of course I would like to get it in a more general way, without writing out each group by hand. I was thinking I should use a function from the apply family, but I could not work out which one.

> cbind(rowMeans(dat[, a1 == 1]), rowMeans(dat[, a1 == 2]))
         [,1] [,2]
[1,] 2.000000  1.0
[2,] 1.333333  1.5

Solution

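You can still use tapply here, by applying it to the column indices rather than to the data. Binding with cbind gives one row per unit, as in the desired output (rbind would give the transpose), and drop = FALSE keeps a group with a single column as a matrix so that rowMeans still works:

do.call(cbind,
        tapply(seq_len(ncol(dat)), a1,
               function(i) rowMeans(dat[, i, drop = FALSE])))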
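
One edge case may be worth checking: if a group contains only a single column, dat[, i] is dropped to a plain vector and rowMeans fails. The following is a minimal sketch of the tapply idea wrapped in a helper that guards against this. The name group_row_means is hypothetical (not from the original answer), and the matrix is typed in from the printed output in the question rather than regenerated with set.seed, since sample() may give different values across R versions.

```r
# Data copied from the printed example in the question (filled column by column)
dat <- matrix(c(2, 2, 3, 1, 0, 1, 2, 2, 1, 1), ncol = 5)
a1  <- c(1, 1, 2, 2, 1)

# Hypothetical helper: row means of m within each column group given by g
group_row_means <- function(m, g) {
  means <- tapply(seq_len(ncol(m)), g,
                  function(i) rowMeans(m[, i, drop = FALSE]),  # drop = FALSE keeps a matrix
                  simplify = FALSE)                            # always return a list
  do.call(cbind, means)  # one column per group, one row per unit
}

res <- group_row_means(dat, a1)

# A grouping in which groups 2 and 3 each contain a single column
res_single <- group_row_means(dat, c(1, 1, 2, 3, 1))
```

Here res should have one row per unit and one column per value of a1, with the group labels as column names.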

OTHER TIPS

If you transpose your data, you can use by:

t(do.call(rbind,by(t(dat),a1,colMeans)))
          1   2
V1 2.000000 1.0
V2 1.333333 1.5

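You could also use the aggregate function. Note that the transposed result also contains the grouping column as its first row (Group.1), so drop that row if you only want the means: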

t(aggregate(t(dat), list(a1), mean))
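
As a quick sanity check, the by and aggregate variants can be compared against the hand-written result from the question. This is only a sketch: the matrix is typed in from the printed output in the question, and the variable names (expected, via_by, via_aggregate) are made up for illustration.

```r
# Data copied from the printed example in the question (filled column by column)
dat <- matrix(c(2, 2, 3, 1, 0, 1, 2, 2, 1, 1), ncol = 5)
a1  <- c(1, 1, 2, 2, 1)

# Hand-written result from the question
expected <- cbind(rowMeans(dat[, a1 == 1]), rowMeans(dat[, a1 == 2]))

# by() on the transposed data, as in the answer above
via_by <- t(do.call(rbind, by(t(dat), a1, colMeans)))

# aggregate() returns a data frame whose first column (Group.1) holds the groups;
# drop it before transposing so only the means remain
agg <- aggregate(t(dat), list(a1), mean)
via_aggregate <- t(as.matrix(agg[, -1]))

# Compare values only, ignoring row and column names
same_by        <- isTRUE(all.equal(unname(via_by), expected))
same_aggregate <- isTRUE(all.equal(unname(via_aggregate), expected))
```

Both flags should be TRUE if the two approaches agree with the expected result.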
Licensed under: CC-BY-SA with attribution