This is a simple question, and I am sure it is easily solvable with tapply, apply, by, or something similar. However, I am still relatively new to this, and I would like to ask for advice.

The problem:

I have a data frame with, say, 5 columns. Columns 4 and 5 are factors. For each factor level in column 5, I want to apply a function to columns 1:3 within each group defined by column 4. This is, in principle, easily doable. However, I want the output as a nice table, and I want to learn how to do this in an elegant way, which is why I am asking here.

Example:

 df <- data.frame(x1=1:6, x2=12:17, x3=3:8, y=1:2, f=1:3)

Now, the command

 by(df[,1:3], df$y, sum)

would give me the sum for each factor level in y, which is almost what I want. Two further steps are needed. The first is to do this for each factor level in f as well, which is almost trivial: I could wrap lapply around the command above. The second is the real problem: I want to collect the results in a table, and maybe even use it to generate a heatmap.

Hence: is there an easy and more elegant way to do this and to generate a matrix with the corresponding output? This seems like an everyday task for data scientists, which is why I suspect that there is an existing built-in solution.

Thanks for any help or any hint, no matter how small!

Solution

You can use the reshape2 and plyr packages to accomplish this.

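library(plyr)
# Sum only x1:x3 within each (y, f) group; a bare `sum` would also add in the y and f values
df2 <- ddply(df, .(y, f), function(d) sum(d[, 1:3]))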

Then turn it into an f-by-y matrix:

library(reshape2)
acast(df2, f ~ y, value.var = "V1")
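
For comparison, here is a minimal base-R sketch of the same f-by-y table, assuming the function of interest is a plain sum over columns 1:3. The names total and m are only illustrative and do not come from the question.

```r
# Example data from the question; y and f are recycled to length 6
df <- data.frame(x1 = 1:6, x2 = 12:17, x3 = 3:8, y = 1:2, f = 1:3)

# Per-row total of the three value columns
total <- rowSums(df[, 1:3])

# Sum the row totals within each (f, y) cell; the result is a matrix
# with the levels of f as rows and the levels of y as columns
m <- tapply(total, list(f = df$f, y = df$y), sum)
m
```

Since m is an ordinary numeric matrix, it could be passed directly to something like heatmap(m, Rowv = NA, Colv = NA, scale = "none"). This only works this simply because summing row totals is equivalent to summing the whole block; for a function that does not decompose that way, the ddply approach with an explicit function is the more general route.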
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow