Question

I'm new to plyr and want to compute the weighted mean of several variables within each class, producing a summary data frame. With the following code I can do this for one variable, such as x2:

library(plyr)
set.seed(123)
frame <- data.frame(class=sample(LETTERS[1:5], replace = TRUE),
                    x=rnorm(20), x2 = rnorm(20), weights=rnorm(20))
ddply(frame, .(class), function(x) data.frame(weighted.mean(x$x2, x$weights)))

However, I would like the code to create a new data frame covering both x and x2 (and any number of other variables in the frame). Does anybody know how to do this? Thanks


Solution

You might find what you want in plyr's summarise function (see ?summarise). I can replicate your code with summarise as follows:

library(plyr)
set.seed(123)
frame <- data.frame(class=sample(LETTERS[1:5], replace = TRUE), x=rnorm(20), 
                    x2 = rnorm(20), weights=rnorm(20))
ddply(frame, .(class), summarise, 
      x2 = weighted.mean(x2, weights)) 

To do this for x as well, pass one more named argument through to summarise:

ddply(frame, .(class), summarise, 
      x = weighted.mean(x, weights),
      x2 = weighted.mean(x2, weights)) 
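
As a quick sanity check on what weighted.mean returns, here is a minimal sketch in base R. The vectors vals and wts are made-up toy values, not taken from the question's data:

```r
# Hypothetical toy vectors for illustration only
vals <- c(1, 2, 4)
wts  <- c(1, 1, 2)

# weighted.mean() from the stats package (loaded by default)
wm <- weighted.mean(vals, wts)

# The same quantity written out by hand: sum(x * w) / sum(w)
by_hand <- sum(vals * wts) / sum(wts)

# Both approaches should agree
stopifnot(isTRUE(all.equal(wm, by_hand)))
```

Note that the question draws its weights with rnorm, so some of them are negative. With negative weights the result can fall outside the range of the values being averaged.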

Edit: If you want to apply an operation over many columns, use colwise or numcolwise instead of summarise, or run summarise on a data frame melted with the reshape2 package and then cast it back to its original form. Using colwise, that would give:

wmean.vars <- c("x", "x2")

ddply(frame, .(class), function(x)
      colwise(weighted.mean, w = x$weights)(x[wmean.vars]))

Finally, if you don't like having to specify wmean.vars, you can also do:

ddply(frame, .(class), function(x)
      numcolwise(weighted.mean, w = x$weights)(x[!colnames(x) %in% "weights"]))

which will compute a weighted average for every numeric column, excluding the weights themselves.
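
The melt-then-cast route mentioned in the edit could look roughly like the sketch below. It assumes the plyr and reshape2 packages are installed and rebuilds the question's frame so the snippet runs on its own. The names long, wm_long, wide and wmean are my own choices for illustration:

```r
library(plyr)
library(reshape2)

# Same data frame as in the question
set.seed(123)
frame <- data.frame(class = sample(LETTERS[1:5], replace = TRUE),
                    x = rnorm(20), x2 = rnorm(20), weights = rnorm(20))

# Long format: one row per (class, weights, variable, value)
long <- melt(frame, id.vars = c("class", "weights"))

# Weighted mean per class and per original variable
wm_long <- ddply(long, .(class, variable), summarise,
                 wmean = weighted.mean(value, weights))

# Cast back to wide format: one column per original variable
wide <- dcast(wm_long, class ~ variable, value.var = "wmean")
wide
```

This avoids naming each variable, at the cost of an extra reshape step in each direction. Any column not listed in id.vars is treated as a measure variable.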

OTHER TIPS

A data.table answer for fun, which also doesn't require specifying all the variables individually.

library(data.table)
frame <- as.data.table(frame)
keynames <- setdiff(names(frame),c("class","weights"))
frame[, lapply(.SD,weighted.mean,w=weights), by=class, .SDcols=keynames]

Result:

   class          x         x2
1:     B  0.1390808 -1.7605032
2:     D  1.3585759 -0.1493795
3:     C -0.6502627  0.2530720
4:     E  2.6657227 -3.7607866
Licensed under: CC-BY-SA with attribution