Rudmin (2010) states that exact variance of pooled data set is the mean of the variances plus the variance of the means. flodel has already provided an answer and function that gives similar values to Rudmin's statement. Using Rudmin's data set and flodel's function based on Wikipedia:
df <- data.frame(mean = c(30.66667, 31.14286, 40.33333), variance = c(8.555555, 13.26531, 1.555555), n = c(6,7,3))
grand.sd <- function(S, M, N) {sqrt(weighted.mean(S^2 + M^2, N) -
weighted.mean(M, N)^2)}
grand.sd(sqrt(df$variance), df$mean, df$n)^2
#[1] 22.83983 = Dp variance in Rudmin (2010).
However this solution gives slightly different values compared to the function 5.38 from Headrick (2010) (unless there is a mistake somewhere):
dat <- data.frame(variable = c(rep("x", 2), rep("y", 3)), replicate = c(1,2,1,2,3),
mean = c(3.4, 2.5, 6.5, 5.7, 5.1), sd = c(1.2, 0.7, 2.4, 4.0, 3.5),
n = c(3,3,5,4,6))
x <- subset(dat, variable == "x")
((x$n[1]^2)*(x$sd[1]^2)+
(x$n[2]^2)*(x$sd[2]^2)-
(x$n[2])*(x$sd[1]^2) -
(x$n[2])*(x$sd[2]^2) -
(x$n[1])*(x$sd[1]^2) -
(x$n[1])*(x$sd[2]^2) +
(x$n[1])*(x$n[2])*(x$sd[1]^2) +
(x$n[1])*(x$n[2])*(x$sd[2]^2) +
(x$n[1])*(x$n[2])*(x$mean[1] - x$mean[2])^2)/
((x$n[1] + x$n[2] - 1)*(x$n[1] + x$n[2]))
#[1] 1.015
grand.sd(x$sd, x$mean, x$n)^2
#[1] 1.1675
To answer my own question, the desired data.frame
would be acquired followingly:
library(plyr)
ddply(dat, c("variable"), function(dat) c(mean=with(dat,weighted.mean(mean, n)), sd = with(dat, grand.sd(sd, mean, n))))
variable mean sd
1 x 2.950000 1.080509
2 y 5.726667 3.382793