mean and standard deviation by group for multiple variables [duplicate]

Question 1

The function you most likely want is aggregate(), passing either mean or sd as its FUN argument. The grouping variable goes in the by argument, which must be a list.
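As a minimal sketch of that suggestion, assuming a small made-up data frame (the name soil and its values are hypothetical, not the asker's data), the two calls might look like this:

```r
# Hypothetical data: two treatments with three replicates each.
soil <- data.frame(
  trt  = rep(c("CTK", "ZTK"), each = 3),
  silt = c(14, 15, 15, 16, 16, 17),
  clay = c(12, 13, 14, 12, 12, 13)
)

# `by` must be a list; FUN is applied to every selected column within each group.
means <- aggregate(soil[, c("silt", "clay")], by = list(trt = soil$trt), FUN = mean)
sds   <- aggregate(soil[, c("silt", "clay")], by = list(trt = soil$trt), FUN = sd)

print(means)
print(sds)
```

Each call returns one row per group, with the grouping column first and the summarised columns after it.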

Question 2

Assuming myDF is your original dataset:

library(data.table)
myDT <- data.table(myDF)

# Which variables to summarise? Here: all columns except the first five.
variables <- tail( names(myDT), -5)

myDT[, lapply(.SD, function(x) list(mean(x), sd(x))), .SDcols=variables, by=list(trt, til)]


## Or separately, if you prefer shorter `lapply` statements
myDT[, lapply(.SD, mean), .SDcols=variables, by=list(trt, til)]
myDT[, lapply(.SD, sd),   .SDcols=variables, by=list(trt, til)]

--

> myDT[, lapply(.SD, mean), .SDcols=variables, by=list(trt, til)]
#    trt til     silt     clay   ibd1_6  ibd9_14  ibd_ave
# 1: CTK  CT 14.66667 13.00000 1.483000 1.596000 1.539667
# 2: CTR  CT 14.00000 13.33333 1.627000 1.601333 1.614333
# 3: ZTK  ZT 16.33333 12.33333 1.480333 1.593000 1.536667
# 4: ZTR  ZT 16.66667 17.00000 1.637000 1.690667 1.663667

> myDT[, lapply(.SD, sd), .SDcols=variables, by=list(trt, til)]
#    trt til      silt      clay     ibd1_6     ibd9_14    ibd_ave
# 1: CTK  CT 0.5773503 1.7320508 0.13908271 0.004358899 0.07112196
# 2: CTR  CT 1.0000000 0.5773503 0.07562407 0.039576929 0.02514624
# 3: ZTK  ZT 0.5773503 0.5773503 0.17015973 0.041797129 0.07800214
# 4: ZTR  ZT 0.5773503 1.0000000 0.09763196 0.030892286 0.04816984

Question 3

aggregate(g[, c("sand", "silt", "clay")], list(trt = g$trt), function(x) c(mean = mean(x), sd = sd(x)))

Using an anonymous function with aggregate.data.frame lets you get both values in one call. Pass in only the columns to be aggregated, and supply the grouping variable as a list. If you had a long list of columns and only wanted to exclude, say, the first 4 from the calculations, it could be written as:

aggregate(g[, names(g)[-(1:4)]], list(trt = g$trt), function(x) c(mean = mean(x), sd = sd(x)))
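
One thing to watch for with this approach: when the function returns a vector of length two, each aggregated column in the result is a two-column matrix rather than a pair of ordinary columns. A sketch of how that could be flattened, using a hypothetical stand-in for g (the values are made up for illustration):

```r
# Hypothetical stand-in for `g`; the real data has more columns.
g <- data.frame(
  trt  = rep(c("CTK", "ZTK"), each = 3),
  silt = c(14, 15, 15, 16, 16, 17),
  clay = c(12, 13, 14, 12, 12, 13)
)

res <- aggregate(g[, c("silt", "clay")], list(trt = g$trt),
                 function(x) c(mean = mean(x), sd = sd(x)))

# Each summarised column is a matrix with columns "mean" and "sd",
# so `res` has only three top-level columns: trt, silt, clay.
print(ncol(res))
print(is.matrix(res$silt))

# Rebuilding the data frame expands the matrix columns into ordinary
# columns, named by combining the column name and the statistic.
flat <- do.call(data.frame, res)
print(names(flat))
```

The flattened version is usually easier to export or merge, since every statistic then lives in its own plain column.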