I'd like to follow up by showing how I used Aaron's suggestions to produce a table that summarizes data. It should be easy to extend to other stats, like min, max, skew, etc.
The data:
df <- data.frame(
'ID' = c("123A","456B","789C","1011","1213")
, 'Test 1' = c(13,8,14,NA,15)
, 'Test 2' = c(13,4,16,7,12)
, 'Test 3' = c(15,9,13,6,13)
)
Several functions that compute stats used to summarize the data:
colMean <- function(df, na.rm = TRUE) { # either remove NAs or set them to zero
  if (!na.rm) { # set NAs to zero
    df[is.na(df)] <- 0
  }
  colMean <- colMeans(df, na.rm = na.rm)
  return(colMean)
}
rowMean <- function(df, na.rm = TRUE) { # either remove NAs or set them to zero
  if (!na.rm) { # set NAs to zero
    df[is.na(df)] <- 0
  }
  rowMean <- rowMeans(df, na.rm = na.rm)
  return(rowMean)
}
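rowSd <- function(df, na.rm = TRUE) { # either remove NAs or set them to zero
  if (na.rm) { # remove NAs: n is the per-row count of non-missing values
    n <- rowSums(!is.na(df))
  } else { # set NAs to zero: every row then has ncol(df) values
    df[is.na(df)] <- 0
    n <- ncol(df)
  }
  # population variance, mean(x^2) - mean(x)^2, rescaled by n/(n-1) to the sample variance
  rowVar <- rowMeans(df*df, na.rm = na.rm) - (rowMeans(df, na.rm = na.rm))^2
  rowSd <- sqrt(rowVar * n/(n-1))
  return(rowSd)
}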
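colSd <- function(df, na.rm = TRUE) { # either remove NAs or set them to zero
  if (na.rm) { # remove NAs: n is the per-column count of non-missing values
    n <- colSums(!is.na(df))
  } else { # set NAs to zero: every column then has nrow(df) values
    df[is.na(df)] <- 0
    n <- nrow(df)
  }
  # population variance, mean(x^2) - mean(x)^2, rescaled by n/(n-1) to the sample variance
  colVar <- colMeans(df*df, na.rm = na.rm) - (colMeans(df, na.rm = na.rm))^2
  colSd <- sqrt(colVar * n/(n-1))
  return(colSd)
}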
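The two Sd functions rely on the identity that the sample variance equals the population variance, mean(x^2) - mean(x)^2, multiplied by n/(n-1). A minimal check of that identity against R's built-in sd(), using the 'Test 2' values from the data above (the variable names here are only for illustration):

```r
# Check the shortcut formula against R's built-in sd() on one column.
x <- c(13, 4, 16, 7, 12)                  # the 'Test 2' column from the data above
n <- length(x)
popVar <- mean(x * x) - mean(x)^2         # population variance (divides by n)
sdShortcut <- sqrt(popVar * n / (n - 1))  # rescale to the sample variance, then take the root
print(all.equal(sdShortcut, sd(x)))       # TRUE if the two agree to numerical tolerance
```

Two caveats with this shortcut: it divides by zero when a row or column has only one non-missing value, and it can lose precision when the values are large relative to their spread.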
The summary is a function of the data frame 'df', the along-column stats 'col', the along-row stats 'row', and the padding character 'pad'. The 'pad' value fills the cells where the summary rows meet the summary columns; it can be set to an empty cell with "", to NA, or to something else. Inside the function, NAs are removed for the along-column stats but set to zero for the along-row stats.
summ <- function(df
  , col = list("colMean" = colMean)
  , row = list("rowMean" = rowMean)
  , pad = NA_character_)
{
  dfN <- df[sapply(df, is.numeric)] # keep only the numeric columns
  # along-column stats with NAs removed, formatted to one decimal place
  colN <- lapply(col, function(x){formatC(x(dfN, na.rm = TRUE), digits = 1, format = "f")})
  # along-row stats with NAs set to zero, formatted to one decimal place
  rowN <- lapply(row, function(x){formatC(x(dfN, na.rm = FALSE), digits = 1, format = "f")})
  pad <- rep(pad, length.out = length(colN)) # one pad cell per summary row
  out <- cbind(rbind(format(dfN), do.call(rbind, colN)), lapply(rowN, function(x){c(x, pad)}))
  return(print(out, right = FALSE))
}
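To make the two NA treatments concrete, here is a small sketch using row 4 of the data, where Test 1 is missing. The variable names are made up for this illustration; the point is only that removing the NA and setting it to zero give different means.

```r
x <- c(NA, 7, 6)                      # row 4 of the data: Test 1 is NA

# Treatment 1: drop the NA, so the mean is taken over the two remaining values.
meanRemoved <- mean(x, na.rm = TRUE)  # (7 + 6) / 2

# Treatment 2: set the NA to zero, so the mean is taken over all three values.
x0 <- x
x0[is.na(x0)] <- 0
meanZeroed <- mean(x0)                # (0 + 7 + 6) / 3

print(c(removed = meanRemoved, zeroed = meanZeroed))
```

The zeroed variant is the one summ applies along rows, which is why a missing test pulls that row's mean down rather than being ignored.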
Examples of usage:
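# named colStats/rowStats rather than c/r, so that base::c() is not masked
colStats <- list("colMean" = colMean, "colSd" = colSd)
rowStats <- list("rowMean" = rowMean, "rowSd" = rowSd)
summ(df)
summ(df, colStats, rowStats)
summ(df, col = colStats, row = rowStats)
summ(df, col = colStats, row = rowStats, pad = "X")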
Output of the last call:
        Test.1 Test.2 Test.3 rowMean rowSd
1       13     13     15     13.7    1.2
2       8      4      9      7.0     2.6
3       14     16     13     14.3    1.5
4       NA     7      6      4.3     3.8
5       15     12     13     13.3    1.5
colMean 12.5   10.4   11.2   X       X
colSd   3.1    4.8    3.6    X       X
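On extending this to other stats such as min and max: a sketch of what two extra helpers might look like, following the same (df, na.rm) convention as the functions above. The names colMin and rowMax are hypothetical and not part of the code above, and the small data frame is only there to make the block run on its own.

```r
# Hypothetical helpers: per-column minimum and per-row maximum.
colMin <- function(df, na.rm = TRUE) { # either remove NAs or set them to zero
  if (!na.rm) {
    df[is.na(df)] <- 0
  }
  sapply(df, min, na.rm = na.rm)       # one value per column
}
rowMax <- function(df, na.rm = TRUE) { # either remove NAs or set them to zero
  if (!na.rm) {
    df[is.na(df)] <- 0
  }
  apply(as.matrix(df), 1, max, na.rm = na.rm) # one value per row
}

# Small stand-in data frame with one missing value.
x <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6))
print(colMin(x))                 # NA in column a is ignored
print(colMin(x, na.rm = FALSE))  # NA in column a counts as 0
print(rowMax(x))
```

Since summ only calls each list element as x(dfN, na.rm = ...) and formats the result, functions of this shape should be usable in the 'col' and 'row' lists alongside the mean and sd ones, for example list("colMean" = colMean, "colMin" = colMin).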
Naturally, feel free to comment. Thanks!