You should look at scale(), which does this for you.
Your function is close to being correct; you should add na.rm = TRUE to both the sd() and mean() function calls.
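As a minimal sketch of that fix, assuming the original function looked roughly like the sapply() version below (ztran_sapply and the sample data are hypothetical, made up for illustration):

```r
# Hypothetical reconstruction of an sapply()-based z-transform,
# with na.rm = TRUE passed to both mean() and sd()
ztran_sapply <- function(x) {
    sapply(x, function(col) {
        (col - mean(col, na.rm = TRUE)) / sd(col, na.rm = TRUE)
    })
}

# Made-up data with one missing value in column a
df_na <- data.frame(a = c(1, 2, NA, 4), b = c(10, 20, 30, 40))
z <- ztran_sapply(df_na)
```

Without na.rm = TRUE, mean() and sd() would both return NA for column a, so every value in that column would come back as NA rather than just the missing one.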
If not using scale(), I would write the function using sweep() instead of sapply(). E.g.
    ztran <- function(x, na.rm = TRUE) {
        mns <- colMeans(x, na.rm = na.rm)      # column means
        sds <- apply(x, 2, sd, na.rm = na.rm)  # column standard deviations
        x <- sweep(x, 2, mns, "-")             # centre each column
        x <- sweep(x, 2, sds, "/")             # scale each column
        x
    }
In use we have
    > df <- data.frame(matrix(1:9, ncol = 3))
    > ztran(df)
      X1 X2 X3
    1 -1 -1 -1
    2  0  0  0
    3  1  1  1
    > scale(df)
         X1 X2 X3
    [1,] -1 -1 -1
    [2,]  0  0  0
    [3,]  1  1  1
    attr(,"scaled:center")
    X1 X2 X3
     2  5  8
    attr(,"scaled:scale")
    X1 X2 X3
     1  1  1
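One way to check that the two approaches also agree when there are missing values is sketched below. The data frame df_na is made up for illustration, and ztran() is repeated so the snippet runs on its own:

```r
ztran <- function(x, na.rm = TRUE) {
    mns <- colMeans(x, na.rm = na.rm)
    sds <- apply(x, 2, sd, na.rm = na.rm)
    x <- sweep(x, 2, mns, "-")
    sweep(x, 2, sds, "/")
}

# Made-up data with one missing value
df_na <- data.frame(a = c(1, 2, NA, 4), b = c(10, 20, 30, 40))

z1 <- as.matrix(ztran(df_na))  # data.frame result, converted for comparison
z2 <- scale(df_na)             # matrix with extra "scaled:*" attributes

# check.attributes = FALSE ignores the attributes that scale() attaches
same <- isTRUE(all.equal(z1, z2, check.attributes = FALSE))
```

If same is TRUE, the only practical differences are the return type and the "scaled:center" and "scaled:scale" attributes.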
sweep() is a very useful vectorised tool for this sort of operation. Notice also that sapply() simplifies its result to a matrix, which may not be what you wanted. sweep() doesn't do this:
    > class(ztran(df))
    [1] "data.frame"
    > class(sapply(df, function(x){(x-mean(x))/sd(x)}))
    [1] "matrix"
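If you prefer the apply-style approach but still want a data frame back, one common idiom is to assign the result of lapply() into z[]. This is a sketch of an alternative, not part of the function above:

```r
df <- data.frame(matrix(1:9, ncol = 3))

# Work on a copy; assigning into z[] keeps the data.frame class and column names
z <- df
z[] <- lapply(df, function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE))

class(z)
```

Because lapply() returns a list with one element per column, the z[] assignment replaces the columns in place instead of simplifying to a matrix.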