Different results when calculating variance and standard deviation in R

https://stackoverflow.com/questions/20718256

20-09-2022

Question

Calculating variance and standard deviation based on the Wikipedia description gives different results compared to the standard functions var() and sd() in R.

Variance: 4 versus 4.571429. Standard deviation: 2 versus 2.13809.

Does anyone have a suggestion or an explanation?

> df <- c(2,4,4,4,5,5,7,9)
> df.length <- length(df)
> df.length
[1] 8

> df.mean <- sum(df) / df.length
> df.mean
[1] 5

> df.difference <- (df - df.mean)**2
> df.difference
[1]  9  1  1  1  0  0  4 16

> sum(df.difference)
[1] 32

> df.variance <- sum(df.difference) / df.length
> df.variance
[1] 4

> df.standard.deviation <- sqrt(df.variance)
> df.standard.deviation
[1] 2

> # mean, var and sd (default R)

> mean(df)
[1] 5

> var(df)
[1] 4.571429

> sd(df)
[1] 2.13809

Solution

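It's the difference between dividing by n and dividing by n-1. R's var() and sd() compute the sample variance and sample standard deviation, which divide by the n-1 degrees of freedom, whereas your manual calculation divides by n.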

> df <- c(2,4,4,4,5,5,7,9)
> var(df)
[1] 4.571429


> sum((df - mean(df))^2) / length(df)
[1] 4

> sum((df - mean(df))^2) / (length(df) - 1)
[1] 4.571429
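
If the divide-by-n (population) values are what you actually want, one option is to rescale the result of var(). This is only a minimal sketch: the helper names pop_var and pop_sd are made up for illustration and are not part of base R.

```r
# Hypothetical helpers (not part of base R): population variance and
# standard deviation, i.e. dividing by n instead of n - 1.
pop_var <- function(x) {
  n <- length(x)
  var(x) * (n - 1) / n  # undo the n - 1 divisor, then divide by n
}
pop_sd <- function(x) sqrt(pop_var(x))

df <- c(2, 4, 4, 4, 5, 5, 7, 9)
pop_var(df)  # 4, matching the manual calculation in the question
pop_sd(df)   # 2
```

Rescaling var() rather than re-deriving the sum of squares keeps the helper short; writing sum((x - mean(x))^2) / length(x) directly would work just as well.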

As for why it's n-1, this is quoted straight from Wikipedia:

A common way to think of degrees of freedom is as the number of independent pieces of information available to estimate another piece of information. More concretely, the number of degrees of freedom is the number of independent observations in a sample of data that are available to estimate a parameter of the population from which that sample is drawn. For example, if we have two observations, when calculating the mean we have two independent observations; however, when calculating the variance, we have only one independent observation, since the two observations are equally distant from the mean.
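
The two-observation case from the quote can be checked directly in R. In the sketch below the vector x and its values are made up for illustration, and the small simulation at the end (arbitrary seed, sample size and number of repetitions) is just one way to see that dividing by n tends to underestimate the true variance while dividing by n-1 does not.

```r
# Two made-up observations: the deviations from the sample mean are
# equal in size and opposite in sign, so they always sum to zero.
x <- c(3, 7)
dev <- x - mean(x)
dev       # -2  2
sum(dev)  # 0, so only one deviation is free to vary

sum(dev^2) / (length(x) - 1)  # 8, the same value var(x) returns

# Simulation: many small samples from a normal distribution whose
# true variance is 1. The seed, n and the repetition count are arbitrary.
set.seed(1)
n <- 5
samples <- replicate(10000, rnorm(n))  # n x 10000 matrix, one sample per column

by_n   <- apply(samples, 2, function(s) sum((s - mean(s))^2) / n)
by_nm1 <- apply(samples, 2, var)

mean(by_n)    # tends to land near (n - 1) / n = 0.8
mean(by_nm1)  # tends to land near the true variance, 1
```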

Licensed under: CC-BY-SA with attribution
