Question

It would be great if someone could check whether my approach is correct. In short, the question is whether my error calculation is correct. Let's assume I have the following data:

data = c(23.7,25.47,25.16,23.08,24.86,27.89,25.9,25.08,25.08,24.16,20.89)

Furthermore, I want to check whether my data follow a normal distribution.

Edit: I know that there are tests, etc., but I want to concentrate on constructing the Q-Q plot with confidence lines. I know that the car package has a function for this, but I want to understand how these lines are built.

So I calculate the deciles (the 10th, 20th, ..., 90th percentiles) of my sample data as well as of my theoretical distribution (a normal distribution with estimated mu = 24.6609 and sigma = 1.6828). This gives me two vectors of percentiles:

percentileReal =  c(23.08,23.7,24.16,24.86,25.08,25.08,25.16,25.47,25.90)
percentileTheo =  c(22.50,23.24,23.78,24.23,24.66,25.09,25.54,26.08,26.82)
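A minimal R sketch that reproduces both vectors, under a few assumptions: the nine values are the deciles p = 0.1, ..., 0.9; the sample deciles use the default method of `quantile()`; and sigma is the maximum-likelihood estimate (dividing by n), which is what gives sigma = 1.6828. Note that `sd()` divides by n - 1 and would give about 1.765 for these data.

```r
data <- c(23.7, 25.47, 25.16, 23.08, 24.86, 27.89, 25.9, 25.08, 25.08, 24.16, 20.89)

# Deciles p = 0.1, 0.2, ..., 0.9 (assumed from the nine values in each vector)
p <- seq(0.1, 0.9, by = 0.1)

# Estimated normal parameters
mu <- mean(data)                     # about 24.6609
sigma <- sqrt(mean((data - mu)^2))   # ML estimate (divides by n), about 1.6828

# Sample deciles with quantile()'s default method (type 7)
percentileReal <- unname(quantile(data, probs = p))

# Deciles of the fitted normal distribution N(mu, sigma^2)
percentileTheo <- qnorm(p, mean = mu, sd = sigma)

print(round(percentileReal, 2))
print(round(percentileTheo, 2))
```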

Now I want to calculate the confidence interval for the theoretical percentiles at alpha = 0.05. If I remember correctly, the formula is

error = z*sigma/sqrt(n),
confidence interval = value ± error

with n = length(data) and z = the quantile of the normal distribution for the given p.
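For comparison, in its textbook use, error = z*sigma/sqrt(n) is the margin of error of a confidence interval for the mean, and z there is the fixed two-sided critical value qnorm(1 - alpha/2) (about 1.96 for alpha = 0.05) rather than a quantile tied to a particular percentile. A short sketch, assuming the same ML estimate of sigma as in the question:

```r
data <- c(23.7, 25.47, 25.16, 23.08, 24.86, 27.89, 25.9, 25.08, 25.08, 24.16, 20.89)
n <- length(data)
mu <- mean(data)
sigma <- sqrt(mean((data - mu)^2))  # ML estimate, matches sigma = 1.6828

alpha <- 0.05
z <- qnorm(1 - alpha / 2)           # two-sided critical value, about 1.96

# Margin of error and confidence interval for the mean (not for a percentile)
error <- z * sigma / sqrt(n)
ci_mean <- c(lower = mu - error, upper = mu + error)
print(round(ci_mean, 4))
```

Strictly, when sigma is estimated from the sample, the t quantile qt(1 - alpha/2, n - 1) is often used in place of z.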

So, to get the confidence interval for the 2nd percentile in my vector (the 20th percentile, p = 0.2), I do the following:

error = (qnorm(0.2+alpha/2,mu,sigma)-qnorm(0.2-alpha/2,mu,sigma))*sigma/sqrt(n)

Inserting the values:

error = (qnorm(0.225,24.6609,1.6828)-qnorm(0.175,24.6609,1.6828)) * 1.6828/sqrt(11)
error = 0.152985
confidenceInterval(for 2nd percentile) = [23.24-0.152985, 23.24+0.152985]
confidenceInterval(for 2nd percentile) = [23.0870, 23.3929]

Finally, I have

percentileTheoLower = c(...,23.0870,.....)
percentileTheoUpper = c(...,23.3929,.....)

and the same for the remaining percentiles.
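The same arithmetic can be done for all nine deciles at once. This sketch only automates the question's formula (with the probability written as 0.2 rather than 20); it does not check whether that formula is statistically appropriate. The ML estimate of sigma is assumed, as in the question.

```r
data <- c(23.7, 25.47, 25.16, 23.08, 24.86, 27.89, 25.9, 25.08, 25.08, 24.16, 20.89)
n <- length(data)
mu <- mean(data)
sigma <- sqrt(mean((data - mu)^2))  # ML estimate, matches sigma = 1.6828
alpha <- 0.05
p <- seq(0.1, 0.9, by = 0.1)        # deciles

percentileTheo <- qnorm(p, mu, sigma)

# Error term exactly as written in the question, evaluated for every p at once
error <- (qnorm(p + alpha / 2, mu, sigma) - qnorm(p - alpha / 2, mu, sigma)) *
  sigma / sqrt(n)

percentileTheoLower <- percentileTheo - error
percentileTheoUpper <- percentileTheo + error

print(round(cbind(p, percentileTheoLower, percentileTheo, percentileTheoUpper), 4))
```

Because the code does not round the theoretical percentile to 23.24 first, its interval for p = 0.2 comes out at about [23.0916, 23.3976]; the error term itself matches 0.152985.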

So what do you think: can I go with this?


Solution

If your goal is to test whether the data follow a normal distribution, use the Shapiro-Wilk test (shapiro.test() in R):

shapiro.test(data)
# Shapiro-Wilk normality test
# data:  data
# W = 0.9409, p-value = 0.5306

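The p-value is the probability of getting a W statistic at least this extreme if the data really came from a normal distribution. Since p > 0.05, we cannot reject normality at the 5% level. This does not mean that there is a 53% chance that the distribution is normal: a large p-value only says that the sample gives no evidence against normality, and with only 11 observations the test has little power to detect departures from it.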
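A minimal sketch of reading the result programmatically. shapiro.test() (from the stats package, loaded by default) returns an htest object whose statistic and p.value components hold W and the p-value; the 5% level is the conventional choice, not something the test itself fixes.

```r
data <- c(23.7, 25.47, 25.16, 23.08, 24.86, 27.89, 25.9, 25.08, 25.08, 24.16, 20.89)
res <- shapiro.test(data)

W <- unname(res$statistic)   # Shapiro-Wilk W statistic
p_value <- res$p.value

# Reject the null hypothesis of normality only if p < alpha
alpha <- 0.05
reject_normality <- p_value < alpha
cat(sprintf("W = %.4f, p = %.4f, reject normality at 5%%: %s\n",
            W, p_value, reject_normality))
```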

You can also draw a normal Q-Q plot with qqnorm(). The closer the points lie to a straight line, the more consistent the data are with a normal distribution.

 qqnorm(data)
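Since the question asks how confidence lines are built, here is one standard way to add pointwise 95% bands to a normal Q-Q plot. It swaps in a different technique from the question's formula: the large-sample standard error of the sample P-quantile under N(mu, sigma^2), SE = sigma * sqrt(P * (1 - P) / n) / dnorm(qnorm(P)). The plotting positions come from ppoints(), which is what qqnorm() uses. With only 11 points the approximation behind the band is rough, and the bands are pointwise, not simultaneous.

```r
data <- c(23.7, 25.47, 25.16, 23.08, 24.86, 27.89, 25.9, 25.08, 25.08, 24.16, 20.89)
n <- length(data)
mu <- mean(data)
sigma <- sd(data)              # sample SD; the ML estimate would also work
alpha <- 0.05

y <- sort(data)                # sample quantiles (order statistics)
P <- ppoints(n)                # plotting positions, as used by qqnorm()
z <- qnorm(P)                  # standard normal quantiles (x-axis)

fit <- mu + sigma * z          # fitted normal quantiles (reference line)

# Large-sample SE of the sample P-quantile when the data are N(mu, sigma^2)
se <- sigma * sqrt(P * (1 - P) / n) / dnorm(z)
zc <- qnorm(1 - alpha / 2)
lower <- fit - zc * se
upper <- fit + zc * se

qqnorm(data)
abline(a = mu, b = sigma)      # reference line with intercept mu and slope sigma
lines(z, lower, lty = 2)       # pointwise lower band
lines(z, upper, lty = 2)       # pointwise upper band

# Indices (in sorted order) of points falling outside the pointwise band
outside <- which(y < lower | y > upper)
print(outside)
```

The band widens toward the tails because dnorm(z) shrinks there, so extreme sample quantiles are estimated less precisely.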

Finally, the nortest package provides, among other tests, the Pearson chi-square test for normality:

 library(nortest)
 pearson.test(data)
 #  Pearson chi-square normality test
 #  data:  data
 #  P = 3.7273, p-value = 0.2925

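This test also fails to reject normality (p = 0.29 > 0.05). As with the Shapiro-Wilk test, the p-value is not the probability that the distribution is normal, and the chi-square test is generally less powerful than Shapiro-Wilk for this purpose. All these tests are described in detail in their R documentation.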

Licensed under: CC-BY-SA with attribution