Following your edit I believe what you want to do is to see how many times over sims
trials the skewness calculated from a size n
sample taken from a normal distribution would be rejected as too skewed by a significance test at the alph
level.
You have several coding problems
- You want to do the z score test for every trial, but the test is outside the loop.
- The z score is calculated using the vector
t
, but you want to calculate it using the scalart[i]
. There is a
return
statement inside the loop which will cause the function to terminate in the first iteration of the loop, returning the z score. For reason no 2., the z score is a vector, but its second to last values are all zero because you've only run one iteration, so a typical output from the function is as follows0.003623371 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
Fixing these immediate problems results in the following code
library(e1071)
testsk=function(n,alph,sims) {
t=numeric(sims)
for (i in 1:sims) {
x=rnorm(n)
t[i]=skewness(x)
zscore=t[i]/(6/n)
if(zscore>=qnorm(1-alph/2)){
print("REJECT")
}
}
}
However this stil suffers from some problems:
- From a programming point of view
- printing out "REJECT" gives immediate feedback but isn't very scalable. If you had
sims=1000
you would be better off returning the number of rejects,nr
. And if you still wanted to print out "REJECT"nr
times you could do so :) - Also the code could be simpler, and written in more of an R style, vectorizing rather than using loops. This would also have the advantage of being much faster. Because R is an interpreted language, vectorizing makes a huge difference because the number crunching can happen under the hood without R having to step through your
for
loop over and over.
- printing out "REJECT" gives immediate feedback but isn't very scalable. If you had
- Perhaps more seriously, there are some statistical problems:
- 6/n is an estimate of the variance of the skewness (wikipedia), but you want the standard deviation so you need to take the square root of 6/n.
- The code rejects if the z score is greater than the
1-alph/2
th quantile but it should also reject if the z score is less than thealph/2
th quantile. As it stands your rejection region isalph/2
notalph
. - There may be other issues too but these are the main ones in my opinion. (I assume you're aware that 6/n is only a good estimate of the variance for large samples.)
A program that is along the right lines is as follows
library(e1071)
testsk=function(n,alph,sims) {
# Generate random numbers in a matrix, each trial is a row
X=matrix(rnorm(sims*n), ncol=n)
# Get skewnesses, 1 means apply to rows
skews=apply(X,1,skewness)
# Calculate z score vector and rejection vector
zscore=skews/sqrt(6/n)
reject=!(qnorm(alph/2) < zscore & zscore < qnorm(1-alph/2))
# Return the number of rejections
sum(reject)
}
You ought to be able to modify it to suit your purposes, but I can clarify if necessary.