Using R for simulation based power analysis of Multi-Factor Within-Subjects Repeated Measures ANOVA

StackOverflow https://stackoverflow.com/questions/21493807

  •  05-10-2022
  •  | 
  •  

Question

I have been trying to run a power analysis for study design that will be analyzed by a 4 factor Repeated Measures ANOVA where all factors are within-subjects. After a lot of searching and some help on Cross Validated, it is clear that I need to

  • Do a simulation where I generate data based on some pilot data I have
  • Run it through the appropriate anova model
  • And then iterate to find the average power.

The code pasted below was found at this link, and it seems to do exactly what I want to do, albeit for only a 2 factor within-subjects ANOVA.

However, I am having trouble understanding all of the details and what specific lines do because I am very new to R. I would be very grateful if anyone could shed some light on these lines of code:

My questions:

  1. Why is the intercept for all conditions set to -1? Is that standard to this model? intercept = rep(-1, nconds)

  2. Do the values in these columns represent the levels of the x1 and x2 factors or are they SDs etc? If they are the levels, shouldn't the levels for x2 be (0, .5, 0, .5)?

    true.effect.x1 = c(0, 0, .5, .5)
    true.effect.x2 = c(0, .5, .5, .5)
    
  3. What mean and SDs do I take from my pilot data (from multiple subjects) for these three lines of code?

    #relatively large subject-specific variance in intercepts
    sub.intercept = rep(rnorm(nsub, mean=0, sd=2), times=1, each=nconds) 
    #relatively small by-subject adjustments to the effects
    sub.effect = rep(rnorm(nsub, mean=0, sd=0.05), times=1, each=nconds) 
    #unexplained error
    error = rnorm(nsub*nconds, mean=0, sd=1)
    

I know this is a very long question but I would really appreciate any help anyone could provide! Thank you so so much!

FULL CODE

library(ez)

nsub = 30
nconds = 4

nsims = 100

#create an empty matrix that will be populated with p-values 
p = matrix(NA, nrow=nsims, ncol=3)

#subject vector
sub = sort(rep(1:nsub, nconds))

#2x2 factorial design
cond = data.frame(x1=c('a','a','b','b'), x2=c('c','d','c','d'))

# fixed effects 
intercept = rep(-1, nconds)
true.effect.x1 = c(0, 0, .5, .5)
true.effect.x2 = c(0, 0.5, .5, .5)
X = rep((intercept + true.effect.x1 + true.effect.x2),nsub)

#simulation loop
for (x in 1:nsims)
{

#random effects

#relatively large subject-specific variance in intercepts
  sub.intercept = rep(rnorm(nsub, mean=0, sd=2), times=1, each=nconds) 

#relatively small by-subject adjustments to the effects
  sub.effect = rep(rnorm(nsub, mean=0, sd=0.05), times=1, each=nconds) 

#unexplained error
  error = rnorm(nsub*nconds, mean=0, sd=1)

#simulated dependent variable
  observed.y = X + (sub.intercept + sub.effect + error)

#place everything in a data frame
  df = cbind(sub, cond, observed.y)
  names(df) = c('sub','x1','x2','y')
  df$sub = as.factor(df$sub)

#extract the p-values for each effect from a repeated measure ANOVA (ezANOVA from 'ez' package)
  p[x,1] = ezANOVA(data=df, dv=.(y), wid=.(sub), within=.(x1, x2))$ANOVA[1,5]
  p[x,2] = ezANOVA(data=df, dv=.(y), wid=.(sub), within=.(x1, x2))$ANOVA[2,5]
  p[x,3] = ezANOVA(data=df, dv=.(y), wid=.(sub), within=.(x1, x2))$ANOVA[3,5]
}

###### p-values < .05 ? ######
sig.x1 = ifelse(p[,1] <= .05, 1, 0)
sig.x2 = ifelse(p[,2] <= .05, 1, 0)
sig.int = ifelse(p[,3] <= .05, 1, 0)

###### Histograms ######
par(mfrow=c(3,1))
hist(p[,1], 20, xaxp=c(0, 1, 20), col=2, main = paste('power X1:', mean(sig.x1 * 100), '%  with ', nsub, 'subjects'))
hist(p[,2], 20, xaxp=c(0, 1, 20), col=2, main = paste('power X2:', mean(sig.x2 * 100), '%  with ', nsub, 'subjects'))
hist(p[,3], 20, xaxp=c(0, 1, 20), col=2, main = paste('power interaction:', mean(sig.int * 100), '%  with ', nsub, 'subjects'))

No correct solution

OTHER TIPS

1) Because -1 = -sum(true.effect.x1) and the author that you were incompletely copying only had one "true.effect". They wanted a sum-type of contrast. This is suggesting to me that you don't have a sufficient background in statistics to really understand this project.

2) No, they represent covariate values against which coefficients will be estimated. You could use contrasts like c(-2,0,1,1) and the beta estimate would have been one-half what it would be for c(-1,0,-.5,-.5). Think of them as categorical indicators. "Dummy variables" is one term that is used.

3) The question is confusing and I think it is you that are confused and not me. The sd specified is twice the effect, and I think you have copied the comment above from another source without understanding.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top