One-way repeated measures ANOVA with unbalanced data

https://stackoverflow.com/questions/17686054

r
anova

03-06-2022
|

Question

I'm new to R, and I've read these forums (for help with R) for awhile now, but this is my first time posting. After googling each error here, I still can't figure out and fix my mistakes.

I am trying to run a one-way repeated measures ANOVA with unequal sample sizes. Here is a toy version of my data and the code that I'm using. (If it matters, my real data have 12 bins with up to 14 to 20 values in each bin.)

## the data: average probability for a subject, given reaction time bin
bin1=c(0.37,0.00,0.00,0.16,0.00,0.00,0.08,0.06)
bin2=c(0.33,0.21,0.000,1.00,0.00,0.00,0.00,0.00,0.09,0.10,0.04)
bin3=c(0.07,0.41,0.07,0.00,0.10,0.00,0.30,0.25,0.08,0.15,0.32,0.18)

## creating the data frame

# dependent variable column
probability=c(bin1,bin2,bin3)

# condition column
bin=c(rep("bin1",8),rep("bin2",11),rep("bin3",12))

# subject column (in the order that will match them up with their respective
# values in the dependent variable column)
subject=c("S2","S3","S5","S7","S8","S9","S11","S12","S1","S2","S3","S4","S7",
  "S9","S10","S11","S12","S13","S14","S1","S2","S3","S5","S7","S8","S9","S10",
  "S11","S12","S13","S14")

# putting together the data frame
dataFrame=data.frame(cbind(probability,bin,subject))

## one-way repeated measures anova
test=aov(probability~bin+Error(subject/bin),data=dataFrame)

These are the errors I get:

Error in qr.qty(qr.e, resp) : 
  invalid to change the storage mode of a factor
In addition: Warning messages:
1: In model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : - not meaningful for factors
3: In aov(probability ~ bin + Error(subject/bin), data = dataFrame) :
  Error() model is singular

Sorry for the complexity (assuming it is complex; it is to me). Thank you for your time.

Solution

For an unbalanced repeated-measures design, it might be easiest to use lme (from the nlme package):

## this should be the same as the data you constructed above, just 
## a slightly more compact way to do it.
datList <- list(
   bin1=c(0.37,0.00,0.00,0.16,0.00,0.00,0.08,0.06),
   bin2=c(0.33,0.21,0.000,1.00,0.00,0.00,0.00,0.00,0.09,0.10,0.04),
   bin3=c(0.07,0.41,0.07,0.00,0.10,0.00,0.30,0.25,0.08,0.15,0.32,0.18))
subject=c("S2","S3","S5","S7","S8","S9","S11","S12",
          "S1","S2","S3","S4","S7","S9","S10","S11","S12","S13","S14",
          "S1","S2","S3","S5","S7","S8","S9","S10","S11","S12","S13","S14")
d <- data.frame(probability=do.call(c,datList),
                bin=paste0("bin",rep(1:3,sapply(datList,length))),
                subject)

library(nlme)
m1 <- lme(probability~bin,random=~1|subject/bin,data=d)
summary(m1)

The only real problem is that some aspects of the interpretation etc. are pretty far from the classical sum-of-squares-decomposition approach (e.g. it's fairly tricky to do significance tests of variance components). Pinheiro and Bates (Springer, 2000) is highly recommended reading if you're going to head in this direction.

It might be a good idea to simulate/make up some balanced data and do the analysis with both aov() and lme(), look at the output, and make sure you can see where the correspondences are/know what's going on.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow