Odd phenomenon: In aov() objects created inside a loop, the data frames of the different aov objects are identical

StackOverflow https://stackoverflow.com/questions/15579609

  •  29-03-2022

Question

I was running aov() inside a loop and saving the output of each iteration as a list item. When I used model.frame() to check the data frame of each aov object, all of them were identical; in fact, each one was the data frame of the last iteration. More intriguingly, this only seems to happen for aov models that include an error term. To illustrate, I use the ChickWeight data in R (see below):

results<-list()
length(results)<-4
counter<-1
for(i in unique(ChickWeight$Diet)){
results[[counter]] <- aov(weight~factor(Time) + Error(Chick),    
                              ChickWeight[ChickWeight$Diet==i,])
counter<-counter+1
}
head(model.frame(results[[1]]), 2)
#    weight factor(Time) Chick
#461     42            0    41
#462     51            2    41
head(model.frame(results[[2]]), 2)
#    weight factor(Time) Chick
#461     42            0    41
#462     51            2    41
head(model.frame(results[[3]]), 2)
#    weight factor(Time) Chick
#461     42            0    41
#462     51            2    41
head(model.frame(results[[4]]), 2)
#    weight factor(Time) Chick
#461     42            0    41
#462     51            2    41

In contrast, this does not happen when there is no error term:

results2<-list()
length(results2)<-4
counter<-1
for(i in unique(ChickWeight$Diet)){
results2[[counter]] <- aov(weight~factor(Time),    
                               ChickWeight[ChickWeight$Diet==i,])
counter<-counter+1
}
head(model.frame(results2[[1]]), 2)
#  weight factor(Time)
#1     42            0
#2     51            2
head(model.frame(results2[[2]]), 2)
#    weight factor(Time)
#221     40            0
#222     50            2
head(model.frame(results2[[3]]), 2)
#    weight factor(Time)
#341     42            0
#342     53            2
head(model.frame(results2[[4]]), 2)
#    weight factor(Time)
#461     42            0
#462     51            2

I wonder if anyone knows what is going on.


Solution

As far as I understand it, if an error term is part of the model, model.frame.aovlist gets the data by re-evaluating the call attribute of the fit, which is attr(*, "call")= language aov(formula = weight ~ factor(Time) + Error(Chick), data = ChickWeight[ChickWeight$Diet == i, ]). As you can see, the call depends on the value of i, and i is looked up when model.frame() runs, not when the model was fitted. After the loop, i still holds its last value, so every fit returns the data of the last diet. Furthermore, the aovlist object stores the environment in which to look up the data. In this case it's R_GlobalEnv.
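
The stored call and lookup environment described above can be inspected directly. This is a minimal sketch, assuming the fits are created at the top level of a script; the names fits, stored_call and lookup_env are only illustrative and do not appear in the question.

```r
# Fit one aov model with an Error() term per diet, inside a loop.
fits <- list()
for (i in levels(ChickWeight$Diet)) {
  fits[[i]] <- aov(weight ~ factor(Time) + Error(Chick),
                   data = ChickWeight[ChickWeight$Diet == i, ])
}

# The call is stored unevaluated, so its data argument still refers to `i`.
stored_call <- attr(fits[["1"]], "call")
print(stored_call$data)

# The environment attached to the terms is where that expression is
# evaluated again when model.frame() is called on the fit.
lookup_env <- environment(attr(fits[["1"]], "terms"))
print(lookup_env)
```

Printing stored_call$data should show the subsetting expression with the symbol i rather than a fixed diet, which is why the result of model.frame() changes whenever i changes.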

Look at what happens when i is changed after the loop:

results<-list()
counter<-1
for(i in unique(ChickWeight$Diet)){
  results[[counter]] <- aov(weight~factor(Time) + Error(Chick),    
                            data=ChickWeight[ChickWeight$Diet==i,])
  counter<-counter+1
}

head(model.frame(results[[1]]), 2)
#     weight factor(Time) Chick
# 461     42            0    41
# 462     51            2    41

i <- 1
head(model.frame(results[[1]]), 2)
#   weight factor(Time) Chick
# 1     42            0     1
# 2     51            2     1

You can avoid the problem by using the split-apply-combine approach. Here is an example using by():

res <- by(ChickWeight,ChickWeight$Diet,FUN=function(DF) {
  aov(weight~factor(Time)+ Error(Chick) ,    
      data=DF)
})

head(model.frame(res[[1]]),2)
#  weight factor(Time) Chick
#1     42            0     1
#2     51            2     1

With this approach, the correct environment for looking up the data is saved: the evaluation environment of the anonymous function passed to by(), in which DF is the subset for that diet.
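
The same idea can be sketched with split() and lapply() instead of by(), together with a check that each fit returns its own subset. This is only an illustration under the assumption that one model per diet is wanted; by_diet, fits_by_diet and rows_match are hypothetical names.

```r
# Split the data by diet, then fit each subset inside a function so that
# the formula's environment is the function call in which DF is defined.
by_diet <- split(ChickWeight, ChickWeight$Diet)
fits_by_diet <- lapply(by_diet, function(DF) {
  aov(weight ~ factor(Time) + Error(Chick), data = DF)
})

# Compare the number of rows in each recovered model frame with the
# number of rows in the subset that was used for the fit.
rows_match <- mapply(function(fit, DF) nrow(model.frame(fit)) == nrow(DF),
                     fits_by_diet, by_diet)
print(rows_match)

# First row name of each model frame, to see which subset was recovered.
print(sapply(fits_by_diet, function(fit) rownames(model.frame(fit))[1]))
```

Because each fit keeps a reference to its own function environment, the result no longer depends on any loop variable in the global environment.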

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow