Question

What explains the difference between the two linear models below?

a = c(0.04875,0.13725,0.28350,0.50975,0.77425,0.94700,0.05325,0.14050,0.29725,0.51525,0.79000,0.95400,0.04625,0.15250,0.29000,0.53300,0.79825,0.95225,0.05025,0.14625,0.28800,0.52625,0.78200,0.95925,0.04700,0.14225,0.30325,0.53500,0.79325,0.95875,0.04775,0.13850,0.28675,0.54250,0.78300,0.95175,0.05150,0.12725,0.30175,0.54725,0.79475,0.96275,0.05375,0.14100,0.30050,0.53275,0.78100,0.96175,0.05450,0.15300,0.29650,0.52850,0.80100,0.95675,0.05425,0.13975,0.30875,0.56025,0.80575,0.96100,0.05100,0.15350,0.31175,0.53300,0.78900,0.96000,0.04650,0.13525,0.29600,0.53625,0.78475,0.96375,0.05375,0.13900,0.29600,0.53725,0.78700,0.95800,0.05075,0.14350,0.29225,0.54525,0.80275,0.95800,0.05050,0.13200,0.29850,0.52700,0.80525,0.96150,0.05150,0.14050,0.29450,0.54375,0.79450,0.96375,0.05375,0.13525,0.30475,0.55250,0.79425,0.96025,0.04950,0.14500,0.29425,0.52250,0.78475,0.95650,0.05225,0.14425,0.29225,0.53150,0.80425,0.95375)
b = c(4,4,4,4,4,4,6,6,6,6,6,6,8,8,8,8,8,8,10,10,10,10,10,10,12,12,12,12,12,12,14,14,14,14,14,14,16,16,16,16,16,16,18,18,18,18,18,18,20,20,20,20,20,20,22,22,22,22,22,22,24,24,24,24,24,24,26,26,26,26,26,26,28,28,28,28,28,28,30,30,30,30,30,30,32,32,32,32,32,32,34,34,34,34,34,34,36,36,36,36,36,36,38,38,38,38,38,38,40,40,40,40,40,40)
c = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6)


summary(lm(a~b*as.factor(c)))
summary(lm(a~b*c))

Does it mean that c is considered as non-ordinal when using as.factor?


Solution

In both cases, you are modelling a as a function of b, c, and their interaction.

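When you coerce c to a factor, it becomes an unordered categorical variable, so yes, its values are treated as nominal labels rather than as points on a scale. Dummy variables are computed for every level of c except the first, which serves as the reference level (in this case every distinct value of c is a level). The interaction therefore gives each level of c its own adjustment to the slope of b.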

Otherwise, c is treated as a single numeric variable: it gets one slope, and the interaction is one extra term, the product of b and c.
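
As a rough illustration of the two cases, the sketch below fits both forms to a small synthetic data set and counts the coefficients. The names x, grp and y and the simulated values are assumptions made for this example; they are not the data from the question, only laid out the same way (19 values of the covariate, 6 group codes).

```r
# Hypothetical data with the same layout as the question: 19 x-values, 6 groups
set.seed(1)
x   <- rep(seq(4, 40, by = 2), each = 6)   # numeric covariate (plays the role of b)
grp <- rep(1:6, times = 19)                # integer codes 1..6 (plays the role of c)
y   <- 0.15 * grp + 0.0005 * x * grp + rnorm(length(x), sd = 0.01)

fit_numeric <- lm(y ~ x * grp)             # grp used as a number
fit_factor  <- lm(y ~ x * as.factor(grp))  # grp used as unordered categories

length(coef(fit_numeric))   # 4: intercept, x, grp, x:grp
length(coef(fit_factor))    # 12: intercept, x, 5 level dummies, 5 x:level terms
is.ordered(as.factor(grp))  # FALSE: as.factor() does not create an ordered factor
```

The numeric fit forces the effect of grp to be linear in its code, while the factor fit estimates a separate intercept shift and slope shift for each level relative to level 1.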

The difference might be clearer if the values of c were spread more unevenly, e.g.

c = c(1, 17, 2, 5, 131, 1, 4, 5, 2, 11, 17, 7, 1, 1, 17, .... etc)   

As a side note when learning R: avoid using c as a variable name. It is also the name of a heavily used function, so reusing it quickly makes code hard to read and invites confusion.
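
A short sketch of why the name is confusing rather than outright broken. The names combined and grp are made up for this example.

```r
c <- c(1, 2, 3)        # legal: creates a vector named `c`
combined <- c(c, 4)    # still works, because R looks for a function when `c` is called
combined               # 1 2 3 4
rm(c)                  # drop the vector so that only the function c() remains

grp <- c(1, 2, 3)      # a descriptive name avoids the double meaning
```

The code runs either way, but a reader has to work out which c is meant at every use.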

OTHER TIPS

You can examine the structure of the two models by looking at their model.matrix() results, since model.matrix() is the function lm() uses to construct the design matrix from the right-hand side of the formula:

> dim(model.matrix(~b*as.factor(c)))
[1] 114  12
> dim( model.matrix(~b*c))
[1] 114   4

> colnames(model.matrix(~b*as.factor(c)))
 [1] "(Intercept)"     "b"               "as.factor(c)2"   "as.factor(c)3"  
 [5] "as.factor(c)4"   "as.factor(c)5"   "as.factor(c)6"   "b:as.factor(c)2"
 [9] "b:as.factor(c)3" "b:as.factor(c)4" "b:as.factor(c)5" "b:as.factor(c)6"
> colnames( model.matrix(~b*c))
[1] "(Intercept)" "b"           "c"           "b:c"  

In the second model, 'c' is not split into separate columns per level as it is in the first model. The 'b:c' column is simply the element-wise product of 'b' and 'c' (describe() here is from the Hmisc package):

> describe(b*c)
b * c 
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90 
    114       0      67      77    12.0    16.6    30.5    62.0   111.5   160.0 
    .95 
  190.7 

lowest :   4   6   8  10  12, highest: 200 204 216 228 240 
> describe(model.matrix(~b*c)[, "b:c"])
model.matrix(~b * c)[, "b:c"] 
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90 
    114       0      67      77    12.0    16.6    30.5    62.0   111.5   160.0 
    .95 
  190.7 

lowest :   4   6   8  10  12, highest: 200 204 216 228 240 
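
The same comparison can be made without summary statistics by testing the columns for equality. This is a minimal sketch on stand-in vectors x and grp (names assumed for the example, built with the same layout as b and c in the question).

```r
x   <- rep(seq(4, 40, by = 2), each = 6)   # stands in for b
grp <- rep(1:6, times = 19)                # stands in for c

# Numeric case: the interaction column is the plain product x * grp
mm_num <- model.matrix(~ x * grp)
all.equal(unname(mm_num[, "x:grp"]), x * grp)                          # TRUE

# Factor case: each interaction column is x where grp equals that level, else 0
mm_fac <- model.matrix(~ x * as.factor(grp))
all.equal(unname(mm_fac[, "x:as.factor(grp)2"]), x * (grp == 2))       # TRUE
```

In the factor case each interaction column is nonzero only for rows in one level, which is how that level gets its own slope adjustment.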
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow