Why does changing contrast type change row labels in R lm summary?

https://stackoverflow.com/questions/10808853

11-06-2021

Question

With the default contrasts in R (contr.treatment), the summary of a linear model object labels the coefficient rows with the factor's level names. When I change the contrasts to contr.sum, the summary labels them with plain sequential numbers instead.

For the example code below, the coefficient rows for treatment contrasts are xb xc xd xe (level a is the baseline and is absorbed into the intercept); for sum contrasts they are x1 x2 x3 x4.

Is there a way to make these behave the same way besides manually renaming the rows?

EXAMPLE:

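y <- rnorm(10, 0, 1)
x <- factor(rep(letters[1:5], each = 2))

# Default treatment contrasts: coefficient rows xb, xc, xd, xe
options(contrasts = c("contr.treatment", "contr.poly"))
summary(lm(y ~ x))

# Sum contrasts: coefficient rows x1, x2, x3, x4
options(contrasts = c("contr.sum", "contr.poly"))
summary(lm(y ~ x))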
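As far as I can tell, the labels come from the column names of the contrast matrix: the design matrix names each factor column as the variable name plus the contrast column name, and falls back to 1, 2, ... when the contrast columns are unnamed. contr.treatment() names its columns after the non-baseline levels, while contr.sum() leaves them unnamed. A quick base-R check (the variable names mm_trt and mm_sum are just illustrative):

```r
# Same factor as in the question
x <- factor(rep(letters[1:5], each = 2))

# contr.treatment() names its columns after the non-baseline levels...
colnames(contr.treatment(levels(x)))   # "b" "c" "d" "e"

# ...while contr.sum() returns a matrix with unnamed columns
colnames(contr.sum(levels(x)))         # NULL

# These column names (or their absence) carry over into the design matrix,
# and from there into the coefficient names shown by summary(lm(...))
mm_trt <- model.matrix(~ x, contrasts.arg = list(x = "contr.treatment"))
mm_sum <- model.matrix(~ x, contrasts.arg = list(x = "contr.sum"))
colnames(mm_trt)   # "(Intercept)" "xb" "xc" "xd" "xe"
colnames(mm_sum)   # "(Intercept)" "x1" "x2" "x3" "x4"
```

So the behaviour is not specific to summary(); it is decided when the contrast matrix is built.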

Solution

I'm still not at all sure this is a good idea; I think the risk of getting confused about what the contrasts mean is too high. Still, what I would do is write a new contrasts function that computes sum contrasts but sets the column names equal to the default names from the treatment contrasts.

set.seed(5)
n <- 5
y <- c(10 + rnorm(n, 0, 1), 20 + rnorm(n, 0, 1), 30 + rnorm(n, 0, 1))
wFactor <- as.factor(c(rep("A", n), rep("B", n), rep("C", n)))

contr.sumX <- function(...) {
  conT <- contr.treatment(...)  # used only for its column names
  conS <- contr.sum(...)        # the actual sum-contrast matrix
  colnames(conS) <- colnames(conT)
  conS
}
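A small sanity check of contr.sumX() on three levels, reusing the definition above (the names lev, cs and csx are only illustrative). The numeric entries are identical to contr.sum(); only the column names change. Printing the matrix also shows that the column labelled "B" has +1 in row A, which is relevant to the caveat raised in the other answer.

```r
contr.sumX <- function(...) {
  conT <- contr.treatment(...)  # used only for its column names
  conS <- contr.sum(...)        # the actual sum-contrast matrix
  colnames(conS) <- colnames(conT)
  conS
}

lev <- c("A", "B", "C")
cs  <- contr.sum(lev)
csx <- contr.sumX(lev)

# Same numbers, only the dimnames differ
isTRUE(all.equal(unname(cs), unname(csx)))   # TRUE
colnames(cs)    # NULL
colnames(csx)   # "B" "C"

# Column "B" is +1 for level A and -1 for level C
csx
```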

For reference, here's the usual output:

> m1 <- lm(y ~ wFactor, contrasts = list(wFactor=contr.sum(n = levels(wFactor))))
> coef(summary(m1))
              Estimate Std. Error     t value     Pr(>|t|)
(Intercept) 19.8218432  0.2481727  79.8711599 9.889455e-18
wFactor1    -9.6079241  0.3509692 -27.3754029 3.480430e-12
wFactor2    -0.1934654  0.3509692  -0.5512319 5.915907e-01

And here's the output with the contr.sumX function.

> m2 <- lm(y ~ wFactor, contrasts = list(wFactor=contr.sumX(n = levels(wFactor))))
> coef(summary(m2))
              Estimate Std. Error     t value     Pr(>|t|)
(Intercept) 19.8218432  0.2481727  79.8711599 9.889455e-18
wFactorB    -9.6079241  0.3509692 -27.3754029 3.480430e-12
wFactorC    -0.1934654  0.3509692  -0.5512319 5.915907e-01

Alternatively, you can set the contrasts for a particular factor ahead of time:

> contrasts(wFactor) <- "contr.sumX"
> m3 <- lm(y ~ wFactor)
> coef(summary(m3))
              Estimate Std. Error     t value     Pr(>|t|)
(Intercept) 19.8218432  0.2481727  79.8711599 9.889455e-18
wFactorB    -9.6079241  0.3509692 -27.3754029 3.480430e-12
wFactorC    -0.1934654  0.3509692  -0.5512319 5.915907e-01

Other tips

I like your solution, @Aaron, and have implemented it, but I think it contains a dangerous error. Sum contrasts give you the differences between each of the first n-1 factor levels and the grand mean, not the last n-1 levels, which is what your naming scheme labels them as. See Crawley's R Book, 2nd edition, pages 442-443.
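One way to check this claim is to compare the coefficients with the group means. A one-way model reproduces the group means exactly, so under contr.sum the intercept should be the unweighted mean of the group means and each remaining coefficient the deviation of one level from it. A sketch using the simulated data from the accepted answer (the names m, grp_means and grand are illustrative):

```r
set.seed(5)
n <- 5
y <- c(10 + rnorm(n, 0, 1), 20 + rnorm(n, 0, 1), 30 + rnorm(n, 0, 1))
wFactor <- as.factor(c(rep("A", n), rep("B", n), rep("C", n)))

m <- lm(y ~ wFactor, contrasts = list(wFactor = "contr.sum"))

grp_means <- sapply(split(y, wFactor), mean)  # named: A, B, C
grand     <- mean(grp_means)                  # unweighted mean of group means

# The intercept equals the grand mean...
all.equal(unname(coef(m)[1]), grand)
# ...and the two slopes are the deviations of A and B (not B and C)
cbind(coef = unname(coef(m)[-1]), deviation = (grp_means - grand)[c("A", "B")])
```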

Thus, I believe the correct function should instead be:

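contr.sum.keepnames <- function(...) {
    conS <- contr.sum(...)
    # Column j compares level j with the grand mean (j = 1, ..., n-1),
    # so label the columns with every level name except the last
    colnames(conS) <- rownames(conS)[-nrow(conS)]
    conS
}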
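A minimal usage sketch for contr.sum.keepnames(), reusing the simulated data from the accepted answer (m4, m5 and old are illustrative names). Passing the function name as a character string lets R call it with the factor's levels, so the row names, and hence the new column names, are the level names. The coefficients should then be labelled wFactorA and wFactorB, the levels they actually compare with the grand mean. Setting it globally via options() is assumed to work the same way as the built-in names used in the question.

```r
contr.sum.keepnames <- function(...) {
  conS <- contr.sum(...)
  colnames(conS) <- rownames(conS)[-nrow(conS)]
  conS
}

set.seed(5)
n <- 5
y <- c(10 + rnorm(n, 0, 1), 20 + rnorm(n, 0, 1), 30 + rnorm(n, 0, 1))
wFactor <- as.factor(c(rep("A", n), rep("B", n), rep("C", n)))

# Per model: name the contrast function as a string
m4 <- lm(y ~ wFactor, contrasts = list(wFactor = "contr.sum.keepnames"))
names(coef(m4))   # "(Intercept)" "wFactorA" "wFactorB"

# Or globally, as in the question; restore the previous setting afterwards
old <- options(contrasts = c("contr.sum.keepnames", "contr.poly"))
m5 <- lm(y ~ wFactor)
options(old)
names(coef(m5))   # "(Intercept)" "wFactorA" "wFactorB"
```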

BTW, I tried adding this as a comment, but had difficulty including a code block in the comment.

Licensed under: CC BY-SA with attribution

Not affiliated with Stack Overflow