How to supply a mean-centered variable in a regression model

https://stackoverflow.com/questions/21683727

09-10-2022

Question

I am trying to fit the following model:

[image: model equation]

using lm in R.

I cannot get my head around the following behaviour...

library(nlme)
library(plyr)
#create toy data set
df0<-Orthodont
df0<-ddply(df0, .(Subject), mutate, lag1=c(NA,distance[1:(length(distance)-1)]))
df0<-subset(df0, !is.na(lag1))
head(df0)
#   distance age Subject  Sex lag1
# 2     21.5  10     M16 Male 22.0
# 3     23.5  12     M16 Male 21.5
# 4     25.0  14     M16 Male 23.5
# 6     23.5  10     M05 Male 20.0
# 7     22.5  12     M05 Male 23.5
# 8     26.0  14     M05 Male 22.5

lm(distance ~ 1, data=df0)$coef
# (Intercept) 
#     24.6358 
lm(distance ~ lag1, data=df0)$coef
# (Intercept)        lag1 
#   6.2798336   0.7866844 
lm(distance ~ I(lag1-mean(distance)), data=df0)$coef
#              (Intercept) I(lag1 - mean(distance)) 
#               25.6604346                0.7866844

The intercept parameter in the first model is the overall mean of distance. Why does this not re-appear in the final model when I mean centre the lag variable?

Solution

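Try centering on mean(lag1) instead of mean(distance). The intercept is the fitted value at the point where the regressor equals zero, and a least-squares line always passes through (mean(x), mean(y)), so the intercept equals the mean of the response only when the regressor is centered on its own mean. Here is an example where it works as expected: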

> set.seed(1)
> df <- data.frame(x=1:10, y=1:10+runif(10))
> lm(y ~ x, df)$coef
(Intercept)           x 
  0.5111385   1.0073410 
> lm(y ~ 1, df)$coef
(Intercept) 
   6.051514 
> lm(y ~ I(x - mean(x)), df)$coef
   (Intercept) I(x - mean(x)) 
      6.051514       1.007341
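
To see why the question's third model gives 25.66 rather than 24.64, here is a minimal sketch that reuses the toy data above. It assumes nothing beyond base R; the variable names (fit_raw, int_cx, int_cy, q_check) are made up for illustration. Centering the regressor on any constant c leaves the slope unchanged and moves the intercept to a + b * c, which equals mean(y) only when c is mean(x).

```r
# Toy data copied from the answer above.
set.seed(1)
df <- data.frame(x = 1:10, y = 1:10 + runif(10))

# Uncentered fit: y = a + b * x
fit_raw <- lm(y ~ x, data = df)
a <- unname(coef(fit_raw)[1])
b <- unname(coef(fit_raw)[2])

# Center x on its own mean: the intercept becomes mean(y).
int_cx <- unname(coef(lm(y ~ I(x - mean(x)), data = df))[1])

# Center x on mean(y), mirroring the question: the intercept becomes
# a + b * mean(y), the fitted value at x = mean(y).
int_cy <- unname(coef(lm(y ~ I(x - mean(y)), data = df))[1])

print(all.equal(int_cx, mean(df$y)))
print(all.equal(int_cy, a + b * mean(df$y)))

# Same identity with the rounded coefficients printed in the question:
# intercept + slope * mean(distance)
q_check <- 6.2798336 + 0.7866844 * 24.6358
print(q_check)  # close to the reported 25.6604346, up to rounding
```

In the question's data, mean(lag1) and mean(distance) differ because the lagged series drops each subject's last measurement while the response drops the first, so centering on mean(distance) shifts the intercept to the fitted value at lag1 = mean(distance) rather than to the overall mean.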

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow