Question

I am running a simple multivariate regression on a panel/time-series dataset in two ways: with lm() and with the underlying formula $(X'X)^{-1} X'Y$. I expected to get the same coefficient values from the two methods. However, I get completely different estimates.

Here is the R code:

  return = matrix(ret.ff.zoo, ncol = 50)  # y vector
  data = cbind(df$EQ, df$EFF, df$SIZE, df$MOM, df$MSCR, df$SY, df$UMP)   # x vector

  #First method     
  BETA = solve(crossprod(data)) %*% crossprod(data, return)

  #Second method
  OLS <- lm(return ~ data)

I am not sure why the estimates differ between the two methods.

Any help is appreciated! Thank you.


Solution

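Your example isn't reproducible, but with some dummy data the matrix formula and lm() produce the same results once you take out the intercept. By default, lm() adds an intercept term, whereas $(X'X)^{-1} X'Y$ estimates one only if X contains a column of ones: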

set.seed(1)

x <- matrix(rnorm(1000),ncol=5)
y <- rnorm(200)

solve(t(x) %*% x) %*% t(x) %*% y
              [,1]
[1,] -0.0826496646
[2,] -0.0165735273
[3,] -0.0009412659
[4,]  0.0070475728
[5,] -0.0642452777
> lm(y ~ x + 0)

Call:
lm(formula = y ~ x + 0)

Coefficients:
        x1          x2          x3          x4          x5  
-0.0826497  -0.0165735  -0.0009413   0.0070476  -0.0642453  
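
If you want to keep the intercept instead, the reverse approach should also work: add a column of ones to the design matrix so that the matrix formula estimates an intercept as well. The following is a minimal sketch on the same dummy data; the names X, beta_matrix and beta_lm are made up for illustration, and solve(A, b) is used in place of an explicit inverse, which gives the same estimates.

```r
set.seed(1)

x <- matrix(rnorm(1000), ncol = 5)
y <- rnorm(200)

# Prepend a column of ones so the matrix formula estimates an intercept,
# matching what lm() does by default.
X <- cbind(1, x)

# Solve the normal equations (X'X) b = X'y rather than inverting X'X explicitly.
beta_matrix <- solve(crossprod(X), crossprod(X, y))

# lm() with its default intercept.
beta_lm <- coef(lm(y ~ x))

# Compare the two sets of estimates up to floating-point tolerance.
all.equal(as.numeric(beta_matrix), as.numeric(beta_lm))
```

In the original code, the equivalent change would be either cbind(1, df$EQ, ...) in the first method or lm(return ~ data + 0) in the second.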
License: CC BY-SA with attribution
Not affiliated with StackOverflow