Regression with Heteroskedasticity Corrected Standard Errors

https://stackoverflow.com/questions/4385436

r
stata

10-10-2019
|

Pergunta

I would like to find the R implementation that most closely resembles Stata output for fitting a Least Squares Regression function with Heteroskedastic Corrected Standard Errors. Specifically I would like the corrected standard errors to be in the "summary" and not have to do additional calculations for my initial round of hypothesis testing. I am looking for a solution that is as "clean" as what Eviews and Stata provide.

So far, using the "lmtest" package the best I can come up with is:

model <- lm(...)
coeftest(model, vcov = hccm)

This gives me the output that I want, but it does not seem to be using "coeftest" for its stated purpose. I would also have to use the summary with the incorrect standard errors to read off the R^2 and F stat, etc. I feel that there should exist a "one line" solution to this problem given how dynamic R is.

Thanks

Solução

I think you are on the right track with coeftest in package lmtest. Take a look at the sandwich package which includes this functionality and is designed to work hand in hand with the lmtest package you have already found.

> # generate linear regression relationship
> # with Homoskedastic variances
> x <- sin(1:100)
> y <- 1 + x + rnorm(100)
> ## model fit and HC3 covariance
> fm <- lm(y ~ x)
> vcovHC(fm)
            (Intercept)           x
(Intercept) 0.010809366 0.001209603
x           0.001209603 0.018353076
> coeftest(fm, vcov. = vcovHC)

t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept)  1.01973    0.10397  9.8081 3.159e-16 ***
x            0.93992    0.13547  6.9381 4.313e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

To get the F test, look at function waldtest():

> waldtest(fm, vcov = vcovHC)
Wald test

Model 1: y ~ x
Model 2: y ~ 1
  Res.Df Df      F    Pr(>F)    
1     98                        
2     99 -1 48.137 4.313e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

You could always cook up a simple function to combine these two for you if you wanted the one-liner...

There are lots of examples in the Econometric Computing with HC and HAC Covariance Matrix Estimators vignette that comes with the sandwich package of linking lmtest and sandwich to do what you want.

Edit: A one-liner could be as simple as:

mySummary <- function(model, VCOV) {
    print(coeftest(model, vcov. = VCOV))
    print(waldtest(model, vcov = VCOV))
}

Which we can use like this (on the examples from above):

> mySummary(fm, vcovHC)

t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept)  1.01973    0.10397  9.8081 3.159e-16 ***
x            0.93992    0.13547  6.9381 4.313e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Wald test

Model 1: y ~ x
Model 2: y ~ 1
  Res.Df Df      F    Pr(>F)    
1     98                        
2     99 -1 48.137 4.313e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Outras dicas

I found an R function that does exactly what you are looking for. It gives you robust standard errors without having to do additional calculations. You run summary() on an lm.object and if you set the parameter robust=T it gives you back Stata-like heteroscedasticity consistent standard errors.

summary(lm.object, robust=T)

You can find the function on https://economictheoryblog.com/2016/08/08/robust-standard-errors-in-r/

There is now a one-line solution using lm_robust from the estimatr package, which you can install from CRAN install.packages(estimatr).

> library(estimatr)
> lmro <- lm_robust(mpg ~ hp, data = mtcars, se_type = "stata")
> summary(lmro)

Call:
lm_robust(formula = mpg ~ hp, data = mtcars, se_type = "stata")

Standard error type:  HC1 

Coefficients:
            Estimate Std. Error  Pr(>|t|) CI Lower CI Upper DF
(Intercept) 30.09886    2.07661 4.348e-15 25.85785 34.33987 30
hp          -0.06823    0.01356 2.132e-05 -0.09592 -0.04053 30

Multiple R-squared:  0.6024 ,   Adjusted R-squared:  0.5892 
F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

You can also get tidy output:

> tidy(lmro)
         term    estimate std.error      p.value    ci.lower
1 (Intercept) 30.09886054 2.0766149 4.347723e-15 25.85784704
2          hp -0.06822828 0.0135604 2.131785e-05 -0.09592231
     ci.upper df outcome
1 34.33987404 30     mpg
2 -0.04053425 30     mpg

The "stata" standard errors default to "HC1" standard errors, which are the default rob standard errors in Stata. You can also get "classical", "HC0", "HC1", "HC2", "HC3" and various clustered standard errors as well (including those that match Stata).

Licenciado em: CC-BY-SA com atribuição

Não afiliado a StackOverflow