Statsmodels: Plotting mean confidence intervals based on heteroscedasticity-consistent standard errors

StackOverflow https://stackoverflow.com/questions/21397082

  •  03-10-2022

Question

This question is similar to "confidence and prediction intervals with StatsModels", but with an added nuance:

My data is heteroscedastic, and I would like to plot the confidence interval on the mean using any one of the heteroscedasticity-consistent standard errors that statsmodels provides (HC0_se, HC1_se, etc.). I can't find any easy way to get this information for each fitted value (though it's quite easy to get the intervals for each coefficient). It also does not seem to be contained in the results summary table in stats.outliers in the same way that the standard mean confidence interval data is.

Two questions:

  1. Does anyone have any idea how I can do this?
  2. What are the heteroscedasticity-consistent covariance matrices, which are also available in the linear regression results object, typically used for? Why are they made available?

Many thanks

Solution 2

Robust standard errors or covariances are not yet fully integrated into the models. Currently they are mainly add-ons that can be obtained after the model is estimated.

In the next release of statsmodels it will be possible to change the default covariance to any of the available robust covariance estimators; this is already in current master for OLS. Then all additional results, t_test, wald_test and so on, will use whichever robust or nonrobust covariance has been defined as the default. Current version: http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLSResults.get_robustcov_results.html

For the prediction standard errors:

I think the calculations are the same when cov_params is a robust sandwich estimator, but I haven't verified that against Stata. See the last part of my answer in "Mathematical background of statsmodels wls_prediction_std".

So in statsmodels 0.5 it's not possible to get the prediction errors with robust covariances directly; you need to copy the function and use the desired cov_params.

Why do we use robust covariances

If there is heteroscedasticity or correlation of observations, then OLS has consistent or unbiased parameter estimates, but the standard covariance matrix of the parameter estimates is "wrong". So we need to get a covariance matrix that is robust to heteroscedasticity, correlation or both.
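To make the difference concrete, here is a minimal NumPy sketch that computes both the classical covariance, sigma^2 (X'X)^-1, and the HC0 (White) sandwich, (X'X)^-1 X' diag(e_i^2) X (X'X)^-1, by hand. The simulated data and variable names are assumptions for illustration only; in practice statsmodels computes these for you.

```python
import numpy as np

# Simulated heteroscedastic data (assumption): error scale grows with x
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])  # design matrix with intercept
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x, size=n)

# OLS estimates and residuals
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Classical covariance: relies on a constant error variance
sigma2 = resid @ resid / (n - X.shape[1])
cov_classical = sigma2 * XtX_inv

# HC0 sandwich: the "meat" weights each x_i x_i' by its squared residual
meat = (X * resid[:, None] ** 2).T @ X
cov_hc0 = XtX_inv @ meat @ XtX_inv

se_classical = np.sqrt(np.diag(cov_classical))
se_hc0 = np.sqrt(np.diag(cov_hc0))
print("classical se:", se_classical)
print("HC0 se:      ", se_hc0)
```

The parameter estimates are the same in both cases; only the covariance matrix, and therefore the standard errors and intervals, changes.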

Many modern econometrics textbooks recommend always using robust covariance estimators when we are not sure about the correct specification of heteroscedasticity or correlation across observations, which is almost always the case in economics.

The simplest case is just heteroscedasticity (http://en.wikipedia.org/wiki/Heteroscedasticity-consistent_standard_errors), but in time series we might have autocorrelation that we did not include in the model, and in repeated measures or panel data we often have correlation within clusters or panels. Robust covariances give us consistent standard errors in these cases.

The same can apply to other models, for example cluster-robust standard errors in Poisson or Logit models or in generalized estimating equations (GEE).

Other tips

I don't believe there's a way yet to specify which covariance matrix you want to use for calculating prediction standard errors. Note that the prediction code is still in the "sandbox" folder in the statsmodels repository. I'm sure GitHub pull requests would be welcome :)

In any case, this should be pretty simple to do. Here's a link to the under-the-hood code for the prediction function that you linked to. Essentially, you would just need to substitute the covariance matrix you want to use instead of the covb variable.

Then, you can use the same matplotlib tidbit you saw in the other SO post.

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/regression/predstd.py#L27

predvar = res.mse_resid/weights + (exog * np.dot(covb, exog.T).T).sum(1)
predstd = np.sqrt(predvar)
tppf = stats.t.isf(alpha/2., res.df_resid)
interval_u = predicted + tppf * predstd
interval_l = predicted - tppf * predstd
return predstd, interval_l, interval_u
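One way to make that substitution is to wrap the lines above in a small function that takes the covariance matrix as an argument. The sketch below does this; the function name and signature are hypothetical (they are not part of statsmodels), and the example inputs are made-up numbers chosen only to show the call.

```python
import numpy as np
from scipy import stats


def prediction_interval_with_cov(predicted, exog, covb, mse_resid, df_resid,
                                 weights=1.0, alpha=0.05):
    """Hypothetical helper mirroring the excerpt above, with covb passed in.

    covb can be any parameter covariance matrix, e.g. a robust one.
    """
    exog = np.atleast_2d(exog)
    # Residual variance term plus the variance of the fitted mean x_i' covb x_i
    predvar = mse_resid / weights + (exog * np.dot(covb, exog.T).T).sum(1)
    predstd = np.sqrt(predvar)
    tppf = stats.t.isf(alpha / 2.0, df_resid)
    interval_l = predicted - tppf * predstd
    interval_u = predicted + tppf * predstd
    return predstd, interval_l, interval_u


# Made-up inputs (assumptions), just to show how the helper is called
exog = np.array([[1.0, 2.0], [1.0, 5.0]])
covb = np.array([[0.04, -0.005], [-0.005, 0.001]])
predicted = np.array([3.0, 4.5])
predstd, lower, upper = prediction_interval_with_cov(
    predicted, exog, covb, mse_resid=0.25, df_resid=48
)
```

Passing mse_resid=0.0 drops the residual variance term and leaves only the uncertainty of the fitted mean, which is what the question asks for. Note that the mse_resid / weights term still assumes a common residual variance up to the weights, so with heteroscedastic data the full prediction interval should be read with care.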
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow