Errores estándar de doble agrupación para los datos del panel

https://stackoverflow.com/questions/8389843

28-10-2019
|

Pregunta

Tengo un conjunto de datos de panel en R (tiempo y sección transversal) y me gustaría calcular los errores estándar que están agrupados por dos dimensiones, porque mis residuos están correlacionados en ambos sentidos. Buscando en Google que encontré http://thetarzan.wordpress.com/2011/06/11/clustered-standard-errors-in-r/ que proporciona una función para hacer esto. Parece un poco ad-hoc, así que quería saber si hay un paquete que se ha probado y ¿esto hace esto?

lo sé sandwich Hace HAC Errores estándar, pero no hace doble agrupación (es decir, a lo largo de dos dimensiones).

Solución

Paquete de Frank Harrell rms (que solía ser nombrado Design) tiene una función que uso a menudo al agrupar: robcov.

Ver esta parte de ?robcov, por ejemplo.

cluster: a variable indicating groupings. ‘cluster’ may be any type of
      vector (factor, character, integer).  NAs are not allowed.
      Unique values of ‘cluster’ indicate possibly correlated
      groupings of observations. Note the data used in the fit and
      stored in ‘fit$x’ and ‘fit$y’ may have had observations
      containing missing values deleted. It is assumed that if any
      NAs were removed during the original model fitting, an
      ‘naresid’ function exists to restore NAs so that the rows of
      the score matrix coincide with ‘cluster’. If ‘cluster’ is
      omitted, it defaults to the integers 1,2,...,n to obtain the
      "sandwich" robust covariance matrix estimate.

Otros consejos

Para la regresión del panel, el plm El paquete puede estimar el SES agrupado a lo largo de dos dimensiones.

Usando Resultados de referencia de M. Petersen:

require(foreign)
require(plm)
require(lmtest)
test <- read.dta("http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.dta")

##Double-clustering formula (Thompson, 2011)
vcovDC <- function(x, ...){
    vcovHC(x, cluster="group", ...) + vcovHC(x, cluster="time", ...) - 
        vcovHC(x, method="white1", ...)
}

fpm <- plm(y ~ x, test, model='pooling', index=c('firmid', 'year'))

Para que ahora puedas obtener SES agrupado:

##Clustered by *group*
> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="group", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.066952  0.4433   0.6576    
x           1.034833   0.050550 20.4714   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

##Clustered by *time*
> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="time", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.022189  1.3376   0.1811    
x           1.034833   0.031679 32.6666   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

##Clustered by *group* and *time*
> coeftest(fpm, vcov=function(x) vcovDC(x, type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.064580  0.4596   0.6458    
x           1.034833   0.052465 19.7243   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Para más detalles, ver:

Fama-Macbeth y Cluster-Robust (por firme y tiempo) errores estándar en R.

Sin embargo, lo anterior solo funciona si sus datos pueden coaccionar a un pdata.frame. Fallará si tienes "duplicate couples (time-id)". En este caso, aún puede agruparse, pero solo a lo largo de una dimensión.

Truco plm pensando que tiene un conjunto de datos de panel adecuado especificando solo una índice:

fpm.tr <- plm(y ~ x, test, model='pooling', index=c('firmid'))

Para que ahora puedas obtener SES agrupado:

##Clustered by *group*
> coeftest(fpm.tr, vcov=function(x) vcovHC(x, cluster="group", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.066952  0.4433   0.6576    
x           1.034833   0.050550 20.4714   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

También puede usar esta solución para agruparse por un dimensión superior o en un nivel más alto (p.ej industry o country). Sin embargo, en ese caso no podrá usar el group (o time) effects, que es el límite principal del enfoque.

Otro enfoque que funciona tanto para el panel como para otros tipos de datos es el multiwayvcov paquete. Permite una doble agrupación, pero también se agrupa a dimensiones más altas. Según los paquetes sitio web, es una mejora en el código de Arai:

El manejo transparente de las observaciones cayó debido a la falta

Agrupación multimensiva (o camino n, o n-dimensional o multidimensional)

Utilizando los datos de Petersen y cluster.vcov():

library("lmtest")
library("multiwayvcov")

data(petersen)
m1 <- lm(y ~ x, data = petersen)

coeftest(m1, vcov=function(x) cluster.vcov(x, petersen[ , c("firmid", "year")]))
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.029680   0.065066  0.4561   0.6483    
## x           1.034833   0.053561 19.3206   <2e-16 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

La función de ARAI se puede utilizar para agrupar errores estándar. Tiene otra versión para agrupar en múltiples dimensiones:

mcl <- function(dat,fm, cluster1, cluster2){
          attach(dat, warn.conflicts = F)
          library(sandwich);library(lmtest)
          cluster12 = paste(cluster1,cluster2, sep="")
          M1  <- length(unique(cluster1))
          M2  <- length(unique(cluster2))   
          M12 <- length(unique(cluster12))
          N   <- length(cluster1)          
          K   <- fm$rank             
          dfc1  <- (M1/(M1-1))*((N-1)/(N-K))  
          dfc2  <- (M2/(M2-1))*((N-1)/(N-K))  
          dfc12 <- (M12/(M12-1))*((N-1)/(N-K))  
          u1j   <- apply(estfun(fm), 2, function(x) tapply(x, cluster1,  sum)) 
          u2j   <- apply(estfun(fm), 2, function(x) tapply(x, cluster2,  sum)) 
          u12j  <- apply(estfun(fm), 2, function(x) tapply(x, cluster12, sum)) 
          vc1   <-  dfc1*sandwich(fm, meat=crossprod(u1j)/N )
          vc2   <-  dfc2*sandwich(fm, meat=crossprod(u2j)/N )
          vc12  <- dfc12*sandwich(fm, meat=crossprod(u12j)/N)
          vcovMCL <- vc1 + vc2 - vc12
          coeftest(fm, vcovMCL)}

Para referencias y ejemplo de uso, ver:

Errores estándar agrupados en R

Licenciado bajo: CC-BY-SA con atribución

No afiliado a StackOverflow