Errori standard a doppio cluster per i dati del pannello

https://stackoverflow.com/questions/8389843

28-10-2019
|

Domanda

Ho un set di dati del pannello in R (tempo e sezione trasversale) e vorrei calcolare errori standard raggruppati da due dimensioni, perché i miei residui sono correlati in entrambi i modi. Googling in giro che ho trovato http://thetarzan.wordpress.com/2011/06/11/clustered-standard-errors-in-re che fornisce una funzione per farlo. Sembra un po 'ad hoc, quindi volevo sapere se c'è un pacchetto che è stato testato e lo fa?

lo so sandwich Fa errori standard HAC, ma non fa doppio clustering (cioè lungo due dimensioni).

Soluzione

Il pacchetto di Frank Harrell rms (che era nominato Design) ha una funzione che uso spesso durante il cluster: robcov.

Vedere questa parte di ?robcov, Per esempio.

cluster: a variable indicating groupings. ‘cluster’ may be any type of
      vector (factor, character, integer).  NAs are not allowed.
      Unique values of ‘cluster’ indicate possibly correlated
      groupings of observations. Note the data used in the fit and
      stored in ‘fit$x’ and ‘fit$y’ may have had observations
      containing missing values deleted. It is assumed that if any
      NAs were removed during the original model fitting, an
      ‘naresid’ function exists to restore NAs so that the rows of
      the score matrix coincide with ‘cluster’. If ‘cluster’ is
      omitted, it defaults to the integers 1,2,...,n to obtain the
      "sandwich" robust covariance matrix estimate.

Altri suggerimenti

Per le regressioni del pannello, il plm Il pacchetto può stimare SES cluster lungo due dimensioni.

Usando Risultati di riferimento di M. Petersen:

require(foreign)
require(plm)
require(lmtest)
test <- read.dta("http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.dta")

##Double-clustering formula (Thompson, 2011)
vcovDC <- function(x, ...){
    vcovHC(x, cluster="group", ...) + vcovHC(x, cluster="time", ...) - 
        vcovHC(x, method="white1", ...)
}

fpm <- plm(y ~ x, test, model='pooling', index=c('firmid', 'year'))

In modo che ora puoi ottenere SES cluster:

##Clustered by *group*
> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="group", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.066952  0.4433   0.6576    
x           1.034833   0.050550 20.4714   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

##Clustered by *time*
> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="time", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.022189  1.3376   0.1811    
x           1.034833   0.031679 32.6666   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

##Clustered by *group* and *time*
> coeftest(fpm, vcov=function(x) vcovDC(x, type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.064580  0.4596   0.6458    
x           1.034833   0.052465 19.7243   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Per maggiori dettagli vedi:

Fama-macbeth e cluster-robust (per azienda e tempo) errori standard in r.

Tuttavia, quanto sopra funziona solo se i tuoi dati possono essere costretti a pdata.frame. Fallirà se hai "duplicate couples (time-id)". In questo caso puoi ancora raggrupparsi, ma solo lungo una dimensione.

Trucco plm nel pensare di avere un set di dati del pannello adeguato solo specificando uno indice:

fpm.tr <- plm(y ~ x, test, model='pooling', index=c('firmid'))

In modo che ora puoi ottenere SES cluster:

##Clustered by *group*
> coeftest(fpm.tr, vcov=function(x) vcovHC(x, cluster="group", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.066952  0.4433   0.6576    
x           1.034833   0.050550 20.4714   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Puoi anche usare questa soluzione alternativa per cluster da a dimensione più alta o a a livello più alto (per esempio industry o country). Tuttavia, in tal caso non sarai in grado di utilizzare il group (o time) effects, che è il limite principale dell'approccio.

Un altro approccio che funziona sia per il pannello che per altri tipi di dati è il multiwayvcov pacchetto. Consente un doppio clustering, ma anche il clustering a dimensioni più elevate. Secondo i pacchetti sito web, è un miglioramento per il codice di Arai:

Gestione trasparente delle osservazioni diminuite a causa della mancanza

Clustering completo multi-way (o n-way o n-dimensionale o multidimensionale)

Utilizzando i dati di Petersen e cluster.vcov():

library("lmtest")
library("multiwayvcov")

data(petersen)
m1 <- lm(y ~ x, data = petersen)

coeftest(m1, vcov=function(x) cluster.vcov(x, petersen[ , c("firmid", "year")]))
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.029680   0.065066  0.4561   0.6483    
## x           1.034833   0.053561 19.3206   <2e-16 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

La funzione di Arai può essere utilizzata per gli errori standard di clustering. Ha un'altra versione per il clustering in più dimensioni:

mcl <- function(dat,fm, cluster1, cluster2){
          attach(dat, warn.conflicts = F)
          library(sandwich);library(lmtest)
          cluster12 = paste(cluster1,cluster2, sep="")
          M1  <- length(unique(cluster1))
          M2  <- length(unique(cluster2))   
          M12 <- length(unique(cluster12))
          N   <- length(cluster1)          
          K   <- fm$rank             
          dfc1  <- (M1/(M1-1))*((N-1)/(N-K))  
          dfc2  <- (M2/(M2-1))*((N-1)/(N-K))  
          dfc12 <- (M12/(M12-1))*((N-1)/(N-K))  
          u1j   <- apply(estfun(fm), 2, function(x) tapply(x, cluster1,  sum)) 
          u2j   <- apply(estfun(fm), 2, function(x) tapply(x, cluster2,  sum)) 
          u12j  <- apply(estfun(fm), 2, function(x) tapply(x, cluster12, sum)) 
          vc1   <-  dfc1*sandwich(fm, meat=crossprod(u1j)/N )
          vc2   <-  dfc2*sandwich(fm, meat=crossprod(u2j)/N )
          vc12  <- dfc12*sandwich(fm, meat=crossprod(u12j)/N)
          vcovMCL <- vc1 + vc2 - vc12
          coeftest(fm, vcovMCL)}

Per i riferimenti e l'esempio di utilizzo consultare:

Errori standard raggruppati in r

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow