Double-clustered standard errors for panel data

https://stackoverflow.com/questions/8389843

28-10-2019

Question

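I have a panel data set in R (time and cross-section) and would like to compute standard errors that are clustered on two dimensions, because my residuals are correlated both ways. Googling around, I found http://thetarzan.wordpress.com/2011/06/11/clustered-standard-errors-in-r/, which provides a function to do this. It seems a bit ad hoc, though, so I wanted to know whether there is a tested package that does this.

I know sandwich does HAC standard errors, but it doesn't do double clustering (i.e., clustering along two dimensions).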

Solution

Frank Harrell's package rms (formerly named Design) has a function that I use often when clustering: robcov.

See this part of ?robcov, for example:

cluster: a variable indicating groupings. ‘cluster’ may be any type of
      vector (factor, character, integer).  NAs are not allowed.
      Unique values of ‘cluster’ indicate possibly correlated
      groupings of observations. Note the data used in the fit and
      stored in ‘fit$x’ and ‘fit$y’ may have had observations
      containing missing values deleted. It is assumed that if any
      NAs were removed during the original model fitting, an
      ‘naresid’ function exists to restore NAs so that the rows of
      the score matrix coincide with ‘cluster’. If ‘cluster’ is
      omitted, it defaults to the integers 1,2,...,n to obtain the
      "sandwich" robust covariance matrix estimate.
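The base-R sketch below makes these `cluster` semantics concrete for an `lm` fit. Score contributions are summed within each cluster before forming the "meat". Leaving `cluster` at its default of one row per cluster reproduces the ordinary heteroskedasticity-robust sandwich.

This is an illustration of the computation, not `rms::robcov` itself. The helper name `cluster_vcov()` and the simulated data are hypothetical, and no small-sample adjustment is applied.

```r
# Minimal sketch (hypothetical helper, not rms::robcov): one-way
# cluster-robust covariance for an lm fit, without small-sample adjustment.
cluster_vcov <- function(fit, cluster = seq_len(nobs(fit))) {
  X <- model.matrix(fit)          # n x k design matrix
  u <- X * residuals(fit)         # row i holds the score x_i * e_i
  U <- rowsum(u, group = cluster) # scores summed within each cluster
  bread <- solve(crossprod(X))    # (X'X)^{-1}
  bread %*% crossprod(U) %*% bread
}

# Hypothetical simulated panel: 50 firms observed 10 times each,
# with a firm-level shock that makes errors correlated within firms
set.seed(1)
d <- data.frame(firm = rep(1:50, each = 10), x = rnorm(500))
d$y <- 1 + 2 * d$x + rnorm(50)[d$firm] + rnorm(500)
fit <- lm(y ~ x, data = d)

sqrt(diag(cluster_vcov(fit, cluster = d$firm)))  # SEs clustered by firm
sqrt(diag(cluster_vcov(fit)))                     # singleton clusters: plain sandwich (HC0)
```

Only the grouping vector changes between the two calls. This mirrors how omitting `cluster` in `robcov` falls back to the ordinary sandwich estimate.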

Other tips

For panel regressions, the plm package can estimate SEs clustered along two dimensions.

Using M. Petersen's benchmark results:

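library(foreign)  # read.dta()
library(plm)      # plm(), vcovHC() for panel models
library(lmtest)   # coeftest()
test <- read.dta("http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.dta")

## Double-clustering formula (Thompson, 2011):
## clustered by group + clustered by time - heteroskedasticity-robust only.
## Note: newer plm versions export their own vcovDC(); this definition masks it.
vcovDC <- function(x, ...){
    vcovHC(x, cluster="group", ...) + vcovHC(x, cluster="time", ...) - 
        vcovHC(x, method="white1", ...)
}

## Pooled OLS with firmid as the group index and year as the time index
fpm <- plm(y ~ x, test, model='pooling', index=c('firmid', 'year'))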

Now you can obtain clustered SEs:

##Clustered by *group*
> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="group", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.066952  0.4433   0.6576    
x           1.034833   0.050550 20.4714   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

##Clustered by *time*
> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="time", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.022189  1.3376   0.1811    
x           1.034833   0.031679 32.6666   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

##Clustered by *group* and *time*
> coeftest(fpm, vcov=function(x) vcovDC(x, type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.064580  0.4596   0.6458    
x           1.034833   0.052465 19.7243   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
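The `vcovDC()` helper above follows the two-way formula in its code comment. It adds the estimates clustered by group and by time, then subtracts the estimate clustered by their intersection, so the within-cell covariance is not counted twice.

Coercion to `pdata.frame` only succeeds when every (group, time) couple appears at most once. Each intersection cluster is therefore a single observation, and that term reduces to White's heteroskedasticity-robust estimator (`method="white1"`).

The notation below is assumed for illustration, and finite-sample factors such as HC1 are omitted:

- $X$ is the design matrix.
- $x_i$ and $\hat e_i$ are the regressors and OLS residual of observation $i$.
- $u_g$ is the score summed over cluster $g$ of a clustering $C$.

```latex
\hat V_{C} = (X'X)^{-1}\Big(\sum_{g \in C} u_g u_g'\Big)(X'X)^{-1},
\qquad u_g = \sum_{i \in g} x_i \hat e_i

\hat V_{\mathrm{DC}} = \hat V_{\mathrm{group}} + \hat V_{\mathrm{time}} - \hat V_{\mathrm{group} \cap \mathrm{time}},
\qquad \hat V_{\mathrm{group} \cap \mathrm{time}} = (X'X)^{-1}\Big(\sum_{i} \hat e_i^{2}\, x_i x_i'\Big)(X'X)^{-1}
```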

For more details, see:

Fama-MacBeth and cluster-robust (by firm and time) standard errors in R.

However, the above works only if your data can be coerced to a pdata.frame. It fails if you have "duplicate couples (time-id)". In this case you can still cluster, but only along one dimension.
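Before falling back to a single index, it can help to find the rows that trigger the "duplicate couples" error. The minimal base-R sketch below uses a hypothetical toy data frame `panel` with the same `firmid`/`year` column names as the Petersen data.

```r
# Hypothetical toy panel: firm 2 appears twice in 2001, which is the
# kind of "duplicate couples (time-id)" that plm rejects
panel <- data.frame(
  firmid = c(1, 1, 2, 2, 2),
  year   = c(2000, 2001, 2000, 2001, 2001),
  y      = c(0.5, 1.2, -0.3, 0.8, 0.1)
)
key <- panel[, c("firmid", "year")]

# Flag every row that shares its (firmid, year) couple with another row
dup_rows <- duplicated(key) | duplicated(key, fromLast = TRUE)
panel[dup_rows, ]

# Quick check: index of the first duplicated row, 0 if every couple is unique
anyDuplicated(key)
```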

Trick plm into thinking that you have a proper panel data set by specifying only one index:

fpm.tr <- plm(y ~ x, test, model='pooling', index=c('firmid'))

Now you can obtain clustered SEs:

##Clustered by *group*
> coeftest(fpm.tr, vcov=function(x) vcovHC(x, cluster="group", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.066952  0.4433   0.6576    
x           1.034833   0.050550 20.4714   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

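You can also use this workaround to cluster on a higher dimension or at a higher level (e.g. industry or country). However, in that case you won't be able to use group (or time) effects, which is the main limitation of the approach.

Another approach that works for both panel and other types of data is the multiwayvcov package. It allows double clustering, but also clustering in higher dimensions. As per the package's website, it is an improvement on Arai's code:

Transparent handling of observations dropped due to missingness

Full multi-way (or n-way, or n-dimensional, or multi-dimensional) clustering

Using the Petersen data with cluster.vcov():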

library("lmtest")
library("multiwayvcov")

data(petersen)
m1 <- lm(y ~ x, data = petersen)

coeftest(m1, vcov=function(x) cluster.vcov(x, petersen[ , c("firmid", "year")]))
## 
## t test of coefficients:
## 
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.029680   0.065066  0.4561   0.6483    
## x           1.034833   0.053561 19.3206   <2e-16 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

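Arai's function can be used for clustering standard errors. He has another version for clustering on multiple dimensions:

library(sandwich)  # estfun(), sandwich()
library(lmtest)    # coeftest()

mcl <- function(dat, fm, cluster1, cluster2) {
  # Evaluate the cluster arguments inside `dat` (so mcl(dat, fm, firmid, year)
  # still works) instead of attach()-ing the data to the search path
  cluster1 <- eval(substitute(cluster1), dat, parent.frame())
  cluster2 <- eval(substitute(cluster2), dat, parent.frame())
  # Intersection clusters; interaction() avoids the id collisions that
  # paste(cluster1, cluster2, sep = "") can produce (e.g. 1/11 vs 11/1)
  cluster12 <- interaction(cluster1, cluster2, drop = TRUE)
  M1  <- length(unique(cluster1))
  M2  <- length(unique(cluster2))
  M12 <- length(unique(cluster12))
  N   <- length(cluster1)
  K   <- fm$rank
  # Finite-sample corrections for each clustering dimension
  dfc1  <- (M1/(M1-1))*((N-1)/(N-K))
  dfc2  <- (M2/(M2-1))*((N-1)/(N-K))
  dfc12 <- (M12/(M12-1))*((N-1)/(N-K))
  # Score contributions summed within each cluster
  u1j   <- apply(estfun(fm), 2, function(x) tapply(x, cluster1,  sum))
  u2j   <- apply(estfun(fm), 2, function(x) tapply(x, cluster2,  sum))
  u12j  <- apply(estfun(fm), 2, function(x) tapply(x, cluster12, sum))
  vc1   <-  dfc1*sandwich(fm, meat. = crossprod(u1j)/N)
  vc2   <-  dfc2*sandwich(fm, meat. = crossprod(u2j)/N)
  vc12  <- dfc12*sandwich(fm, meat. = crossprod(u12j)/N)
  # Two-way covariance: V1 + V2 - V12
  vcovMCL <- vc1 + vc2 - vc12
  coeftest(fm, vcovMCL)
}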
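To see the arithmetic behind `mcl()` without the sandwich helpers, the same V1 + V2 - V12 construction can be written in base R. This sketch is an illustration, not Arai's code, and rests on a few assumptions:

- The helper names `cluster_meat_vcov()` and `vcov_twoway()` and the simulated firm/year data are hypothetical.
- It assumes a full-rank `lm` fit, so `ncol(X)` plays the role of `fm$rank`.
- It reuses the finite-sample factors from `mcl()`.

```r
# Hypothetical helper: one-dimension clustered covariance with the same
# finite-sample factor as mcl(), i.e. (M/(M-1)) * ((N-1)/(N-K))
cluster_meat_vcov <- function(X, e, cl) {
  U <- rowsum(X * e, group = cl)     # scores summed within each cluster
  M <- nrow(U)                       # number of clusters
  N <- nrow(X)
  K <- ncol(X)                       # assumes a full-rank design
  bread <- solve(crossprod(X))       # (X'X)^{-1}
  dfc <- (M / (M - 1)) * ((N - 1) / (N - K))
  dfc * bread %*% crossprod(U) %*% bread
}

# Hypothetical helper: two-way clustered covariance, V1 + V2 - V12
vcov_twoway <- function(fit, cluster1, cluster2) {
  X <- model.matrix(fit)
  e <- residuals(fit)
  cluster12 <- interaction(cluster1, cluster2, drop = TRUE)  # intersection cells
  cluster_meat_vcov(X, e, cluster1) +
    cluster_meat_vcov(X, e, cluster2) -
    cluster_meat_vcov(X, e, cluster12)
}

# Hypothetical simulated panel: 40 firms x 8 years with firm and year shocks
set.seed(42)
d <- expand.grid(firm = 1:40, year = 1:8)
d$x <- rnorm(nrow(d)) + rnorm(40)[d$firm]
d$y <- 0.5 + d$x + rnorm(40)[d$firm] + rnorm(8)[d$year] + rnorm(nrow(d))
fit <- lm(y ~ x, data = d)

V <- vcov_twoway(fit, d$firm, d$year)
V
```

Setting `cluster2` to one id per row makes the intersection term identical to the second term. The estimate then collapses to the one-way firm-clustered covariance, which is a handy sanity check.

As with any V1 + V2 - V12 estimator, the result is not guaranteed to be positive semi-definite in small samples. Check `diag(V)` before taking square roots.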

For references and usage examples, see:

Clustered standard errors in R

License: CC BY-SA with attribution
