Question

I'm trying to run a panel regression with over 11,000 dummy interaction terms. My regression looks like this:

xi: reg Y i.county*i.year

where i.county*i.year represents the interaction of the county and year dummy variables. Neither Stata, MATLAB, nor R will hold this many variables. I'm not sure whether I'm missing a command that increases the number of variables stored (e.g. a -set matsize- command in Stata).

I do know that the max capacity for Stata matrices is 11,000 variables. How can I run this fixed-effects regression in Stata? Is Mata an option here?

Solution

If you have no other regressors, then Richard Herron's suggestion in the comments to use collapse is probably the best way to do this. If you do have other regressors, then your model is just a fixed-effects model whose grouping variable is the county-year pair. You can then estimate your model by typing

egen id = group(county year)
xtset id
xtreg y x1 x2, fe

Alternatively:

egen id = group(county year)
areg y x1 x2, absorb(id)

The difference between these two is discussed in the help file for areg. The relevant passage is: "areg is designed for datasets with many groups, but not a number of groups that increases with the sample size. See the xtreg, fe command for an estimator that handles the case in which the number of groups increases with the sample size."
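
For the first case (no other regressors), the collapse suggestion could look roughly like the sketch below. It assumes the variable names Y, county, and year from the question; Y_mean and n are made-up names for illustration. A regression on a full set of county-by-year dummies simply reproduces the cell means, so computing those means directly gives the same fitted values without creating any dummies.

```stata
* Sketch only: Y, county, and year are the names used in the question;
* Y_mean and n are hypothetical names chosen for this example.
preserve

* One row per county-year cell: the cell mean of Y and the cell size.
* These means equal the fitted values of the saturated dummy regression.
collapse (mean) Y_mean = Y (count) n = Y, by(county year)

* Inspect the first few cells (assumes at least five cells exist).
list county year Y_mean n in 1/5

restore
```

The preserve/restore pair is there only so that the original observation-level data come back after the cell means have been inspected.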

OTHER TIPS

Any reason why you can't use a random effects model here? Stata/SE allows you to increase the maximum number of variables (set maxvar) but still, a regression model with 11,000 fixed effects and an interaction term will likely blow the top off of your computer...
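
On the memory worry: the fixed-effects commands in the solution above never create the 11,000 dummies. They work with deviations from group means instead. A minimal sketch of that idea, assuming the placeholder regressors y, x1, and x2 from the solution and a hypothetical group variable called cell:

```stata
* Sketch only: y, x1, x2 are the placeholder names from the solution;
* cell and the *_bar / *_dm variables are hypothetical names.
egen cell = group(county year)

foreach v of varlist y x1 x2 {
    * Group mean of each variable within its county-year cell
    bysort cell: egen double `v'_bar = mean(`v')
    * Deviation from the group mean (the "within" transformation)
    generate double `v'_dm = `v' - `v'_bar
}

* The slope estimates match those from: xtreg y x1 x2, fe
* The standard errors do not, because this regression does not adjust
* the degrees of freedom for the estimated group means.
regress y_dm x1_dm x2_dm, noconstant
```

In practice xtreg, fe or areg is the better choice, since both report standard errors with the correct degrees-of-freedom adjustment; the sketch is only meant to show why the dummy count is not a binding constraint.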

Licensed under: CC-BY-SA with attribution