Question

I have a panel of data (firm-years) that spans several countries. For each country I estimate a logit model on the first five years, then use that model to predict probabilities in the subsequent years. A foreach loop runs over the countries and a nested forvalues loop runs over the subsequent years.

The first few countries work well (both estimations and predictions), but the fifth country's first out-of-sample prediction fails with:

Country: United Kingdom
Year: 1994
too many variables specified
r(103);

The model fits and 1994 has enough data to predict a probability. My predict call is:

predict temp_`c'`y' ///
    if (country == "`c'") ///
        & (fyear == `y'), ///
    pr

Do you have any ideas what could cause this error? I am confused because logit and predict work elsewhere in the same loop. Thanks!

FWIW, here's the .do file.

* generate table 5 from Denis and Osobov (2008 JFE)
preserve

* loop to estimate model by country
levelsof country, local(countries)
foreach c of local countries {
    display "Country: `c'"
    summarize fyear if (country == "`c'"), meanonly
    local est_low = `r(min)'
    local est_high = `=`r(min)' + 4'
    local pred_low = `=`r(min)' + 5'
    local pred_high = `r(max)'
    logit payer size v_a_tr e_a_tr re_be_tr ///
        if (country == "`c'") ///
            & inrange(fyear, `est_low', `est_high')
    forvalues y = `pred_low'/`pred_high' {
        display "Country: `c'"
        display "Year: `y'"
        predict temp_`c'`y' ///
            if (country == "`c'") ///
                & (fyear == `y'), ///
            pr
    }
}

* combine fitted values and generate delta
egen payer_expected = rowfirst(temp_*)
drop temp_*
generate delta = payer - payer_expected

* table
table country fyear, ///
    contents(count payer mean payer mean payer_expected)

*
restore    

Update: If I drop (country == "United Kingdom"), then the same problem shifts to the United States (next and last country in panel). If I drop inlist(country, "United Kingdom", "United States") then the problem disappears and the .do file runs through.

Solution

You are using country names as part of the new variable name that predict is creating. However, when you get to "United Kingdom" your line

predict temp_`c'`y'

implies something like

predict temp_United Kingdom1812 

But Stata sees that as two variable names where only one is allowed.

Otherwise put, you are being bitten by a simple rule: Stata does not allow spaces within variable names.

Clearly the same problem would bite with "United States".
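
A minimal sketch of the failure, outside the original loop, may make the rule concrete. It assumes the auto dataset that ships with Stata is available; the model and the year suffix are purely illustrative and have nothing to do with the payer regression in the question.

```stata
* Illustrative only: any fitted logit will do
sysuse auto, clear
logit foreign mpg

* A local holding a value with an embedded space, as levelsof would supply it
local c "United Kingdom"

* After macro expansion this line reads: predict temp_United Kingdom1994, pr
* so predict is handed two names where it expects one new variable
capture noisily predict temp_`c'1994, pr
display "return code: " _rc
```

If this behaves like the loop in the question, the predict line should stop with the same "too many variables specified" message, r(103), that was reported there.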

The simplest fudge is to change the values so that spaces become underscores "_". Stata's OK with variable names including underscores. That could be

gen country2 = subinstr(country, " ", "_", .) 

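followed by a loop over country2. Note that country2 then has to replace country in the levelsof call and in every if condition inside the loop as well, because the values held in the local macro will no longer match the original country strings.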
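
Put together, the underscore approach might look like the sketch below. It reuses the variable names from the question (payer, size, v_a_tr, e_a_tr, re_be_tr, fyear) and assumes that spaces are the only characters in the country names that are illegal in a Stata variable name, and that each combined name stays within Stata's 32-character limit.

```stata
* Sketch: build a space-free copy of country and loop over that instead
generate country2 = subinstr(country, " ", "_", .)

levelsof country2, local(countries)
foreach c of local countries {
    summarize fyear if country2 == "`c'", meanonly
    local est_low = r(min)
    local est_high = r(min) + 4
    local pred_low = r(min) + 5
    local pred_high = r(max)
    * estimate on the first five years for this country
    logit payer size v_a_tr e_a_tr re_be_tr ///
        if country2 == "`c'" & inrange(fyear, `est_low', `est_high')
    forvalues y = `pred_low'/`pred_high' {
        * the new name is now a single token, e.g. temp_United_Kingdom1994
        predict temp_`c'`y' if country2 == "`c'" & fyear == `y', pr
    }
}
```

If the country names could contain other awkward characters (hyphens, periods, apostrophes), the strtoname() function available in recent versions of Stata may be a more robust way to build country2 than a single subinstr() call.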

Note for everyone not up on the historical details: 1812 is the year that gave its name to the war in which British troops burnt down the White House (the burning itself was in 1814). Feel free to substitute "1776" or some other date of choice.

(By the way, credit for a crystal-clear question!)

OTHER TIPS

Here's another approach to your problem. Initialise a single variable to hold the predicted values. Then, as you loop over the possibilities, replace it chunk by chunk with each set of predictions. That avoids the whole business of generating a bunch of differently named variables that you don't want to hold on to long-term, and it sidesteps the variable-name problem entirely because the country name never appears in a variable name.

* generate table 5 from Denis and Osobov (2008 JFE)

preserve
gen payer_expected = . 

* loop to estimate model by country
levelsof country, local(countries)
foreach c of local countries {
    display "Country: `c'"
    summarize fyear if (country == "`c'"), meanonly
    local est_low = `r(min)'
    local est_high = `=`r(min)' + 4'
    local pred_low = `=`r(min)' + 5'
    local pred_high = `r(max)'
    logit payer size v_a_tr e_a_tr re_be_tr ///
        if (country == "`c'") ///
            & inrange(fyear, `est_low', `est_high')
    forvalues y = `pred_low'/`pred_high' {
        display "Country: `c'"
        display "Year: `y'"
        predict temp ///
            if (country == "`c'") ///
                & (fyear == `y'), pr
        quietly replace payer_expected = temp if temp < .
        drop temp
    }
}

generate delta = payer - payer_expected

* table
table country fyear, ///
     contents(count payer mean payer mean payer_expected)

*
restore    
Licensed under: CC-BY-SA with attribution