Question

I am pretty new to Stata and I am having difficulty with something that I would guess is not an unusual thing to try to do. I am working with a panel data set (countries and years). Each observation consists of a country, a year, and a variable, call it x. The data are sorted by country and year (i.e. all observations for a given country are consecutive and sorted by year).

Each country has 54 years of data, covering 1960 to 2013 inclusive. I would like to run a t-test along the following lines:

by country: ttest x = x[54] if year != 2013

But I get an error ("weights not allowed") which I don't know how to interpret. I could do it by hardcoding it in and using the usual syntax

by country: ttest x = # if year != 2013

but I want to avoid hard-coding since there are >100 countries and I want to be able to flexibly add / remove countries (and this is just poor form in general).

My first thought was to define a macro using something like

levelsof country, local(levels)
foreach c of local levels {
    local y x if year == 2013
    ttest x = y if year != 2013
    // some code to store the value that I haven't figured out yet
}

but you can't use "if" when declaring a local macro. I am pretty lost and would appreciate any help you all can give. Thank you!

Solution

Student's t tests here make little sense without adjustment for time and space dependence structure, unless you have grounds for treating your data as equivalent to independent draws from the same distribution. You can do the tests, but standard errors and P-values are dubious if not bogus. That is, your individual tests on time series face one problem; and collectively your tests face another problem. For a good account, see either edition of Box, Hunter, Hunter, Statistics for experimenters. John Wiley.

That large point aside, Stata is choking on the [] which are being misread as an attempt to specify weights. My guess is that

by country: ttest x = `=x[54]' if year != 2013  

would be acceptable syntax to Stata, although still dubious statistics. The detail here is the macro-like syntax

`=  ' 

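which has the effect that the expression given will be evaluated by Stata before the line is passed to ttest. So the result, a numeric value, will be what the ttest command sees. Note that this evaluation happens once, when the line is parsed and before by: splits the data, so x[54] here refers to observation 54 of the whole dataset rather than to the 54th observation within each country. Getting each country's own 2013 value requires a loop over countries.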
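To see the inline evaluation in isolation, here is a minimal sketch using Stata's bundled auto dataset rather than the panel data from the question. The choice of dataset and of the variable price is purely illustrative.

```stata
* Sketch on the bundled auto dataset (an assumption; not the asker's data)
sysuse auto, clear

* `=exp' is replaced by the value of exp before the command runs
display "first price is `=price[1]'"

* so ttest only ever sees a plain number on the right-hand side
ttest price == `=price[1]' if _n > 1
```

Because the substitution is done before ttest is called, the square brackets never reach the command's parser and cannot be mistaken for weights.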

This is naturally similar in spirit to what you were imagining, although your code is some way from being legal and correct.
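A legal version of the loop might look like the following sketch. It assumes country is a numeric identifier (a string variable would need compound quotes around the macro reference) and that each country has exactly one observation for 2013; the variable names follow the question.

```stata
* Sketch: one-sample t test per country against that country's 2013 value.
* Assumes country is numeric and has a single 2013 observation per country.
levelsof country, local(levels)
foreach c of local levels {
    * put this country's 2013 value of x into a local macro
    summarize x if country == `c' & year == 2013, meanonly
    local y = r(mean)

    * test the remaining years against that value
    ttest x == `y' if country == `c' & year != 2013

    * ttest leaves its results in r(), e.g. r(t) and r(p)
    display "country `c': t = " r(t) ", p = " r(p)
}
```

The summarize call stands in for the "if" that a local macro definition cannot take: the restriction goes on the command, and the macro just picks up the stored result. The statistical caveats above apply to every test the loop runs.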

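UPDATE This calculation may also be helpful. It expresses each country's 2013 value as a z-score relative to the mean and standard deviation of that country's other years. Dividing by (year != 2013) divides by 0 for 2013, which yields missing, so the 2013 values are ignored by mean() and sd():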

egen mean = mean(x / (year != 2013)), by(country) 
egen sd = sd(x / (year != 2013)), by(country) 
gen z = (x - mean) / sd if year == 2013 
list country x z if year == 2013 
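The division trick can be checked on a tiny made-up dataset. The values below are invented purely for illustration, and x_excl is a hypothetical variable name.

```stata
* Sketch with invented toy data, to show what x / (year != 2013) produces
clear
input country year x
1 2011 10
1 2012 12
1 2013 20
end

* (year != 2013) is 1 for other years and 0 for 2013;
* dividing by 0 gives missing, so 2013 drops out of later statistics
generate x_excl = x / (year != 2013)
list
```

In the listing, x_excl equals x for 2011 and 2012 and is missing for 2013, which is why the egen functions above summarize only the non-2013 years.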
Licensed under: CC-BY-SA with attribution