Question

I would like to create a local macro for a subset of my dataset to use in future regressions (see the "Some Uses for Macros Outside of Loops" section).

I've started with code along the following lines:

quietly reg y x1 x2 x3
local subset if e(sample)
list Unit `subset'
reg y x1 x2 if `subset'

x3 has missing values, so some observations are excluded in the first reg command. The output from the list command indicates that the contents of the macro are indeed what I want (Unit is a variable that identifies the observation).

Nevertheless, I receive an error message after the last command:

if not found
r(111);

From the information on r(111):

__________ not found;
no variables defined;
The variable does not exist. You may have mistyped the variable's name.

What is wrong with my syntax? That is, why is Stata treating if as a variable?


Solution

Given your definition, the text "if" is part of the macro's contents.

quietly reg y x1 x2 x3
local subset if e(sample)
list Unit `subset'
reg y x1 x2 if `subset'

So the list command works because it is interpreted as

list Unit if e(sample) 

but the regress command fails because it is interpreted as

regress y x1 x2 if if e(sample) 

and Stata is puzzled out of its mind by the second if.
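The substitution can be seen directly by displaying the command text before running it. The sketch below assumes Stata's shipped auto dataset, with price, mpg, weight and rep78 standing in for y, x1, x2 and x3 (rep78 has missing values, as x3 does); these names are illustrative and not from the question. Either fix works: keep "if" in the macro and leave it out of the command, or the reverse.

```stata
sysuse auto, clear
* rep78 has missing values, so it plays the role of x3 in the question
quietly regress price mpg weight rep78
local subset if e(sample)
* display shows the text Stata sees after macro substitution: a double "if"
display `"regress price mpg weight if `subset'"'
* Fix 1: the macro already contains "if", so leave it out of the command
regress price mpg weight `subset'
* Fix 2: leave "if" out of the macro instead
local subset e(sample)
* e(sample) now refers to the regression just run, which here happens to
* use the same 69 observations; in general it would not
regress price mpg weight if `subset'
```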

That's a comparatively minor deal. The bigger deal is that absolutely all you are doing is putting the text if e(sample) into the local macro subset and so saving yourself a few characters of typing. That is fragile because, come the next estimation command, with possibly a different estimation sample, e(sample) will refer to that command's sample and the macro will no longer select the observations you intended. A more secure way to keep track of the estimation sample is to create an indicator variable immediately after model estimation, e.g.

gen byte regsample = e(sample) 

and then if regsample is guaranteed to select precisely the same subset, however many estimation commands follow (including all the observations whenever they were all used).
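A minimal sketch of why the indicator is safer, again assuming the auto dataset as a stand-in for the question's data (price, mpg, weight and rep78 are illustrative substitutes for y, x1, x2 and x3). A later model that uses every observation redefines e(sample), while regsample keeps marking the first model's sample.

```stata
sysuse auto, clear
quietly regress price mpg weight rep78
* Save the estimation sample immediately, as a 0/1 byte variable
generate byte regsample = e(sample)
* A later model fitted on all observations redefines e(sample) ...
quietly regress price mpg
count if e(sample)
* ... but regsample still marks the first model's estimation sample
count if regsample
regress price mpg weight if regsample
```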

OTHER TIPS

Your immediate problem is that you have a double if. The local macro `subset' contains the string "if e(sample)", so when Stata interprets the line:

reg y x1 x2 if `subset'

it reads:

reg y x1 x2 if if e(sample)

The more important problem is that this method is very fragile, because e(sample) is redefined by every subsequent estimation command. It is probably safer to do something like this:

quietly reg y x1 x2 x3
gen byte touse = e(sample)
reg y x1 x2 if touse

This creates a variable that will not be overwritten by future estimation commands. It contains 1 for each observation you want to use (hence the name) and 0 for each observation you don't. Since Stata treats 1 as "true" and 0 as "false", the qualifier if touse selects exactly the observations you want to use.
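The true/false reading of if touse can be checked by comparing it with the explicit test touse == 1. This is a sketch assuming the auto dataset, with price, mpg, weight and rep78 as illustrative stand-ins for the question's variables.

```stata
sysuse auto, clear
quietly regress price mpg weight rep78
generate byte touse = e(sample)
* touse holds only 0 and 1
tabulate touse
* "if touse" treats 1 as true and 0 as false, so it keeps the same
* observations as the explicit comparison "if touse == 1"
count if touse
local n_implicit = r(N)
count if touse == 1
* displays 1 when the two counts match
display r(N) == `n_implicit'
```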

Licensed under: CC-BY-SA with attribution