Local macros for subset of observations

Question 1

Given your definition, the text if is part of the macro's contents.

quietly reg y x1 x2 x3
local subset if e(sample)
list Unit `subset'
reg y x1 x2 if `subset'

So the list command works because it is interpreted as

list Unit if e(sample)

but the regress command is not working because it is interpreted as

regress y x1 x2 if if e(sample)

and Stata is puzzled out of its mind by the second if.
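
One way out, keeping the macro as you defined it, is to drop the extra if from the regress line so that the macro supplies the only one. A minimal sketch, assuming Stata's shipped auto dataset and its variables price, mpg, weight and rep78 as stand-ins for your data:

sysuse auto, clear
* rep78 has missing values, so the estimation sample is smaller than the dataset
quietly regress price mpg weight rep78
local subset if e(sample)
* expands to: regress price mpg weight if e(sample)
regress price mpg weight `subset'

Alternatively, define the macro without the if (local subset e(sample)) and type the if yourself in each command.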

That's a comparatively minor deal. The bigger deal is that all you are doing is putting the text if e(sample) into the local macro subset, which saves you a few characters of typing. That is fragile because the macro stores only that text, not the sample itself: come the next estimation command, with possibly a different estimation sample, the local macro will no longer select the same observations. A more secure way to keep track of the estimation sample is to create an indicator variable immediately after model estimation, e.g.

gen byte regsample = e(sample)

and then if regsample is guaranteed to select precisely the same subset, whatever estimation commands follow (that subset being all the observations, if they were all used).
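
To see the difference, here is a sketch with Stata's shipped auto dataset standing in for your data (the dataset and the variables price, mpg, weight and rep78 are assumptions, not yours). After a second estimation command e(sample) has moved on, but regsample has not:

sysuse auto, clear
quietly regress price mpg weight rep78
gen byte regsample = e(sample)
* a second model, fitted to all observations, replaces e(sample)
quietly regress price mpg
* counts the sample of the second model
count if e(sample)
* still counts the sample of the first model
count if regsample
regress price mpg weight if regsample

The two counts differ because rep78 has missing values, so the first model used fewer observations than the second.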

Question 2

Your immediate problem is that you have a double if. The local macro `subset' contains the string "if e(sample)", so when Stata interprets the line:

reg y x1 x2 if `subset'

it reads:

reg y x1 x2 if if e(sample)

The more important problem is that this method is very fragile, because e(sample) is overwritten by every estimation command. It is probably safer to do something like this:

quietly reg y x1 x2 x3
gen byte touse = e(sample)
reg y x1 x2 if touse

This will create a variable, which will not be overwritten by future estimation commands, that contains a 1 when you want to use that observation (hence the name) and 0 when you don't want to use that observation. Since 1s are treated as "true" and 0s as "false", the statement if touse selects the observations you want to use.
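
A quick check of that reading, sketched with Stata's shipped auto dataset as a stand-in (the dataset and the variables price, mpg, weight and rep78 are assumptions): if touse and if touse == 1 select the same observations, because touse holds only 0s and 1s.

sysuse auto, clear
quietly regress price mpg weight rep78
gen byte touse = e(sample)
* shows how many 0s and 1s there are
tabulate touse
* these two counts agree
count if touse
count if touse == 1

More generally, Stata treats any nonzero value, including missing, as true. That is harmless here because e(sample) is never missing.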