Question

I am trying to generate a dummy variable for each year from 1996 to 2012 (inclusive), such that the 1996 dummy equals 1 if the year is 1996 and 0 otherwise, and likewise for the other years. I would like to use the foreach command in Stata to save time (at least for future projects). Currently the dummy for 1996 is produced, but no others are generated. I think it has to do with how I am defining j, but I cannot figure out the syntax that gives the result I want. I have looked online and in the Stata help files and cannot find anything on this specific topic.

Here is what I have thus far:

local var year
local j = 1996
foreach j of var year {
    gen d`j' = 1 if year==`j'
    local ++j
}

I will continue to try and figure this out on my own, but if anyone has a suggestion I would be greatly appreciative.


Solution

Let us look at this line by line.

local var year

You defined a local macro var with content "year". This is legal but you never refer to that local macro in this code, so the definition is pointless.

local j = 1996

You defined a local macro j with content "1996". This is legal.

foreach j of var year {

You open a loop and define the loop index to be j. That means that within the loop any reference to local macro j will be interpreted in terms of the list of arguments you provide. (The previous definition of j is irrelevant within the loop, and so has no effect in the rest of your code.)

... of var year 

You specify that the loop is over a variable list here. Note that the keyword var here is short for varlist and has absolutely nothing to do with the local macro named var that you just defined. The variable list consists of the single variable name year.

gen d`j' = 1 if year==`j'

This statement will be interpreted, the one and only time the loop is executed, as

gen dyear = 1 if year==year 

as references to the local macro j are replaced with its contents, the variable name year. year==year is true for every observation. The effect is a new variable dyear which is 1 in every observation. That is not the indicator or dummy variable you want. If you look at your dataset carefully, you will see that it is not a dummy variable for year being 1996.
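To see the substitution for yourself, here is a minimal sketch that only displays what the loop index contains. The three-observation toy dataset is an assumption made purely for illustration; any dataset with a variable named year would do.

```stata
* Toy data (hypothetical): three observations with years 1996 to 1998
clear
set obs 3
generate year = 1995 + _n

* Loop over a variable list holding the single variable name year
foreach j of varlist year {
    * The macro holds the variable NAME, not any of its values
    display "`j'"
}
```

The loop body runs once and displays the text year, which is why the generate statement in your loop expands the way shown above.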

local ++j

You are trying to increment the local macro j by 1. But within the loop, local macro j contains the string "year", which is a variable name. You can't add 1 to a string, so the error message will be type mismatch. You don't report that error, which is a surprise. The point is a little subtle: in the previous command, the context of generate allows the reference to year to be interpreted as an instruction to calculate with the variable year, which is naturally numeric. But local commands are all about string manipulation, which may or may not have a numeric interpretation, and your command is equivalent, first of all, to instructing Stata to add

"year" + 1 

which triggers a type mismatch error.

Turning away from your code: Consider a loop

forval y = 1996/2012 { 
    gen d`y' = 1 if year == `y'
} 

This is closer to what you want, but it makes clearer another bug in your code. It would create variables d1996 to d2012, but each would be 1 in the year specified and missing otherwise, which is not what you want. You could fix that by adding a further line in the loop

    replace d`y' = 0 if year != `y' 

but a much cleaner way to do it is the single line

    gen d`y' = year == `y' 

The expression

               year == `y' 

is evaluated as 1 when true and 0 when false, which is what you want.
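Putting the loop and the one-line generate together, a complete version might look like the sketch below. The toy dataset, the byte storage type and the if !missing(year) qualifier are my additions for illustration, not part of the code above; without the qualifier, observations with missing year would get 0 in every dummy rather than missing.

```stata
* Toy data (hypothetical): one observation per year, 1996 to 2012
clear
set obs 17
generate year = 1995 + _n

* One indicator per year: 1 in that year, 0 in the other years
forvalues y = 1996/2012 {
    generate byte d`y' = (year == `y') if !missing(year)
}
```

The parentheses around the logical expression are optional; they only make it easier to read next to the assignment.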

All this is standard technique documented in the Stata manuals [U] (User's Guide) and [P] (Programming).

As @Roberto Ferrer pointed out, however, experienced Stata users would not define dummies this way, as tabulate offers an option to do it without a loop.

A tutorial that brings together comments on local macros, foreach and forvalues loops is within http://www.stata-journal.com/sjpdf.html?articlenum=pr0005

 search foreach 

within Stata would have pointed to that as one of various pieces you can read.

OTHER TIPS

Looping is not necessary. Try the tabulate command with the gen() option. See help tabulate oneway.
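As a minimal sketch of that approach, assuming a numeric variable named year (the toy data and the stub name d are chosen only for illustration):

```stata
* Toy data (hypothetical): one observation per year, 1996 to 2012
clear
set obs 17
generate year = 1995 + _n

* One indicator variable per distinct value of year, no loop needed
tabulate year, generate(d)
```

Note that the new variables are numbered d1, d2, and so on, in the order of the sorted values of year, rather than being named d1996, d1997, etc.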

See also help xi and help factor variables.

You are trying to loop through the distinct values of year but the syntax is not correct. You are actually looping through a list of variables with only one element: year. The command levelsof gives you the distinct values, but like I said, looping is not necessary.
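If you do want a loop over the distinct values, a sketch using levelsof could look like this. The macro name years and the toy data are assumptions for illustration only.

```stata
* Toy data (hypothetical): one observation per year, 1996 to 2012
clear
set obs 17
generate year = 1995 + _n

* Store the distinct values of year in the local macro years
levelsof year, local(years)

* Loop over those values, not over the variable name
foreach y of local years {
    generate byte d`y' = (year == `y')
}
```

This creates dummies only for years that actually occur in the data, which may or may not be what you want if some years in the range are absent.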

This might help.

/* assuming the data is from 1970-2012 */
/* assuming your year variable name is fyear */

forvalues x = 1970/2012 {
    /* no space between fyear and `x': the new variables are fyear1970, fyear1971, ... */
    gen fyear`x' = 0
    replace fyear`x' = 1 if fyear == `x'
}

However, I do agree with Roberto Ferrer that a loop may not be necessary.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow