Question

I have a panel data set for which I would like to calculate moving averages across years.

Each year is a variable for which there is an observation for each state, and I would like to create a new variable for the average of every three-year period. For example:

P1947=rmean(v1943 v1944 v1945), P1947=rmean(v1944 v1945 v1946)

I figured I should use a foreach loop with the egen command, but I'm not sure about how I should refer to the different variables within the loop.

I'd appreciate any guidance!


Solution

This data structure is quite unfit for purpose. Assuming an identifier id, you need to reshape to long form, e.g.

 reshape long v, i(id) j(year) 
 tsset id year 

Then a moving average is easy. Use tssmooth or just generate, e.g.

 gen mave = (L.v + v + F.v)/3 

or (better)

 gen mave = 0.25 * L.v + 0.5 * v + 0.25 * F.v 
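For comparison, here is a minimal sketch of the tssmooth route mentioned above. It assumes the data have already been reshaped and declared with tsset id year as shown; the new variable names mave_ts and mave_w are only illustrative. Results at the start and end of each series, and around gaps, may not match the generate versions exactly, so it is worth comparing the two.

```stata
* Sketch only: assumes long data with -tsset id year- already issued.
* mave_ts and mave_w are hypothetical names for the new variables.

* Equally weighted: one lag, the current value, one lead
tssmooth ma mave_ts = v, window(1 1 1)

* Unequal weights 1:2:1; tssmooth rescales them to sum to 1,
* which corresponds to 0.25, 0.5, 0.25
tssmooth ma mave_w = v, weights(1 <2> 1)
```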

More on why your data structure is quite unfit: Not only would calculation of a moving average need a loop (not necessarily involving egen), but you would be creating several new extra variables. Using those in any subsequent analysis would be somewhere between awkward and impossible.

EDIT I'll give a sample loop, while not moving from my stance that it is poor technique. I don't see a reason behind your naming convention whereby P1947 is a mean for 1943-1945; I assume that's just a typo. Let's suppose that we have data for 1913-2012. For means of 3 years, we lose one year at each end.

forval j = 1914/2011 { 
    local i = `j' - 1 
    local k = `j' + 1       
    gen P`j' = (v`i' + v`j' + v`k') / 3 
} 

That could be written more concisely, at the expense of a flurry of macros within macros. Using unequal weights is easy, as above. The only reason to use egen is that it copes with missing values: the expression above returns missing whenever any of the three values is missing.
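As a sketch of both remarks, the first loop below is the more concise form with an expression evaluated inside the macro reference, and the second uses egen with rowmean() (the current name of the rmean() function used in the question) so that missing values are ignored. Both assume the wide layout with v1913-v1912 style names running to v2012 as above; the prefix E is just an illustrative choice so that the two sets of results can coexist.

```stata
* Concise form: `=exp' evaluates the expression inside the macro reference.
* Run this instead of (not after) the loop above, as P`j' can only be created once.
forval j = 1914/2011 {
    gen P`j' = (v`=`j'-1' + v`j' + v`=`j'+1') / 3
}

* egen form: rowmean() averages over whichever of the three values are nonmissing.
* E`j' is a hypothetical name for the result.
forval j = 1914/2011 {
    egen E`j' = rowmean(v`=`j'-1' v`j' v`=`j'+1')
}
```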

FURTHER EDIT

As a matter of completeness, note that it is easy to handle missings without resorting to egen.

The numerator

    (v`i' + v`j' + v`k')

generalises to

    (cond(missing(v`i'), 0, v`i') + cond(missing(v`j'), 0, v`j') + cond(missing(v`k'), 0, v`k'))

and the denominator

    3 

generalises to

    !missing(v`i') + !missing(v`j') + !missing(v`k')

If all values are missing, this reduces to 0/0, or missing. Otherwise, if any value is missing, we add 0 to the numerator and 0 to the denominator, which is the same as ignoring it. Naturally the code is tolerable as above for averages of 3 years, but either for that case or for averaging over more years, we would replace the lines above by a loop, which is what egen does.
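A minimal sketch of that loop, still for the wide layout and still assuming data for 1913-2012. The local h (the number of years on each side of the centre year) and the temporary variables num and den are names introduced here for illustration only.

```stata
* Sketch only: centred moving average over 2*h + 1 years, ignoring missings.
local h = 1                      // hypothetical half-width; 1 gives 3-year means
local first = 1913 + `h'         // we lose h years at each end
local last  = 2012 - `h'

forval j = `first'/`last' {
    local lo = `j' - `h'
    local hi = `j' + `h'
    tempvar num den
    quietly gen double `num' = 0     // running sum of nonmissing values
    quietly gen byte `den' = 0       // running count of nonmissing values
    forval y = `lo'/`hi' {
        quietly replace `num' = `num' + cond(missing(v`y'), 0, v`y')
        quietly replace `den' = `den' + !missing(v`y')
    }
    gen P`j' = `num' / `den'         // 0/0 yields missing if all values are missing
    drop `num' `den'
}
```

Changing h to 2 would give five-year means without touching the rest of the code.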

OTHER TIPS

There is a user-written program, mvsumm, that can do this very easily for you. You can locate it through findit mvsumm.

xtset id time 
mvsumm observations, stat(mean) win(t) gen(new_variable) end
Licensed under: CC-BY-SA with attribution