Simple but not optimal solution
Suppose diagnosis is 1 when diagnosed (at most once per person) and 0 otherwise.
Then the time at diagnosis is at its simplest
egen time_diagnosis = total(diagnosis * year), by(id)
but anyone who is never diagnosed gets a total of 0, and you need to ignore any such zeros. To spell that out,
replace time_diagnosis = . if time_diagnosis == 0
Better alternative
A more complicated but preferable alternative can handle multiple diagnoses if they occur:
egen time_diagnosis = min(year / diagnosis), by(id)
as year / diagnosis is year when diagnosis is 1 and missing otherwise. This yields missing values if there is no diagnosis, which is as it should be.
Then you subtract that from the original time variable to get a new time variable, measured relative to diagnosis.
gen time2 = time - time_diagnosis
In short, I think you can get this done in two statements, handling panel structure too.
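As a minimal sketch of those two statements on made-up data: the values below are invented for illustration, and the time variable is assumed to be called year throughout, so the last statement uses year where the answer writes time.

```stata
* Toy panel, invented for illustration:
* id 1 is diagnosed in 2001, id 2 is never diagnosed
clear
input id year diagnosis
1 2000 0
1 2001 1
1 2002 0
2 2000 0
2 2001 0
2 2002 0
end

* year / diagnosis is year when diagnosis is 1 and missing when it is 0;
* min() ignores missings and returns missing if every value is missing
egen time_diagnosis = min(year / diagnosis), by(id)

* time relative to diagnosis; missing for anyone never diagnosed
gen time2 = year - time_diagnosis

list, sepby(id)
```

For id 1, time_diagnosis should be 2001 in every observation and time2 should run -1, 0, 1; for id 2 both variables should be missing.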
Update
@Richard Herron asks why use egen with by(), and not just
gen time_diagnosis = time * diagnosis
A limitation of that is that the "correct" value is contained only in those observations for which diagnosis is 1; that value still has to be "spread" to other values for the same id. But that is precisely what egen does here. In the simplest situation, with one diagnosis, the total of time * diagnosis is just time * 1 or time, as any zeros make no difference to the sum.
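The difference can be seen side by side in a small sketch. The data are invented, and the variable name product is hypothetical, introduced only for the comparison.

```stata
* One person, invented data: diagnosed at time 2
clear
input id time diagnosis
1 1 0
1 2 1
1 3 0
end

* gen works observation by observation:
* the diagnosis time appears only where diagnosis is 1
gen product = time * diagnosis

* egen with by() sums within id and puts the result in every observation
egen time_diagnosis = total(time * diagnosis), by(id)

list
```

Here product should be 0, 2, 0, while time_diagnosis should be 2 in all three observations.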