Question

I have a dataset in which a household id (hhid) and a member id (mid) identify a unique person. I have results from two separate surveys taken a year apart (surveyYear). I also have data on whether or not the individual was enrolled in school at the time.

I want a binary variable that indicates whether the individual dropped out of school between the surveys (i.e. 1 if the individual dropped out and 0 otherwise).

I have a decent understanding of Stata but this coding challenge seems a little beyond me because I am not sure how to compare the in-school status of the later id with the earlier id and then propagate that result into a binary column.

Here is an example of what I need:

Previously:

     +----------------------------------+
     | hhid   mid   survey~r   inschool |
     |----------------------------------|
  1. |    1     2          3          1 |
  2. |    1     2          4          1 |
  3. |    1     3          3          1 |
  4. |    1     3          4          1 |
  5. |    2     1          3          1 |
  6. |    2     1          4          0 |
  7. |    2     2          3          0 |
  8. |    2     2          4          0 |
     +----------------------------------+

After:

     +--------------------------------------------+
     | hhid   mid   survey~r   inschool   dropped |
     |--------------------------------------------|
  1. |    1     2          3          1         0 |
  2. |    1     2          4          1         0 |
  3. |    1     3          3          1         0 |
  4. |    1     3          4          1         0 |
  5. |    2     1          3          1         1 |
  6. |    2     1          4          0         1 |
  7. |    2     2          3          0         0 |
  8. |    2     2          4          0         0 |
     +--------------------------------------------+

Solution

bysort hhid mid (surveyYear) : gen dropped = inschool[1] == 1 & inschool[2] == 0

The commentary is longer than the code:

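  1. Within blocks of observations with the same hhid and mid, sort by survey year.

  2. You want students who were in school in survey year 3 but not in survey year 4, so inschool must be 1 in the first observation of each block and 0 in the second.

  3. Here the subscripts [1] and [2] refer to order within the blocks of observations defined by the by: prefix. The logical expression evaluates to 1 if true and 0 if false, and because [1] and [2] are fixed positions, every observation in a block receives the same value.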
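The three points above can be checked on the data from the question. This is a minimal sketch that types the example in with input; it assumes the survey year variable is spelled surveyYear, as in the question text (Stata variable names are case-sensitive, so adjust the spelling to match your own data).

```stata
clear
input hhid mid surveyYear inschool
1 2 3 1
1 2 4 1
1 3 3 1
1 3 4 1
2 1 3 1
2 1 4 0
2 2 3 0
2 2 4 0
end

* Within each person (hhid, mid), sort by survey year, then compare the
* first observation (earlier survey) with the second (later survey).
bysort hhid mid (surveyYear) : gen dropped = inschool[1] == 1 & inschool[2] == 0

list, sepby(hhid mid)
```

Only the person with hhid 2 and mid 1 gets dropped equal to 1, in both of that person's rows; everyone else gets 0.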

If further detail is needed, see e.g. this article. Note that, contrary to one of the question's tags, no loop is needed (or, if you wish, the loop over possibilities is built into the by: framework).
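The one-line solution returns 0 whenever the comparison is not true, which includes people observed in only one survey (an out-of-range subscript such as inschool[2] evaluates to missing) and people with a missing inschool value. If that matters for your data, a more defensive variant might look like the sketch below. The small dataset is hypothetical and exists only to show the two edge cases; the if condition is an assumption about how you would want such cases treated, not part of the original answer.

```stata
clear
input hhid mid surveyYear inschool
1 1 3 1
1 1 4 0
1 2 3 1
2 1 3 1
2 1 4 .
end

* Assign 0/1 only when a person has exactly two observations and
* inschool is nonmissing in both; otherwise dropped stays missing.
bysort hhid mid (surveyYear) : gen dropped = (inschool[1] == 1 & inschool[2] == 0) if _N == 2 & !missing(inschool[1], inschool[2])

list, sepby(hhid mid)
```

Here the first person is flagged as dropped, while the person seen only once and the person with a missing inschool value are left as missing rather than being counted as 0.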

Licensed under: CC-BY-SA with attribution