Question

For a school project, I've found myself playing with data from the Census Bureau's Current Population Survey. I've chosen SPSS to work with the data, because it seemed like the easiest piece of software to jump right into given my limited timeframe. Everything seems pretty straightforward, except for one operation that's giving me trouble.

For each case in my dataset--each case representing an individual surveyed--the following (relevant) variables are defined:

  • Household ID (HHID)--a number unique to each household surveyed
  • Person ID (PID)--a number unique to each person within the household
  • The person's age (AGE)
  • Whether or not the person received public health insurance--a 0 or 1 (HASHEALTH)
  • The person ID of the individual's father, if one exists in the household (0 if none exists) (POPNUM)
  • The person ID of the individual's mother, if one exists in the household (0 if none exists) (MOMNUM)

Here's the problem: I need to set the KIDHASHEALTH value of any given parent to the HASHEALTH value of the youngest person whose HHID and POPNUM or MOMNUM value match the HHID and PID of the current case--functionally, their youngest child.

So far, I've been unable to figure out how to do this using SPSS syntax. Can anybody think of a way to accomplish what I'm trying to do, with syntax or otherwise?

Many, many thanks in advance.

Edited with sample data:

HHID |PID |AGE |POPNUM |MOMNUM |HASHEALTH |KIDHASHEALTH
-----+----+----+-------+-------+----------+------------
1    |1   |45  |0      |0      |0         |0 //KIDHASHEALTH == 0 because
1    |2   |48  |0      |0      |0         |0 //youngest child's HASHEALTH == 0
1    |3   |13  |1      |2      |0         |0
2    |1   |33  |0      |0      |0         |1 // == 1 because youngest child's
2    |2   |28  |0      |0      |0         |1 // HASHEALTH == 1
2    |3   |15  |1      |2      |0         |0
2    |4   |12  |1      |2      |1         |0
-----+----+----+-------+-------+----------+------------

Solution

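The code below was tested only on your small data snippet, so there are no guarantees for the full data with all its peculiarities. It assumes that AGE is an integer. It also treats a case as a parent only if both POPNUM and MOMNUM are 0, so a parent whose own parent lives in the same household is not handled.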
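
To reproduce that test, the sample rows from the question can be typed in directly. This is only a sketch: it assumes the six input columns are read in the order shown in the question's table, and it leaves out KIDHASHEALTH because the syntax below computes it.

```spss
* Read the seven sample cases from the question as freefield numeric data.
data list free / hhid pid age popnum momnum hashealth.
begin data
1 1 45 0 0 0
1 2 48 0 0 0
1 3 13 1 2 0
2 1 33 0 0 0
2 2 28 0 0 0
2 3 15 1 2 0
2 4 12 1 2 1
end data.
* Show the cases to confirm they were read as intended.
list.
```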

*Add small fractional noise to AGE for everyone with HASHEALTH=1.
*This encodes the health status in the age itself: a non-integer age means HASHEALTH=1.
if (hashealth = 1) age= age+rv.unif(-.1,+.1).

*Turn to fathers. Combine POPNUM and PID in one column.
compute parent= popnum. /*Copy POPNUM into a new variable PARENT.
if parent=0 parent= pid. /*If the case has no father in the household, use its own PID instead.
*Now a father and his children have the same code in PARENT,
*so we can propagate the minimal age in that group (which is the age of the
*youngest child, provided the man has children) to all cases of the group,
*including the father.
aggregate /outfile= * mode= addvariables
          /break= hhid parent /*Break by household as well, of course
          /youngage1= min(age). /*The minimal age within the group.
*Turn to mothers and do the same thing.
compute parent= momnum.
if parent=0 parent= pid.
aggregate /outfile= * mode= addvariables
          /break= hhid parent
          /youngage2= min(age). /*The minimal age within the group.
*Take the minimal value from the two passes.
compute youngage= min(youngage1,youngage2).

*Compute binary KIDHASHEALTH variable.
*Remember that YOUNGAGE is not an integer if that child has HASHEALTH=1.
compute kidhashealth= 0.
if popnum=0 and momnum=0 /*if we deal with a parent
   and age<>youngage /*and the youngest age in the group is not their own
   and rnd(youngage)<>youngage kidhashealth= 1. /*and that age isn't an integer, assign 1.
compute age= rnd(age). /*Restore integer age.
execute.
delete variables parent youngage1 youngage2 youngage.
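
A quick way to eyeball the result is to list only the cases the syntax treats as potential parents (POPNUM and MOMNUM both 0). This is a minimal sketch, assuming the variables used above are still in the active dataset. It drops no cases, because TEMPORARY limits the selection to the next procedure.

```spss
* Keep the selection temporary so no cases are removed from the dataset.
temporary.
select if (popnum = 0 and momnum = 0).
* Print those cases with the health flag of their youngest child.
list variables= hhid pid age hashealth kidhashealth.
```

On the question's sample this should list the four adults, with KIDHASHEALTH 0 in household 1 and 1 in household 2, matching the table in the question.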
Licensed under: CC-BY-SA with attribution