I'm trying something that I thought would be rather simple in R, but it is giving me more trouble than I bargained for. I'd like to use R to define spells based on multiple criteria, while ignoring missing data. The goal is then to compute wage means across spells using the aggregate function. I suspect that the tools provided in the TraMineR package may be used to accomplish this, but I'm having a hard time figuring out how.
For example, given the following data:
Caseid Year Unemployed EmployerID occID indID Wage
1 1999 0 1 1 1 5.00
1 2000 NA NA NA NA NA
1 2001 NA NA NA NA NA
1 2002 0 1 1 2 6.00
2 1999 0 1 1 1 4.00
2 2000 0 1 1 1 5.00
2 2001 0 1 1 1 6.00
2 2002 1 1 1 1 6.00
3 1999 0 1 1 1 4.00
3 2000 0 3 1 1 5.00
3 2001 0 1 4 1 5.00
3 2002 NA NA NA NA NA
4 1999 0 1 1 1 5.00
4 2000 0 1 1 1 5.00
4 2001 0 1 1 1 7.00
4 2002 0 1 1 1 7.00
I'd like to write code that defines spells based on changes in employment status, employer, occupation, or industry. In addition, I'd like to ignore missing values: a row with missing data should stay in the current spell rather than start a new one. Given that, the correct code should return the following vector for "Spell":
Caseid Year Unemployed EmployerID occID indID Wage Spell
1 1999 0 1 1 1 5.00 1
1 2000 NA NA NA NA NA 1
1 2001 NA NA NA NA NA 1
1 2002 0 1 1 2 6.00 2
2 1999 0 1 1 1 4.00 1
2 2000 0 1 1 1 5.00 1
2 2001 0 1 1 1 6.00 1
2 2002 1 1 1 1 6.00 2
3 1999 0 1 1 1 4.00 1
3 2000 0 3 1 1 5.00 2
3 2001 0 1 4 1 5.00 3
3 2002 NA NA NA NA NA 3
4 1999 0 1 1 1 5.00 1
4 2000 0 1 1 1 5.00 1
4 2001 0 1 1 1 7.00 1
4 2002 0 1 1 1 7.00 1
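To make the rule behind this table concrete, here is a minimal base-R sketch of one way it might be computed. It assumes that "ignoring missing values" means carrying the last observed value forward within each person, column by column, and that the rows can be sorted by Caseid and Year. The names locf, spell_one and state_cols are hypothetical helpers written for this illustration, not functions from TraMineR or any other package.

```r
# Toy data from the example above, read from inline text.
dat <- read.table(header = TRUE, text = "
Caseid Year Unemployed EmployerID occID indID Wage
1 1999 0 1 1 1 5.00
1 2000 NA NA NA NA NA
1 2001 NA NA NA NA NA
1 2002 0 1 1 2 6.00
2 1999 0 1 1 1 4.00
2 2000 0 1 1 1 5.00
2 2001 0 1 1 1 6.00
2 2002 1 1 1 1 6.00
3 1999 0 1 1 1 4.00
3 2000 0 3 1 1 5.00
3 2001 0 1 4 1 5.00
3 2002 NA NA NA NA NA
4 1999 0 1 1 1 5.00
4 2000 0 1 1 1 5.00
4 2001 0 1 1 1 7.00
4 2002 0 1 1 1 7.00
")

# Hypothetical helper: carry the last non-missing value forward.
# Leading NAs stay NA because there is nothing to carry yet.
locf <- function(x) {
  idx <- cumsum(!is.na(x))        # position of the latest non-missing value
  c(NA, x[!is.na(x)])[idx + 1]    # idx == 0 maps to the padded leading NA
}

# Columns whose change starts a new spell (assumption taken from the question).
state_cols <- c("Unemployed", "EmployerID", "occID", "indID")

# Hypothetical helper: spell counter for the rows of ONE person, sorted by year.
spell_one <- function(d) {
  filled  <- lapply(d[state_cols], locf)                 # fill gaps per column
  key     <- do.call(paste, c(unname(filled), sep = "_")) # one state label per row
  changed <- c(FALSE, key[-1] != key[-length(key)])       # differs from previous row?
  cumsum(changed) + 1L
}

# Sort, then apply the counter within each person via row indices.
dat <- dat[order(dat$Caseid, dat$Year), ]
dat$Spell <- ave(seq_len(nrow(dat)), dat$Caseid,
                 FUN = function(i) spell_one(dat[i, ]))
```

Under these assumptions a fully missing row inherits the state of the previous observed row, so it never opens a new spell. A person whose first rows are missing would get a spell change at the first observed row, which may or may not be what is wanted.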
Ultimately, I'd like to use the spell vector to average wages within each person's spells, returning the following:
Caseid Year Unemployed EmployerID occID indID Wage Spell avgWage
1 1999 0 1 1 1 5.00 1 5.00
1 2000 NA NA NA NA NA 1 5.00
1 2001 NA NA NA NA NA 1 5.00
1 2002 0 1 1 2 6.00 2 6.00
2 1999 0 1 1 1 4.00 1 5.00
2 2000 0 1 1 1 5.00 1 5.00
2 2001 0 1 1 1 6.00 1 5.00
2 2002 1 1 1 1 6.00 2 6.00
3 1999 0 1 1 1 4.00 1 4.00
3 2000 0 3 1 1 5.00 2 5.00
3 2001 0 1 4 1 5.00 3 5.00
3 2002 NA NA NA NA NA 3 5.00
4 1999 0 1 1 1 5.00 1 6.00
4 2000 0 1 1 1 5.00 1 6.00
4 2001 0 1 1 1 7.00 1 6.00
4 2002 0 1 1 1 7.00 1 6.00
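As a sketch of the averaging step on its own, the Caseid, Spell and Wage columns of the table above can be typed in directly. This assumes that missing wages should simply be dropped from each mean. Base R's ave returns one value per row, which matches the avgWage column, while aggregate returns one row per spell.

```r
# Caseid, Spell and Wage copied from the expected output above.
dat <- data.frame(
  Caseid = rep(1:4, each = 4),
  Spell  = c(1, 1, 1, 2,   1, 1, 1, 2,   1, 2, 3, 3,   1, 1, 1, 1),
  Wage   = c(5, NA, NA, 6, 4, 5, 6, 6,   4, 5, 5, NA,  5, 5, 7, 7)
)

# Row-level result: the mean wage of each person-spell, repeated on every row.
# na.rm = TRUE drops missing wages; a spell with no observed wage yields NaN.
dat$avgWage <- ave(dat$Wage, dat$Caseid, dat$Spell,
                   FUN = function(w) mean(w, na.rm = TRUE))

# Spell-level result: one row per person-spell. The formula interface of
# aggregate() drops rows with a missing Wage by default (na.action = na.omit).
spell_means <- aggregate(Wage ~ Caseid + Spell, data = dat, FUN = mean)
```

The ave version avoids merging the aggregated means back onto the original rows, which is the extra step the aggregate route would need to produce avgWage.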
Here is the data for debugging. Note that the newemp (new employer) variable behaves differently from the example above: it should start a new spell only when its value is 1. So a series of 4 years where newemp=1 should represent four different spells, not one:
df <- as.data.frame(structure(list(caseid = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), year = c(1979L,
1980L, 1981L, 1982L, 1983L, 1984L, 1985L, 1986L, 1987L, 1988L,
1989L, 1990L, 1991L, 1992L, 1993L, 1994L, 1996L, 1998L, 2000L,
2002L, 2004L, 2006L, 2008L, 2010L, 1979L, 1980L, 1981L, 1982L,
1983L, 1984L, 1985L, 1986L, 1987L, 1988L, 1989L, 1990L, 1991L,
1992L, 1993L, 1994L, 1996L, 1998L, 2000L, 2002L, 2004L, 2006L,
2008L, 2010L, 1979L, 1980L, 1981L, 1982L, 1983L, 1984L, 1985L,
1986L, 1987L, 1988L, 1989L, 1990L, 1991L, 1992L, 1993L, 1994L,
1996L, 1998L, 2000L, 2002L, 2004L, 2006L, 2008L, 2010L, 1979L,
1980L, 1981L, 1982L, 1983L, 1984L, 1985L, 1986L, 1987L, 1988L,
1989L, 1990L, 1991L, 1992L, 1993L, 1994L, 1996L, 1998L, 2000L,
2002L, 2004L, 2006L, 2008L, 2010L, 1979L, 1980L, 1981L, 1982L,
1983L, 1984L, 1985L, 1986L, 1987L, 1988L, 1989L, 1990L, 1991L,
1992L, 1993L, 1994L, 1996L, 1998L, 2000L, 2002L, 2004L, 2006L,
2008L, 2010L), unemp = c(0, NA, 0, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, NA, 0, NA, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0,
0, NA, NA, NA, NA, 1, NA, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, NA, 0,
0, NA, 1, 0, NA, NA, NA, NA, NA, NA, NA, 0, 0, 1, 0, 0, NA, 0,
NA, 0, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
newemp = c(NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, NA, NA, NA, NA, NA, 0,
1, 1, 1, NA, 1, NA, 0, NA, NA, NA, NA, NA, 1, 0, 0, 1, 0,
1, NA, NA, 0, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 1, 0, 0, 1, NA, 0, 0, NA, 1, 0, NA, NA, NA, NA,
NA, NA, NA, 0, 0, 1, 1, 1, NA, 1, NA, 1, 0, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), stocc = c(335, NA, 337,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 337, 337, 337, 337, 337, 23, 386,
23, 23, 337, 389, 23, 337, 23, 337, NA, NA, NA, NA, NA, 276,
276, 276, NA, 383, 376, NA, 383, NA, NA, 383, NA, 447, 468,
155, 468, 373, 188, 243, NA, NA, 243, 22, 277, NA, 22, 469,
NA, NA, NA, NA, 274, NA, NA, NA, 313, 313, 313, 313, 313,
NA, 313, 313, NA, 313, 178, NA, NA, NA, NA, NA, NA, 329,
329, 329, 355, 223, 223, NA, 178, NA, 178, 178, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), stind = c(711, NA,
711, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 172, 172, 172, 172, 172,
172, 172, 172, 172, 172, 172, 172, 172, 172, 172, NA, NA,
NA, NA, NA, 641, 641, 641, NA, 700, 700, NA, 700, NA, NA,
700, NA, 840, 770, 842, 862, 172, 623, 682, NA, NA, 682,
172, 671, NA, 791, 791, NA, NA, NA, NA, 591, NA, NA, NA,
841, 841, 841, 841, 841, NA, 712, 841, NA, 841, 841, NA,
NA, NA, NA, NA, NA, 850, 850, 850, 932, 850, 841, NA, 841,
NA, 841, 841, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA), lwage = c(2.14335489273071, NA, 2.0160756111145,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 2.30358457565308, 2.34806489944458,
2.36942100524902, 2.41516351699829, 2.38407301902771, 2.23588967323303,
2.60783195495605, 2.58224511146545, 2.68043231964111, 2.70430994033813,
2.76339650154114, 2.76763892173767, 2.72537922859192, 2.83617949485779,
2.88961029052734, NA, NA, NA, NA, NA, 2.28949975967407, 2.15297079086304,
NA, NA, 2.25023865699768, 2.20731782913208, NA, 2.15908432006836,
NA, NA, 2.17475175857544, NA, 0.0605304837226868, 0.940007209777832,
2.2104697227478, 2.22159194946289, 0.130852773785591, 0.725372314453125,
2.02960777282715, NA, NA, 2.09433007240295, 2.38683438301086,
NA, NA, NA, NA, NA, NA, NA, NA, 1.89671993255615, NA, NA,
NA, 2.63665437698364, 2.7040421962738, 2.79728126525879,
2.72129535675049, 3.03042364120483, NA, 3.02664947509766,
2.7957558631897, NA, 2.86539578437805, 2.20382499694824,
NA, NA, NA, NA, NA, NA, 2.08691358566284, 2.03152418136597,
2.10608339309692, 2.17702174186707, 2.16355276107788, 3.65519332885742,
NA, 3.80884671211243, NA, 3.37032580375671, 3.52329707145691,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA))))
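The newemp rule differs from a plain "did the value change" test, so a small sketch may help pin it down. The mini-panel below is made up for illustration (it is not the data above) and only reuses the column names caseid, unemp and newemp. It assumes that a spell break occurs when unemp changes after carrying the last value forward, or whenever newemp equals 1, that NA and 0 in newemp never start a spell, and that each person's first row is always spell 1. The helpers locf and spell_one are hypothetical.

```r
# Hypothetical mini-panel, already sorted by person and year.
toy <- data.frame(
  caseid = c(1, 1, 1, 1, 1, 2, 2, 2),
  unemp  = c(0, 0, NA, 0, 1, 0, 0, 0),
  newemp = c(0, 1, NA, 1, 0, NA, 1, 1)
)

# Hypothetical helper: carry the last non-missing value forward.
locf <- function(x) {
  idx <- cumsum(!is.na(x))
  c(NA, x[!is.na(x)])[idx + 1]
}

# Hypothetical helper: spell counter for ONE person's rows.
spell_one <- function(unemp, newemp) {
  u <- locf(unemp)
  status_change <- c(FALSE, u[-1] != u[-length(u)])
  status_change[is.na(status_change)] <- FALSE  # leading NAs open no spell
  new_employer <- newemp %in% 1                 # NA and 0 both count as "no"
  new_employer[1] <- FALSE                      # first row is always spell 1
  cumsum(status_change | new_employer) + 1L
}

toy$spell <- ave(seq_len(nrow(toy)), toy$caseid,
                 FUN = function(i) spell_one(toy$unemp[i], toy$newemp[i]))
# toy$spell is 1 2 2 3 4 for person 1 and 1 2 3 for person 2
```

Because newemp is treated as an event flag rather than a state, two consecutive years with newemp=1 give two breaks, which is the behaviour described above. Changes in stocc and stind could be added to status_change in the same way as unemp.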