Question

Background: I'm using Census public use microdata samples (the American Community Survey in particular) across several years to examine the behavior of people who have completed different degrees (e.g., high school diploma, bachelor's degree, master's degree). The relevant variable in that public use file is called "Schooling". The problem is that the codes stored in "Schooling" have changed from year to year. For example, in the files up through 2007, a value of "13" reflects completing a bachelor's degree, but starting in 2008 the value for a completed bachelor's degree is "21".

Goal: To create a new "Degree Completed" variable that translates the "Schooling" codes into the degree level completed, taking into account the year of the file.

Logistics: The files for all years have been concatenated and, for review purposes, I have to work with the file as is rather than correcting it before it gets to this point.

Existing Code: Here is what I tried.

if      (original.file$year %in% c(2000,2001)) {
    if      (original.file$Schooling <= 08) {original.file$degree.completed <- 0}
    else if (original.file$Schooling <= 10) {original.file$degree.completed <- 1}
    else if (original.file$Schooling <= 12) {original.file$degree.completed <- 2}
    else if (original.file$Schooling == 13) {original.file$degree.completed <- 3}
    else if (original.file$Schooling == 14) {original.file$degree.completed <- 4}
    else if (original.file$Schooling == 15) {original.file$degree.completed <- 5}
    else if (original.file$Schooling == 16) {original.file$degree.completed <- 6}
    }
else if (original.file$year %in% c(2002,2003,2004,2005,2006,2007)) {
    if      (original.file$Schooling <= 08) {original.file$degree.completed <- 0}
    else if (original.file$Schooling <= 11) {original.file$degree.completed <- 1}
    else if (original.file$Schooling == 12) {original.file$degree.completed <- 2}
    else if (original.file$Schooling == 13) {original.file$degree.completed <- 3}
    else if (original.file$Schooling == 14) {original.file$degree.completed <- 4}
    else if (original.file$Schooling == 15) {original.file$degree.completed <- 5}
    else if (original.file$Schooling == 16) {original.file$degree.completed <- 6}
    }
else if (original.file$year %in% c(2008,2009,2010,2011)) {
    if      (original.file$Schooling <= 15) {original.file$degree.completed <- 0}
    else if (original.file$Schooling <= 19) {original.file$degree.completed <- 1}
    else if (original.file$Schooling == 20) {original.file$degree.completed <- 2}
    else if (original.file$Schooling == 21) {original.file$degree.completed <- 3}
    else if (original.file$Schooling == 22) {original.file$degree.completed <- 4}
    else if (original.file$Schooling == 23) {original.file$degree.completed <- 5}
    else if (original.file$Schooling == 24) {original.file$degree.completed <- 6}
    }

Problem: I get warning messages of the following type.

Warning messages:

1: In if (original.file$year %in% c(2000, 2001)) { : the condition has length > 1 and only the first element will be used

2: In if (original.file$Schooling <= 8) { : the condition has length > 1 and only the first element will be used

3: In if (original.file$Schooling <= 10) { : the condition has length > 1 and only the first element will be used

Question: I know that there is a vector vs scalar issue here with the "if", as I've seen from other questions on StackOverflow, but the answers do not seem to apply to this situation. What is the solution here?


Solution

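First, use cut to build a lookup table instead of all those ifs and elses. Each column of the table covers one coding period, and row i gives the degree level for Schooling code i: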

CutOffs1 <- c(0,8,10,12,13,14,15,16)
CutOffs2 <- c(0,8,11,12,13,14,15,16)
CutOffs3 <- c(0,15,19,20,21,22,23,24)
CutOffs <- cbind(CutOffs1, CutOffs2, CutOffs3)
MyTable <- apply(CutOffs, 2, function(X) cut(1:24, X, FALSE)-1)

      CutOffs1 CutOffs2 CutOffs3
 [1,]        0        0        0
 [2,]        0        0        0
 [3,]        0        0        0
 [4,]        0        0        0
 [5,]        0        0        0
 [6,]        0        0        0
 [7,]        0        0        0
 [8,]        0        0        0
 [9,]        1        1        0
[10,]        1        1        0
[11,]        2        1        0
[12,]        2        2        0
[13,]        3        3        0
[14,]        4        4        0
[15,]        5        5        0
[16,]        6        6        1
[17,]       NA       NA        1
[18,]       NA       NA        1
[19,]       NA       NA        1
[20,]       NA       NA        2
[21,]       NA       NA        3
[22,]       NA       NA        4
[23,]       NA       NA        5
[24,]       NA       NA        6
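
As a quick sanity check on the table, the bachelor's degree example from the question can be looked up directly. This is only a sketch: it rebuilds MyTable as above, and the names bachelors.2007 and bachelors.2008 are made up for illustration.

```r
# Rebuild the lookup table from the answer above.
CutOffs <- cbind(CutOffs1 = c(0, 8, 10, 12, 13, 14, 15, 16),
                 CutOffs2 = c(0, 8, 11, 12, 13, 14, 15, 16),
                 CutOffs3 = c(0, 15, 19, 20, 21, 22, 23, 24))
MyTable <- apply(CutOffs, 2, function(X) cut(1:24, X, FALSE) - 1)

# Rows are Schooling codes, columns are coding periods.
bachelors.2007 <- MyTable[13, "CutOffs2"]  # code 13 under the 2002-2007 scheme
bachelors.2008 <- MyTable[21, "CutOffs3"]  # code 21 under the 2008-2011 scheme

# Both codes should map to the same degree level (3).
print(c(bachelors.2007, bachelors.2008))
```

Codes that do not exist in a period, such as 17 under the first two schemes, come back as NA rather than being silently assigned a level.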

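You will also want to cut the years into period codes (1, 2, 3) that index the columns of MyTable. With labels set to FALSE, cut returns integer codes rather than a factor.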

original.file$Period <- cut(original.file$year, c(2000,2001, 2007, 2011), FALSE,   
                            include.lowest=TRUE) 
## To demonstrate:
> cbind(2000:2011, cut(2000:2011, c(2000,2001, 2007, 2011), FALSE,
+       include.lowest=TRUE))
      [,1] [,2]
 [1,] 2000    1
 [2,] 2001    1
 [3,] 2002    2
 [4,] 2003    2
 [5,] 2004    2
 [6,] 2005    2
 [7,] 2006    2
 [8,] 2007    2
 [9,] 2008    3
[10,] 2009    3
[11,] 2010    3
[12,] 2011    3

Then you should be able to do:

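## apply() coerces its input to a matrix, so pass only the two numeric columns;
## a character column elsewhere in the file would turn the indices into strings.
Degrees <- apply(original.file[, c('Schooling', 'Period')], 1, function(X) MyTable[X['Schooling'], X['Period']])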
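
For a large concatenated file, one alternative to the row-by-row apply is to index MyTable with a two-column matrix of (row, column) pairs, which R evaluates in a single vectorized step. The sketch below uses a made-up toy.file in place of original.file, and takes the column name degree.completed from the question. It assumes every Schooling code is a positive integer or NA, because a zero in an index matrix is dropped rather than returned as NA.

```r
# Rebuild the lookup table from the answer above.
CutOffs <- cbind(CutOffs1 = c(0, 8, 10, 12, 13, 14, 15, 16),
                 CutOffs2 = c(0, 8, 11, 12, 13, 14, 15, 16),
                 CutOffs3 = c(0, 15, 19, 20, 21, 22, 23, 24))
MyTable <- apply(CutOffs, 2, function(X) cut(1:24, X, FALSE) - 1)

# Hypothetical toy data standing in for original.file.
toy.file <- data.frame(year      = c(2000, 2005, 2007, 2008, 2011, 2003),
                       Schooling = c(  10,   11,   13,   21,   16,   NA))

# Period code 1, 2 or 3, matching the columns of MyTable.
toy.file$Period <- cut(toy.file$year, c(2000, 2001, 2007, 2011), FALSE,
                       include.lowest = TRUE)

# Each row of the index matrix is one (Schooling, Period) lookup.
toy.file$degree.completed <- MyTable[cbind(toy.file$Schooling, toy.file$Period)]

print(toy.file)
```

A missing Schooling value yields NA in degree.completed, so rows with unknown schooling stay visibly unknown.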

OTHER TIPS

Kudos to Justin for a solution:

if acts on a single Boolean value. Instead, you can use ifelse, which acts on vectors but won't be well suited to this. You can also use your Boolean conditions with subsetting, something like dat$degree[dat$year %in% 2000:2001 & dat$schooling <= 8] <- 0. – Justin
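
The difference between the two approaches in that comment can be seen on a small vector. This is a minimal sketch on a made-up data frame dat; the column names follow the question, and the labels "at most 8" and "above 8" are invented for illustration. Note that from R 4.2.0 onward, passing a condition of length greater than one to if is an error rather than a warning.

```r
# Hypothetical toy data; column names follow the question.
dat <- data.frame(year      = c(2000, 2001, 2004, 2009),
                  Schooling = c(   5,   13,    7,   21))

# ifelse() tests the condition element by element and returns a vector.
low.schooling <- ifelse(dat$Schooling <= 8, "at most 8", "above 8")

# Logical subsetting: assign only to the rows where the whole condition is TRUE.
dat$degree.completed <- NA
dat$degree.completed[dat$year %in% 2000:2001 & dat$Schooling <= 8] <- 0

print(low.schooling)
print(dat)
```

Only the first row satisfies both the year test and the Schooling test, so it is the only one that receives a 0; the remaining rows stay NA until another assignment covers them.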

The final solution required one adjustment. Because these are separate assignment statements rather than a single if-then-else chain, a "<= 8" type of condition will not work on its own: later statements supersede earlier ones. For example, if the next line ends in "... <= 10] <- 1", every zero assigned by the first line is changed to a one when that line runs, and so on. Instead, "<= 8" has to be written as %in% c(1:8), and care has to be taken to make all of the if-like conditions mutually exclusive so that no assignment overrides an earlier one.
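
The overriding problem and its fix can be reproduced on a few rows. The sketch below covers only the 2000-2001 coding scheme and uses a made-up data frame dat; the helper names in.period and overwritten are hypothetical.

```r
# Hypothetical toy data for the 2000-2001 coding scheme.
dat <- data.frame(year      = c(2000, 2000, 2001, 2001),
                  Schooling = c(   4,    9,   12,   13))
in.period <- dat$year %in% c(2000, 2001)

# Pitfall: open-ended "<=" tests overlap, so later lines overwrite earlier ones.
overwritten <- rep(NA_real_, nrow(dat))
overwritten[in.period & dat$Schooling <= 8]  <- 0
overwritten[in.period & dat$Schooling <= 10] <- 1  # also matches Schooling 4
overwritten[in.period & dat$Schooling <= 12] <- 2  # also matches Schooling 4 and 9

# Fix: mutually exclusive ranges, so each row matches exactly one test.
dat$degree.completed <- NA_real_
dat$degree.completed[in.period & dat$Schooling %in% 1:8]   <- 0
dat$degree.completed[in.period & dat$Schooling %in% 9:10]  <- 1
dat$degree.completed[in.period & dat$Schooling %in% 11:12] <- 2
dat$degree.completed[in.period & dat$Schooling == 13]      <- 3

print(overwritten)           # first three rows all end up as 2
print(dat$degree.completed)  # one distinct level per row
```

With the overlapping tests, the first three rows are all overwritten with 2; with the exclusive ranges, the rows get levels 0, 1, 2 and 3 as intended.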

Licensed under: CC-BY-SA with attribution