collapsing two factors into one? [duplicate]

https://stackoverflow.com/questions/13503908

r
factors

01-12-2021

Question

Possible Duplicate:
Joining factor levels of two columns in R

I'm fairly new to R, and I'm trying to make my recoding script somewhat more effective and "correct". I've tried searching the forums but that got me nowhere - perhaps I'm using the wrong terminology and missed it, so please bear with me if the question has already been put up.

I have two factor variables that I wish to collapse into one factor variable. They stem from the same survey and both measure educational level. The reason I have two variables in the first place is an unfortunate survey construction, but that's beside the point. The main point is that they are mutually exclusive (each respondent has a value in at most one of them).

My data looks like this:

education       education2
9th grade       <NA>
9th grade       <NA>
<NA>            9th grade
<NA>            10th grade
10th grade      <NA>
11th grade      <NA>
<NA>            9th grade
<NA>            11th grade
<NA>            <NA>

and my script looks like this:

highest.edu     <- vector(length=length(df$education))
a.grade       <- which(df$education=="9th grade")
a.grade2      <- which(df$education2=="9th grade")
b.grade      <- which(df$education=="10th grade")
b.grade2     <- which(df$education2=="10th grade")
c.grade      <- which(df$education=="11th grade")
c.grade2     <- which(df$education2=="11th grade")

highest.edu[a.grade]      <- as.character(df$education)[a.grade]
highest.edu[a.grade2]     <- as.character(df$education2)[a.grade2]
highest.edu[b.grade]     <- as.character(df$education)[b.grade]
highest.edu[b.grade2]    <- as.character(df$education2)[b.grade2]
highest.edu[c.grade]     <- as.character(df$education)[c.grade]
highest.edu[c.grade2]    <- as.character(df$education2)[c.grade2]

highest.edu  <- factor(highest.edu)
highest.edu[highest.edu =="FALSE"] =NA
highest.edu  <- factor(highest.edu)

Of course this is not bad, but when you have two factor variables with 15 levels each, a couple of times or more, you start looking for quicker alternatives.

I've tried something like this but without any luck:

a.grade   <- which(df$education=="9th grade" | df$education2=="9th grade")
b.grade  <- which(df$education=="10th grade" | df$education2=="10th grade")
c.grade  <- which(df$education=="11th grade" | df$education2=="11th grade")

highest.edu[a.grade]      <- as.character(df$education)[a.grade] | as.character(df$education2)[a.grade]
highest.edu[b.grade]      <- as.character(df$education)[b.grade] | as.character(df$education2)[b.grade]

giving me this: Error in as.character(df$education)[9th grade] | as.character(df$education2)[9th grade]: operations are possible only for numeric, logical or complex types

Is there a way to overcome this?

Thanks for any suggestions in advance

edit:

the result I'm aiming at is this:

highest.education
9th grade
9th grade
9th grade
10th grade
10th grade
11th grade
9th grade
11th grade
<NA>

the post: 'Joining factor levels of two columns in R' seems to be going for another result

again, thank you

Solution

Once they're character strings it's easy

# make them character types
ed <- levels(df$education)[df$education]
ed2 <- levels(df$education2)[df$education2]
# make one new factor that integrates them
ed[is.na(ed)] <- ed2[is.na(ed)]
# make it a factor again
ed <- factor(ed)
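
To see the steps above end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same idea. The data frame name `df` follows the question; the explicit level order passed to `factor()` is an assumption about the ordering you might want, not something the answer requires.

```r
# Sample data copied from the question (two mutually exclusive factor columns).
df <- data.frame(
  education  = factor(c("9th grade", "9th grade", NA, NA, "10th grade",
                        "11th grade", NA, NA, NA)),
  education2 = factor(c(NA, NA, "9th grade", "10th grade", NA,
                        NA, "9th grade", "11th grade", NA))
)

# Convert both factors to character vectors; as.character(f) gives the
# same result as the levels(f)[f] idiom.
ed  <- as.character(df$education)
ed2 <- as.character(df$education2)

# Wherever the first column is missing, take the value from the second.
ed[is.na(ed)] <- ed2[is.na(ed)]

# Back to a factor. The level order here is an assumption; plain factor(ed)
# would sort the levels alphabetically instead.
df$highest.education <- factor(ed, levels = c("9th grade", "10th grade", "11th grade"))
print(df$highest.education)
```

Rows where both columns are missing simply stay `NA`, which matches the last row of the desired result in the question.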

You could speed this up by reading the columns in as characters in the first place, especially if you already set column types in read.table (for example through its colClasses argument).
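
As a rough illustration of reading the columns in as characters, the sketch below uses a small inline CSV in place of the real survey file. The CSV contents and the name `csv_text` are made up for the example; `colClasses` and `na.strings` are standard `read.table` arguments.

```r
# Hypothetical inline CSV standing in for the survey file;
# empty fields represent missing answers.
csv_text <- "education,education2
9th grade,
,10th grade
11th grade,
,"

# colClasses = "character" keeps every column as plain strings,
# and na.strings = "" turns the empty fields into NA.
df <- read.table(text = csv_text, header = TRUE, sep = ",",
                 colClasses = "character", na.strings = "")

# No conversion step is needed now: fill the gaps and build the factor once.
ed <- df$education
ed[is.na(ed)] <- df$education2[is.na(ed)]
highest <- factor(ed)
print(highest)
```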

Other tips

You must ensure that all factor levels are present in the result:

levels(education) <- c(levels(education), levels(education2))
education[is.na(education)] <- education2[is.na(education)]

Basically, you need to make sure both factors carry the same set of levels (the "union" or "intersection" of their unique levels) in the same order; then you can join them using c. Search on: [r] factor union levels.
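
A minimal sketch of the "same levels first" idea, staying with factors throughout. The two small vectors and the names `all_levels`, `highest` and `missing` are invented for illustration. The point is that assigning a value whose label is not among the target's levels produces `NA` with an "invalid factor level" warning, so the levels are unified before filling.

```r
# Two hypothetical factors whose level sets only partly overlap.
education  <- factor(c("9th grade", NA, "10th grade", NA, NA))
education2 <- factor(c(NA, "11th grade", NA, "9th grade", NA))

# Build the union of both level sets and re-create the first factor with it,
# so that every label from education2 is a valid level.
all_levels <- union(levels(education), levels(education2))
highest <- factor(education, levels = all_levels)

# Fill the gaps. Assigning a factor into a factor matches on labels,
# not on the underlying integer codes.
missing <- is.na(highest)
highest[missing] <- education2[missing]
print(highest)
```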

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow