Some example data
df <- data.frame(status=c("Domestic partners (w/children)", "Married (no children)",
"Single (no children)"))
Get marital status out of the string. This assumes that marital status comes first in the character string, before the " (". If it does not, you could extract it using grepl instead.
df$married <- sapply(strsplit(as.character(df$status) , " \\(") , "[" , 1)
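If marital status is not at the start of the string, a grepl-based version might look like the sketch below. It assumes the set of possible labels is known in advance; the "Currently Married" entry and the names status_alt, labels and married_alt are made up for illustration.

```r
# Hypothetical data in which marital status is not always the first word
status_alt <- c("Domestic partners (w/children)",
                "Currently Married (no children)",
                "Single (no children)")

# Labels to search for (assumed to be known in advance)
labels <- c("Single", "Married", "Domestic partners")

# Start with all missing, then fill in the first label found in each string
married_alt <- rep(NA_character_, length(status_alt))
for (lab in labels) {
  hit <- is.na(married_alt) & grepl(lab, status_alt, fixed = TRUE)
  married_alt[hit] <- lab
}

married_alt <- factor(married_alt, levels = labels)
```

Strings that match none of the labels stay NA, so unexpected values are easy to spot.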
# Change to factor
df$married <- factor(df$married , levels=c("Single" , "Married",
"Domestic partners"))
Get child status out of string
df$ch <- ifelse(grepl("no children" , df$status) , 0 , 1)
A bit more info
strsplit splits each element wherever there is a " (". The "(" has to be escaped with \\ because it is a special character in regular expressions.
s <- strsplit(as.character(df$status) , " \\(")
We then subset this by selecting the first term
sapply(s , "[" , 1)
grepl looks for the string "no children" in each element and returns TRUE or FALSE
grepl("no children" , df$status)
We then use ifelse to dichotomise this: 0 where "no children" was found, 1 otherwise
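As a small sketch of that last step on a standalone vector (the names status_vec, no_kids, ch and ch_alt are only for illustration):

```r
status_vec <- c("Domestic partners (w/children)",
                "Married (no children)",
                "Single (no children)")

# TRUE where the string contains "no children"
no_kids <- grepl("no children", status_vec, fixed = TRUE)

# 0 = no children, 1 = children
ch <- ifelse(no_kids, 0, 1)

# The same coding without ifelse: negate the logical and convert to integer
ch_alt <- as.integer(!no_kids)
```

Both versions give the same 0/1 coding; ifelse is just more explicit about which value means what.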
EDIT
Re comment: adding some empty strings ("") to the data. [Note: rather than keeping empty strings, it is generally better to store them as missing (NA). You can do this when reading in the data, e.g. in read.table you can use the na.strings argument (na.strings=c("NA","")).]
df <- data.frame(status=c("Domestic partners (w/children)", "Married (no children)",
"Single (no children)", ""))
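The command for married status still works (the empty string becomes NA), but the grepl and ifelse do not: grepl returns FALSE for an empty string, so ifelse codes that row as 1 instead of NA. As a quick fix you could add this after the ifelse.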
df$ch[df$status==""] <- NA
or, if you have managed to set empty strings to missing,
df$ch[is.na(df$status)] <- NA
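For the na.strings route, a minimal sketch could look like this. The inline CSV text, the id column and the name dat are made up for illustration; with a real file you would pass the file name instead of text=.

```r
# Hypothetical two-column CSV; the second row has an empty status field
csv_text <- "id,status
1,Single (no children)
2,
3,Married (no children)"

# Treat both the literal NA and empty fields as missing
dat <- read.csv(text = csv_text,
                na.strings = c("NA", ""),
                stringsAsFactors = FALSE)

# Code child status, then set rows with missing status to NA
dat$ch <- ifelse(grepl("no children", dat$status), 0, 1)
dat$ch[is.na(dat$status)] <- NA
```

Here the empty field is read in as NA rather than "", so the is.na() version of the fix applies.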
Running the married and child status commands on this data, followed by the quick fix, gives
#                           status           married ch
# 1 Domestic partners (w/children) Domestic partners  1
# 2          Married (no children)           Married  0
# 3           Single (no children)            Single  0
# 4                                             <NA> NA