Question

I have a factor with values of the form Single (w/children), Married (no children), Single (no children), etc., and would like to split it into two factors: a multi-valued factor for marital status and a binary-valued one for children.

How do I do this in R?

Solution

Some example data

df <- data.frame(status=c("Domestic partners (w/children)",
                          "Married (no children)",
                          "Single (no children)"))

Get marital status out of the string. This assumes that the marital status is everything before the opening parenthesis. If not, you could do it using grepl.

df$married <- sapply(strsplit(as.character(df$status) , " \\(") , "[" , 1)
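If the marital status is not always at the start of the string, a grepl-based lookup is one option. The following is only a sketch: the example vector (with one entry deliberately reordered) and the names known and married are assumptions for illustration, and it relies on the known statuses not being substrings of one another.

```r
# Illustrative data (an assumption): the status is not always at the start.
status <- c("Domestic partners (w/children)",
            "(no children) Married",
            "Single (no children)")

# Known marital statuses, matching the factor levels used in this answer.
known <- c("Single", "Married", "Domestic partners")

# Start with all-missing, then fill in the first known status found in each string.
married <- rep(NA_character_, length(status))
for (lv in known) {
  hit <- is.na(married) & grepl(lv, status, fixed = TRUE)
  married[hit] <- lv
}

married
```

Strings that contain none of the known statuses stay NA, which makes unexpected values easy to spot.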

# Change to factor
df$married <- factor(df$married , levels=c("Single" , "Married", 
                                                 "Domestic partners"))

Get child status out of string

df$ch <- ifelse(grepl("no children" , df$status) , 0 , 1)


A bit more info

This splits each element wherever there is a " (". You need to escape the '(' with \\ because it is a special character in regular expressions.

s <- strsplit(as.character(df$status) , " \\(") 

We then subset this by selecting the first term

sapply(s , "[" , 1)
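To see what these two steps produce, here is a small standalone sketch on a plain character vector (the vector name status is just for illustration). It also shows fixed = TRUE, which treats the pattern as a literal string so that no escaping is needed, and vapply as a stricter alternative to sapply.

```r
status <- c("Domestic partners (w/children)",
            "Married (no children)",
            "Single (no children)")

# Split on the regular expression " \\(" (a space followed by a literal "(").
s <- strsplit(status, " \\(")
s[[1]]  # the two pieces of the first element

# Keep only the first piece of each element.
first <- sapply(s, "[", 1)

# Same split, but with the pattern taken literally instead of as a regex.
s_fixed <- strsplit(status, " (", fixed = TRUE)

# vapply checks that every result is a single character value.
first_v <- vapply(s, function(x) x[1], character(1))
```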

grepl looks for the string "no children" and returns TRUE or FALSE for each element.

grepl("no children" , df$status)

We then use ifelse to dichotomise this into 0 (no children) and 1 (children).
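As a standalone sketch of that last step (the vector status and the factor labels are assumptions for illustration), the logical result of grepl is mapped to 0/1, and can optionally be turned into a two-level factor, since the question asks for a binary-valued factor:

```r
status <- c("Domestic partners (w/children)",
            "Married (no children)",
            "Single (no children)")

# TRUE where the string contains "no children".
no_kids <- grepl("no children", status)

# Dichotomise: 0 = no children, 1 = children.
ch <- ifelse(no_kids, 0, 1)

# Optional: the same information as a labelled two-level factor.
ch_factor <- factor(ch, levels = c(0, 1),
                    labels = c("no children", "w/children"))
```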




EDIT

Re comment: adding some empty strings ("") to the data. [Note: rather than keeping empty strings, it is generally better to treat them as missing (NA). You can do this when you read in the data, e.g. in read.table you can use the na.strings argument (na.strings=c("NA","")).]

    df <- data.frame(status=c("Domestic partners (w/children)",
                              "Married (no children)",
                              "Single (no children)", ""))

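The command for marital status still works (the empty string becomes NA), but the grepl and ifelse step does not: grepl returns FALSE for the empty string, so it is coded as 1. As a quick fix you could add this after the ifelse.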

df$ch[df$status==""] <- NA 

or if you manage to set empty strings to missing

df$ch[is.na(df$status)] <- NA 
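For reference, this is roughly how empty strings can be read in as NA in the first place. It is only a sketch: the inline text, the id column and the names raw and dat are made up for illustration, and a real file would be passed via the file argument instead of text.

```r
# Hypothetical two-column input; the second record has an empty status field.
raw <- "id,status
1,Single (no children)
2,
3,Married (w/children)"

# Treat both the literal string NA and empty fields as missing.
dat <- read.table(text = raw, header = TRUE, sep = ",",
                  na.strings = c("NA", ""), stringsAsFactors = FALSE)

# Same dichotomisation as before, then restore the missing values.
ch <- ifelse(grepl("no children", dat$status), 0, 1)
ch[is.na(dat$status)] <- NA
```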

Running the commands above on the data frame that includes the empty string gives

#                           status           married ch
# 1 Domestic partners (w/children) Domestic partners  1
# 2          Married (no children)           Married  0
# 3           Single (no children)            Single  0
# 4                                             <NA> NA
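
If you need this more than once, the steps can be wrapped in a small function. This is a sketch under a few assumptions: split_status is a hypothetical helper name (not from any package), the levels are the three used above, it uses sub() rather than strsplit to drop the parenthetical, and unlike the output above it also converts empty strings in status itself to NA.

```r
# Hypothetical helper: splits a status vector into marital status and a 0/1 child flag.
split_status <- function(status,
                         levels = c("Single", "Married", "Domestic partners")) {
  status <- as.character(status)
  # Treat empty strings as missing.
  status[!is.na(status) & status == ""] <- NA

  # Drop everything from " (" onwards to get the marital status.
  married <- sub(" \\(.*$", "", status)

  # 0 = no children, 1 = children, NA where the status is missing.
  ch <- ifelse(grepl("no children", status), 0, 1)
  ch[is.na(status)] <- NA

  data.frame(status = status,
             married = factor(married, levels = levels),
             ch = ch)
}

res <- split_status(c("Domestic partners (w/children)",
                      "Married (no children)",
                      "Single (no children)", ""))
```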
Licensed under: CC-BY-SA with attribution