Question

Allow me to preface this by saying that I am new to R. I cleaned some income and rent variables and now I am trying to recode my race variable from 9 categories to 2. The original variable is coded as follows:

1=White, 2=Black, 3=Native, 4=Asian, 5=A, 6=B, 7=C, 8=D, 9=E. I want to drop all other races and keep only White and Black, as a dummy variable where White=0 and Black=1. Here's the code:

library(foreign)
library(ggplot2)
df<-read.dta("acs2010.dta")
View(df)
attach(df)
summary(df)

inctot[inctot==9999999]<-NA
inctot[inctot<=0]<-NA
summary(inctot)
incomesq<-(inctot)^2

rent[rent==0]<-NA
summary(rent)

levels(race)[1]<-"White"
levels(race)[2]<-"Black"
levels(race)[3:9]<-NA
levels(race)

ggplot(data=df,aes(x=race))+geom_bar()
View(df)

Manipulating the levels leaves me with "White" and "Black", but when I plot the variable, the plot shows the NAs as well. I'm not sure how to get rid of NAs in factor variables. Any ideas would be appreciated.

Solution

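The approach in the question to recoding the race factor looks fine, with one caveat: after attach(df), an assignment such as levels(race)[3:9]<-NA changes a copy of race in the workspace, not the column in df. Apply the same assignments to df$race so that the data frame passed to ggplot() carries the recoded factor.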

The main problem here is omitting the NAs from the plot. Just subset the data frame to the rows where race is not NA:

ggplot(data = df[!is.na(df$race), ], aes(x = race)) + geom_bar()
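
For completeness, here is a minimal sketch of the whole recode-then-plot sequence on a made-up data frame. The name toy, its values, and the column name black are assumptions for illustration, not the ACS data. It recodes the column inside the data frame rather than an attached copy, drops the NA rows, and builds the White=0/Black=1 dummy the question asks for:

```r
library(ggplot2)

# Toy stand-in for the ACS data (hypothetical values): a factor with the
# nine levels described in the question.
toy <- data.frame(
  race = factor(c(1, 2, 2, 3, 4, 1, 9, 5, 2),
                levels = 1:9,
                labels = c("White", "Black", "Native", "Asian",
                           "A", "B", "C", "D", "E"))
)

# Recode the column in the data frame itself, not an attach()ed copy.
# Setting levels 3 to 9 to NA turns those observations into NA and
# leaves only the "White" and "Black" levels.
levels(toy$race)[3:9] <- NA

# Keep only the rows with a non-missing race.
toy_bw <- toy[!is.na(toy$race), , drop = FALSE]

# 0/1 dummy: White = 0, Black = 1.
toy_bw$black <- as.integer(toy_bw$race == "Black")

# Bar chart without an NA bar.
p <- ggplot(data = toy_bw, aes(x = race)) + geom_bar()
print(p)
```

Keeping the filtered rows in a separate data frame (toy_bw here) leaves the original data untouched, which helps if other variables such as income or rent should still be analysed on the full sample.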

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow