Replace unwanted values of factor level with NA

https://stackoverflow.com/questions/10065110

30-05-2021

Question

I have a large data frame that contains both blank missing values and NAs. Running summary(factor(df$col)) gives me something like

(Notice the blank after 50000.)
and sum(is.na(df$col)) returns 12476, the same as the number of NAs, but I'd like it to count the blanks as well as the NAs.
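As a sketch of the combined count, assuming the column is a factor (or character vector), blanks and NAs can be counted together by treating empty or whitespace-only strings as missing. The small vector below is made up for illustration and is not the asker's data.

```r
# Made-up example: two NAs, one empty string, two whitespace-only strings
col <- factor(c("A", NA, "", "B", "  ", NA, " ", "A"))

# TRUE where the value is a real NA
is_na <- is.na(col)

# TRUE where the value is empty or only whitespace; grepl() gives FALSE for NA input
is_blank <- grepl("^[[:space:]]*$", as.character(col))

# Number of entries that are either NA or blank
n_missing <- sum(is_na | is_blank)
print(n_missing)
```

Using a whitespace-only pattern instead of comparing with "" means a blank made of several spaces is still counted.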
I tried to create a level for the blanks by doing
levels(df$col) <- c("A", "B", "Blank", "C")
Then I tried df$col <- factor(df$col, exclude="Blank"), and it says NAs were generated, but my output is unchanged. Does anyone know how to create NAs from a factor level, or have a better way to replace the missing values? I suspect the blanks may contain more than one whitespace character, so they were not turned into NAs, but I don't know how to confirm that.
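One possible way to confirm whether the "blank" level is an empty string or a run of whitespace is to print each level between brackets together with its character count. The sketch below, on a hypothetical factor, also illustrates that assigning a whole vector to levels() relabels the existing levels by position rather than adding a new level, so it may be safer to pick out blank values with a pattern and set only those to NA.

```r
# Hypothetical factor whose blank values are two spaces, not ""
f <- factor(c("A", "  ", "C", NA, "B", "  "))

# Bracket each level and show its length so "  " is distinguishable from ""
print(data.frame(level = sprintf("[%s]", levels(f)), nchar = nchar(levels(f))))

# levels<- relabels existing levels by position (in the order levels(f) lists them);
# the table shows which old label each new label replaced
g <- f
levels(g) <- c("X", "Y", "Z", "W")
print(table(f, g, useNA = "ifany"))

# Set only empty or whitespace-only values to NA, then drop the now-unused level
f[grepl("^[[:space:]]*$", as.character(f))] <- NA
f <- droplevels(f)
print(levels(f))
print(sum(is.na(f)))
```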

Solution

Try this:

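# stringsAsFactors=TRUE is needed in R >= 4.0.0, where data.frame() no longer creates factors by default
df <- data.frame(a=11:18, col=c("C", "", "A", NA, "A", "", "C", NA), stringsAsFactors=TRUE)
levels(df$col) # ""  "A" "C"
sum(is.na(df$col)) # 2

# Rebuild the factor with an explicit set of valid levels; any value not in it becomes NA
df$col <- factor(df$col, levels=LETTERS[1:3])
levels(df$col) # "A" "B" "C"
sum(is.na(df$col)) # 4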

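Because the new level set does not include the empty string (""), every blank becomes NA. The same happens to any other value missing from levels, including whitespace-only strings, so the list of valid levels must be complete.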
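If the valid levels are not known in advance, a possible variation (not part of the original answer) is to trim whitespace first and then exclude both NA and "" when rebuilding the factor. The example extends the answer's data frame with one hypothetical whitespace-only entry.

```r
# The answer's example, with the fifth value replaced by a whitespace-only string
df <- data.frame(a = 11:18,
                 col = c("C", "", "A", NA, "  ", "", "C", NA),
                 stringsAsFactors = TRUE)

# trimws() turns "  " into ""; exclude = c(NA, "") keeps both out of the levels,
# so those entries become NA in the rebuilt factor
df$col <- factor(trimws(as.character(df$col)), exclude = c(NA, ""))

print(levels(df$col))     # "A" "C"
print(sum(is.na(df$col))) # 5
```

Unlike levels=LETTERS[1:3], this keeps every non-blank value as a level, so an unexpected category is not silently turned into NA; the trade-off is that unused levels such as "B" are not added.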

Licensed under: CC-BY-SA with attribution
