Set variable values to missing in R and drop unused levels

https://stackoverflow.com/questions/16225658

r
levels

13-04-2022

Question

I have a data set, DATA, with a variable, VAR. This variable's mode is numeric, and its class is factor. It represents gender. When printed out, it looks something like this:

 VAR
  M
  M
  F
  U

  M

When I print out levels, it outputs: "" "F" "M" "U", and a frequency table looks like this:

     F     M     U
 2   30    25    1

What I want to do is change everything that is not "F" or "M" to a missing value, then label the remaining values "Man" and "Woman", and drop the unused levels for the variable (but still leave a level for missing). So far I have the code below:

DATA$VAR[DATA$VAR == "U" | DATA$VAR == ""] <- NA

But I got the exact same values for the levels, and now the frequency table looks like this:

     F     M     U
 0   30    25    0

I feel like I'm close, but not quite there. I don't understand how to deal with the level issues. Any help is greatly appreciated.

Solution

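To create a factor in which everything except M and F becomes missing, pass the levels argument in a call to factor(): any value not listed in levels is converted to NA. To relabel the levels you keep, use the labels argument, given in the same order as levels.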

a <- factor(c("M","M","F","U","","M"))

a2 <- factor(a, levels = c('M','F'), labels = c('Male','Female'))

a2
# [1] Male   Male   Female <NA>   <NA>   Male  
# Levels: Male Female

If you want table() to count the NA values as well, set useNA = 'always' or useNA = 'ifany':

table(a2, useNA = 'ifany')
##   a2
##   Male Female   <NA> 
##     3      1      2
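Applied to the data frame from the question, the same factor() call might look like the sketch below. The sample DATA here is a made-up stand-in for the asker's data, and VAR_with_na is a hypothetical column name. addNA() is used on the assumption that "leave a level for missing" means NA should appear as an explicit level.

```r
# Made-up stand-in for the asker's data set (assumption, not the real data)
DATA <- data.frame(VAR = factor(c("M", "M", "F", "U", "", "M")))

# Keep only "M" and "F" and relabel them; "U" and "" become NA
DATA$VAR <- factor(DATA$VAR, levels = c("M", "F"), labels = c("Man", "Woman"))

# Optional: store a copy in which NA is an explicit level
# (VAR_with_na is a hypothetical column name)
DATA$VAR_with_na <- addNA(DATA$VAR)

levels(DATA$VAR)          # two levels: Man, Woman
levels(DATA$VAR_with_na)  # three levels: Man, Woman, NA
table(DATA$VAR_with_na)   # NA is tallied without needing useNA
```

If NA does not need to be a level, the first assignment alone is enough, and table(DATA$VAR, useNA = 'ifany') still reports the missing count.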

Other suggestions

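R also has a droplevels() function, which removes unused levels from a factor.

a <- factor(c("M","M","F","U","M"))

# Remove the "U" observations; the "U" level itself is still attached
a.sub <- subset(a, a != "U")

# Drop the now-unused "U" level
droplevels(a.sub)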
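The subset() call above removes the "U" observations entirely. To keep them as NA instead, as the question asks, droplevels() can be combined with the asker's original assignment. This is a minimal sketch on a toy vector, not the asker's actual data:

```r
a <- factor(c("M", "M", "F", "U", "", "M"))

# Set the unwanted values to NA; this alone does not change the levels
a[a %in% c("U", "")] <- NA
levels(a)   # still includes "" and "U"

# Remove the levels that no longer have any observations
a <- droplevels(a)
levels(a)   # only "F" and "M" remain
```

The vector keeps its original length; only the level set shrinks.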

I think you can just overwrite the factor levels.

a = factor(c("M","M","F","U","","M"))
table(a)
# a
#   F M U 
# 1 1 3 1 
levels(a)[!levels(a)%in%c("M","F")] <- NA
table(a)
# a
# F M 
# 1 3

EDIT: Similarly, relabeling the levels:

levels(a)
# "F" "M"
levels(a) <- c("Female","Male")
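Assigning a plain vector to levels(a) relies on knowing the current level order (sorted by default). As a sketch of one way to avoid that, a named lookup vector can map old levels to new labels. The name lookup is an illustrative choice, not part of the original answer. Levels that are missing from the lookup come back as NA, so this also turns "U" and "" into missing values in the same step.

```r
a <- factor(c("M", "M", "F", "U", "", "M"))

# Hypothetical lookup table: names are old levels, values are new labels
lookup <- c(F = "Female", M = "Male")

# Index the lookup by the current levels; unmatched levels ("" and "U")
# yield NA, and assigning NA as a level turns those observations into NA
levels(a) <- unname(lookup[levels(a)])

levels(a)
table(a, useNA = "ifany")
```

Because the mapping is by name, it gives the same result regardless of the order in which the levels happen to be stored.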

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow