The new code should be fine. The problem with the old code was caused by a combination of the NAs in df1$variable and the == comparison operator.
If you read the help on comparison operators, ?"==", you will see:
"Missing values (NA) and NaN values are regarded as non-comparable even to themselves, so comparisons involving them will always result in NA."
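A quick way to see this behaviour at the console is the minimal sketch below. The vector x is a made-up example, not taken from your data.

```r
# x is a hypothetical vector with one missing value
x <- c(0, 1, NA)

# Comparing with == propagates the NA instead of returning FALSE
cmp <- x == 1
cmp

# Even NA compared with itself is NA
na_self <- NA == NA
na_self

# is.na() is the way to test for missingness
miss <- is.na(x)
miss
```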
In your case, whenever df1$variable was NA, the result of the comparison was NA (not TRUE or FALSE). Indexing a data frame with an NA in a logical vector returns a row in which every variable is NA, which is why the other variables in those rows came back as NA. For example:
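df1 <- expand.grid(variable = c(0, 1, NA), var2 = c(0, 1, NA))
# Comparison alone: the result is NA wherever variable is NA
sel1 <- !(df1$variable == 1)
sel1
# NA entries in the index produce rows filled entirely with NA
df1[sel1, ]
# Adding is.na() handles the missing values explicitly, so no NA remains in the index
sel2 <- df1$variable == 0 | is.na(df1$variable)
sel2
df1[sel2, ]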
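If you would rather not write the is.na() test by hand, base R offers a few other ways to keep NA out of the row index. The sketch below reuses the df1 from the example above; the keep_* names are only illustrative, and which variant is appropriate depends on whether you want to drop or keep the rows where variable is NA.

```r
df1 <- expand.grid(variable = c(0, 1, NA), var2 = c(0, 1, NA))

# which() returns only the positions that are TRUE, so NA rows are dropped
keep_which <- df1[which(df1$variable != 1), ]

# subset() treats an NA condition as FALSE, so NA rows are dropped as well
keep_subset <- subset(df1, variable != 1)

# %in% never returns NA, so negating it keeps the rows where variable is NA
keep_in <- df1[!(df1$variable %in% 1), ]

nrow(keep_which)
nrow(keep_subset)
nrow(keep_in)
```

The first two keep only the rows where variable is 0, while the %in% version also keeps the rows where variable is NA, matching what sel2 selects.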