Question

I am using the dummy.data.frame function from the dummies package in R to create dummy variables for the k levels of my factor. Unfortunately, my factor has NAs. dummy.data.frame creates k dummies with no NAs, plus an extra dummy that flags the missing values with 1. I would instead like to keep the NAs in the k dummies and not get a separate dummy for the missing values.

Is this possible with that function? Do you know any other functions that can help me?


Solution

I usually do this kind of thing with model.matrix(). Setting the option na.action to "na.pass" makes it retain the NAs in their correct places. This option does not seem to change the behavior of the function dummy(), so model.matrix() might be your easiest bet. For example, for a single factor letters, the following should do the trick:

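# Keep rows with NAs instead of dropping them (the default is "na.omit")
options(na.action="na.pass")
letters <- c( "a", "a", "b", "c", "d", "e", "f", "g", "h", "b", "b", NA )  # masks R's built-in letters vector
# "-1" drops the intercept, so every level gets its own dummy column
model.matrix(~letters-1)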
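
If changing a global option feels too intrusive, a possible alternative is to build the model frame yourself with na.action = na.pass and hand it to model.matrix(). This is only a sketch; the variable name x is made up for illustration:

```r
# Hypothetical example vector with one missing value
x <- c("a", "b", NA, "a")

# Build the model frame explicitly, keeping the NA row
mf <- model.frame(~ x, na.action = na.pass)

# model.matrix() uses the supplied model frame as-is,
# so the global na.action option is not consulted
m <- model.matrix(~ x - 1, data = mf)
m
```

The row that corresponds to the NA should come back as NA in every dummy column, and the global option stays untouched.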

The same approach works for several variables, or for several columns of a data frame:

letters <- c( "a", "a", "b", "c", "d", "e", "f", "g", "h", "b", "b", NA )
betters <- c( "a", "a", "c", "c", "c", "d", "d", "d", NA, "e", "e", "e" )
model.matrix(~letters+betters-1)
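
One caveat with a multi-factor formula: even without an intercept, model.matrix() gives only the first factor a full set of k dummies; later factors are coded with the default treatment contrasts, which drop their first level. If you want k dummies for every column, one way is to encode each column separately and bind the results. The helper name dummies_keep_na and the data frame df below are hypothetical, and this is a sketch rather than a polished implementation:

```r
# Hypothetical helper: full k-level dummies for every column, NAs kept
dummies_keep_na <- function(df) {
  parts <- lapply(names(df), function(nm) {
    f  <- reformulate(nm, intercept = FALSE)              # e.g. ~letters - 1
    mf <- model.frame(f, data = df, na.action = na.pass)  # keep NA rows
    model.matrix(f, data = mf)
  })
  do.call(cbind, parts)  # one wide matrix, column names preserved
}

df <- data.frame(
  letters = c("a", "a", "b", "c", "d", "e", "f", "g", "h", "b", "b", NA),
  betters = c("a", "a", "c", "c", "c", "d", "d", "d", NA, "e", "e", "e"),
  stringsAsFactors = TRUE
)

m <- dummies_keep_na(df)
dim(m)
```

Because each column is encoded on its own, an NA in one variable only affects that variable's dummy columns.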

The key step is setting the option na.action to "na.pass". Once the dummy recoding is done, it is a good idea to return the option to its default value:

options(na.action="na.omit")
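
If your session might not be using the default, a slightly safer pattern is to save whatever value was in effect and restore exactly that. options() returns the previous settings invisibly, so they can be stored and passed back. The variable names here are only illustrative:

```r
# Save the previous setting while switching to "na.pass"
old <- options(na.action = "na.pass")

x <- c("a", "b", NA)       # hypothetical example vector
m <- model.matrix(~ x - 1) # NA row is kept

# Restore whatever na.action was in effect before
options(old)
```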
Licensed under: CC-BY-SA with attribution