Question

I have a dataset with the variables sex and navigation. The sex variable has male and female as values. The "navigation" variable has menu and tags as values.

I want to create a new variable with the values male_menu, male_tags, female_menu and female_tags, as those are the possible combinations of the two existing variables.

How can I create such a new variable in R and include it in the original dataset?


Solution

I understood what you wanted somewhat differently than @zach. Here I use the interaction() function to create a new factor with the four levels you specified, illustrated with some dummy data:

set.seed(42)

sex <- sample(c("Male","Female"), 20, replace = TRUE)
navigation <- sample(c("menu","tags"), 20, replace = TRUE)

interaction(sex, navigation)

The last line gives:

> interaction(sex, navigation)
 [1] Female.tags Female.menu Male.tags   Female.tags Female.menu Female.tags
 [7] Female.menu Male.tags   Female.menu Female.tags Male.tags   Female.tags
[13] Female.menu Male.tags   Male.menu   Female.tags Female.menu Male.menu  
[19] Male.tags   Female.tags
Levels: Female.menu Male.menu Female.tags Male.tags

Is that what you wanted?
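To get the exact labels from the question (male_menu and so on) and to store the result in the dataset, something like the following might work. It is only a sketch: the data frame name df and its contents are hypothetical stand-ins for the real data, and the new column name sex_navigation is an arbitrary choice.

```r
# Hypothetical data frame standing in for the asker's dataset
df <- data.frame(
  sex        = c("male", "female", "female", "male"),
  navigation = c("menu", "tags", "menu", "menu")
)

# sep = "_" replaces the default "." separator in the level names
df$sex_navigation <- interaction(df$sex, df$navigation, sep = "_")

# All four combinations are levels, even though male_tags never occurs here
levels(df$sex_navigation)
```

The column is added in place, so df keeps its original variables alongside the new factor.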

Other tips

Sounds like you are creating dummy variables for a model. Here's an easy way to do this, using model.matrix:

dat <- iris
dat$navigation <- sample(c('menu', 'tags'), nrow(dat), replace=TRUE)

newdat <- data.frame(model.matrix(~0+.+Species*navigation, dat))
> head(newdat)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Speciessetosa
1          5.1         3.5          1.4         0.2             1
2          4.9         3.0          1.4         0.2             1
3          4.7         3.2          1.3         0.2             1
4          4.6         3.1          1.5         0.2             1
5          5.0         3.6          1.4         0.2             1
6          5.4         3.9          1.7         0.4             1
  Speciesversicolor Speciesvirginica navigationtags
1                 0                0              0
2                 0                0              1
3                 0                0              0
4                 0                0              0
5                 0                0              1
6                 0                0              0
  Speciesversicolor:navigationtags Speciesvirginica:navigationtags
1                                0                               0
2                                0                               0
3                                0                               0
4                                0                               0
5                                0                               0
6                                0                               0

If for some reason you don't want to drop the reference levels, you can use the dummyVars function in caret.

Just for another option, you can also use paste.

your_data$sex_navigation <- with(your_data, paste(sex, navigation, sep = "_"))

You can, of course, cast that as a factor by wrapping it in factor(). The major difference between this and the interaction approach is that interaction will create a factor where the levels include all possible interactions, regardless of whether or not they are present. The factor(paste()) approach will only include levels that are present. I find that interaction is usually preferable, but every now and again paste is what I want.
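The difference in levels can be seen on a small example. This is a sketch with made-up vectors in which the male/tags combination never occurs; the variable names via_interaction and via_paste are illustrative only.

```r
# Toy vectors (hypothetical) in which the male/tags combination never occurs
sex        <- c("male", "female", "female")
navigation <- c("menu", "tags", "menu")

via_interaction <- interaction(sex, navigation, sep = "_")
via_paste       <- factor(paste(sex, navigation, sep = "_"))

nlevels(via_interaction)  # 4: every combination, including the unobserved male_tags
nlevels(via_paste)        # 3: only the combinations that occur

# drop = TRUE makes interaction() discard unused levels as well
via_dropped <- interaction(sex, navigation, sep = "_", drop = TRUE)
nlevels(via_dropped)      # 3
```

So if only the observed combinations are wanted but a factor is still convenient, interaction() with drop = TRUE is an alternative to factor(paste()).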

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow