Question

I have a vector of integer ages that I want to turn into multiple categories:

ages <- round(runif(10, 0, 99))

Now I want this variable to be binned into three categories, depending on age. I want an output object, ages.cat, to look like this:

   young mid old
1      0   0   1
2      1   0   0
3      1   0   0
4      1   0   0
5      1   0   0
6      0   1   0
7      1   0   0
8      0   0   1
9      0   1   0
10     0   1   0

At present I am creating this object with the following code:

ages.cat <- array(0, dim=c(10,3)) # create categorical object for 3 bins
ages.cat[ages < 30, 1] <- 1
ages.cat[ages >= 30 & ages < 60, 2] <- 1
ages.cat[ages >= 60, 3] <- 1

ages.cat <- data.frame(ages.cat)
names(ages.cat) <- c("young", "mid", "old")

There must be a faster and more concise way to recode this data. I had a play with dplyr but couldn't see a solution to this particular problem with its functions. Any ideas? What would be the 'canonical' solution to this problem in base R or using a package? Whatever the alternatives, I'm certain they'll be more concise than my clunky code!


Solution

It's two one-liners.

Use cut to create a factor:

ages <- round(runif(10, 0, 99))
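# right = FALSE gives the bins [-Inf,30), [30,60), [60,Inf), matching the question's conditions
ageF <- cut(ages, c(-Inf, 30, 60, Inf), labels = c("young", "mid", "old"), right = FALSE)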
> ageF
 [1] young mid   young young old   mid   old   young old   old  
Levels: young mid old
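
One detail worth checking: cut() closes its intervals on the right by default (right = TRUE), so with these breaks an age of exactly 30 is labelled young unless right = FALSE is passed, whereas the question's conditions (ages < 30, ages >= 30 & ages < 60) put 30 in the middle bin. A minimal sketch of the difference, using a hypothetical vector of boundary ages (boundary, breaks and labs are illustrative names, not from the original):

```r
# Hypothetical boundary cases, chosen to sit on either side of each break
boundary <- c(29, 30, 59, 60)
breaks   <- c(-Inf, 30, 60, Inf)
labs     <- c("young", "mid", "old")

# Default: intervals are (-Inf,30], (30,60], (60,Inf]
default_bins <- cut(boundary, breaks, labels = labs)

# right = FALSE: intervals are [-Inf,30), [30,60), [60,Inf)
left_closed <- cut(boundary, breaks, labels = labs, right = FALSE)

data.frame(boundary, default_bins, left_closed)
```

With right = FALSE the bins agree with the comparisons used in the question's own code.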

Usually you'd leave that as a factor and work with it: if you are using R's modelling functions, they'll work out the matrix for you. But if you are doing it yourself:

Use model.matrix to create the matrix, with a -1 to remove the intercept and create columns for each level:

> m = model.matrix(~ageF-1)
> m
   ageFyoung ageFmid ageFold
1          1       0       0
2          0       1       0
3          1       0       0
4          1       0       0
5          0       0       1
6          0       1       0
7          0       0       1
8          1       0       0
9          0       0       1
10         0       0       1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$ageF
[1] "contr.treatment"

You can ignore all the contrasty stuff at the end; it's just a matrix with some extra attributes for modelling.
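
To get the object asked for in the question (a data frame with columns named young, mid and old rather than ageFyoung and so on), the matrix can be converted and renamed. A minimal sketch, assuming a fixed seed purely for repeatability (the set.seed call is an addition, not part of the original answer) and the question's left-closed bins:

```r
set.seed(1)  # assumption: seed added only to make the example repeatable
ages <- round(runif(10, 0, 99))
ageF <- cut(ages, c(-Inf, 30, 60, Inf),
            labels = c("young", "mid", "old"), right = FALSE)

m <- model.matrix(~ ageF - 1)     # one 0/1 indicator column per factor level
ages.cat <- as.data.frame(m)      # plain data frame of the indicator columns
names(ages.cat) <- levels(ageF)   # columns follow the factor's level order
ages.cat
```

Each row then has exactly one 1, in the column for that person's age group.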

Other tips

Try this:

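library(dplyr)
library(reshape2)  # provides dcast()

ages <- 
  data.frame(ages = round(runif(10, 0, 99))) %>%   # %>% replaces dplyr's old %.% operator
  mutate(id = 1:n(), 
         # explicit levels keep the columns in young/mid/old order
         cat = factor(ifelse(ages < 30, "young",
                             ifelse(ages < 60, "mid", "old")),
                      levels = c("young", "mid", "old"))) %>%
  # one row per id, one column per category, counting the entries in each cell
  dcast(id ~ cat, value.var = "ages", fun.aggregate = length)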
Licensed under: CC-BY-SA with attribution