Suppose I have a column in a matrix or data.frame as follows:

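# stringsAsFactors=TRUE is needed on R >= 4.0 so that levels(df$col1) works below
df <- data.frame(col1=sample(letters[1:3], 10, TRUE), stringsAsFactors=TRUE)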

I want to expand this out to multiple columns, one for each level in the column, with 0/1 entries indicating the absence or presence of that level in each row:

newdf <- data.frame(a=rep(0, 10), b=rep(0,10), c=rep(0,10))
for (i in 1:length(levels(df$col1))) {
  curLetter <- levels(df$col1)[i]
  newdf[which(df$col1 == curLetter), curLetter] <- 1
}
newdf

I know there's a simple, clever solution to this, but I can't figure out what it is. I've tried expand.grid on df, which returns df unchanged. Similarly, melt from the reshape2 package returned df unchanged. I've also tried reshape, but it complains about incorrect dimensions or undefined columns.


Solution

It's very easy with model.matrix:

model.matrix(~ df$col1 + 0)

The term + 0 removes the intercept from the model. Without an intercept, no level is absorbed as the reference level, so you get a dummy variable for every factor level.

The result:

   df$col1a df$col1b df$col1c
1         0        0        1
2         0        1        0
3         0        0        1
4         1        0        0
5         0        1        0
6         1        0        0
7         1        0        0
8         0        1        0
9         1        0        0
10        0        1        0
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$`df$col1`
[1] "contr.treatment"
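
To make the effect of + 0 concrete, here is a minimal sketch. It assumes a small fixed factor column instead of the random sample, and it uses the data argument rather than df$col1, so the column names come out as col1a, col1b, col1c instead of df$col1a and so on. The last step, renaming the columns to the bare level names, is one possible cleanup, not part of the original answer.

```r
# Fixed example data (an assumption, so the result is reproducible)
df <- data.frame(col1 = factor(c("a", "c", "b", "a", "c")))

# Without "+ 0": an intercept column plus one dummy per non-reference level.
# The first level ("a") is the reference and gets no column of its own.
with_intercept <- model.matrix(~ col1, data = df)
colnames(with_intercept)
# "(Intercept)" "col1b" "col1c"

# With "+ 0": no intercept, so every level gets its own dummy column.
no_intercept <- model.matrix(~ col1 + 0, data = df)
colnames(no_intercept)
# "col1a" "col1b" "col1c"

# Convert the matrix to a data.frame and use the bare level names
newdf <- as.data.frame(no_intercept)
names(newdf) <- levels(df$col1)
newdf
```

Each row of newdf then contains exactly one 1, in the column of that row's level.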

Other tips

Obviously, model.matrix is the most direct candidate here, but I'll present three alternatives: table, lapply, and dcast (the last one because the question mentions reshape2).

table

table(sequence(nrow(df)), df$col1)
#     
#      a b c
#   1  1 0 0
#   2  0 1 0
#   3  0 1 0
#   4  0 0 1
#   5  1 0 0
#   6  0 0 1
#   7  0 0 1
#   8  0 1 0
#   9  0 1 0
#   10 1 0 0
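
Note that table returns a "table" object, not a data.frame. If a data.frame like the one in the question is wanted, one option is as.data.frame.matrix, which keeps the wide layout (plain as.data.frame on a table would give a long format with a Freq column). A minimal sketch, assuming a small fixed column instead of the random sample:

```r
# Fixed example data (an assumption, so the result is reproducible)
df <- data.frame(col1 = c("a", "b", "b", "c", "a"))

# One row per observation, one column per level
tab <- table(seq_len(nrow(df)), df$col1)

# Keep the wide layout when converting to a data.frame
newdf <- as.data.frame.matrix(tab)
newdf
```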

lapply

newdf <- data.frame(a=rep(0, 10), b=rep(0,10), c=rep(0,10))
newdf[] <- lapply(names(newdf), function(x) {
  newdf[[x]][df[, 1] == x] <- 1
  newdf[[x]]
})
newdf
#    a b c
# 1  1 0 0
# 2  0 1 0
# 3  0 1 0
# 4  0 0 1
# 5  1 0 0
# 6  0 0 1
# 7  0 0 1
# 8  0 1 0
# 9  0 1 0
# 10 1 0 0
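
The version above hardcodes the columns a, b, and c. A variant that builds the columns from whatever levels are present might look like the sketch below. It assumes col1 is a factor and uses a small fixed column; the name lv is only for illustration.

```r
# Fixed example data (an assumption, so the result is reproducible)
df <- data.frame(col1 = factor(c("a", "b", "b", "c", "a")))

# Take the levels from the data instead of typing them out
lv <- levels(df$col1)

# One 0/1 integer vector per level, named after that level
cols <- setNames(lapply(lv, function(l) as.integer(df$col1 == l)), lv)

newdf <- as.data.frame(cols)
newdf
```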

dcast

library(reshape2)
dcast(df, sequence(nrow(df)) ~ df$col1, fun.aggregate=length, value.var = "col1")
#    sequence(nrow(df)) a b c
# 1                   1 1 0 0
# 2                   2 0 1 0
# 3                   3 0 1 0
# 4                   4 0 0 1
# 5                   5 1 0 0
# 6                   6 0 0 1
# 7                   7 0 0 1
# 8                   8 0 1 0
# 9                   9 0 1 0
# 10                 10 1 0 0
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow