質問

Suppose I have a column in a matrix or data.frame as follows:

df <- data.frame(col1=sample(letters[1:3], 10, TRUE))

I want to expand this out to multiple columns, one for each level in the column, with 0/1 entries indicating presence or absence of level for each row

newdf <- data.frame(a=rep(0, 10), b=rep(0,10), c=rep(0,10))
for (i in 1:length(levels(df$col1))) {
  curLetter <- levels(df$col1)[i]
  newdf[which(df$col1 == curLetter), curLetter] <- 1
}
newdf

I know there's a simple clever solution to this, but I can't figure out what it is. I've tried expand.grid on df, which returns itself as is. Similarly melt in the reshape2 package on df returned df as is. I've also tried reshape but it complains about incorrect dimensions or undefined columns.

役に立ちましたか?

解決 2

It's very easy with model.matrix

model.matrix(~ df$col1 + 0)

The term + 0 means that the intercept is not included. Hence, you receive a dummy variable for each factor level.

The result:

   df$col1a df$col1b df$col1c
1         0        0        1
2         0        1        0
3         0        0        1
4         1        0        0
5         0        1        0
6         1        0        0
7         1        0        0
8         0        1        0
9         1        0        0
10        0        1        0
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$`df$col1`
[1] "contr.treatment"

他のヒント

Obviously, model.matrix is the most direct candidate here, but here, I'll present three alternatives: table, lapply, and dcast (the last one since this question is tagged .

table

table(sequence(nrow(df)), df$col1)
#     
#      a b c
#   1  1 0 0
#   2  0 1 0
#   3  0 1 0
#   4  0 0 1
#   5  1 0 0
#   6  0 0 1
#   7  0 0 1
#   8  0 1 0
#   9  0 1 0
#   10 1 0 0

lapply

newdf <- data.frame(a=rep(0, 10), b=rep(0,10), c=rep(0,10))
newdf[] <- lapply(names(newdf), function(x) 
    { newdf[[x]][df[,1] == x] <- 1; newdf[[x]] })
newdf
#    a b c
# 1  1 0 0
# 2  0 1 0
# 3  0 1 0
# 4  0 0 1
# 5  1 0 0
# 6  0 0 1
# 7  0 0 1
# 8  0 1 0
# 9  0 1 0
# 10 1 0 0

dcast

library(reshape2)
dcast(df, sequence(nrow(df)) ~ df$col1, fun.aggregate=length, value.var = "col1")
#    sequence(nrow(df)) a b c
# 1                   1 1 0 0
# 2                   2 0 1 0
# 3                   3 0 1 0
# 4                   4 0 0 1
# 5                   5 1 0 0
# 6                   6 0 0 1
# 7                   7 0 0 1
# 8                   8 0 1 0
# 9                   9 0 1 0
# 10                 10 1 0 0
ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top