Question
I have a data frame like this:
V1 V2 LABEL
1 83965 891552 A
2 88599 891552 B
3 42966 891552 C
4 83965 891553 D
5 88599 891553 D
6 42966 891553 B
How can I convert it to something like an adjacency matrix, where each row-column intersection holds the value from the third column, like this:
891552 891553
42966 C B
83965 A D
88599 B D
@Henrik
I got the error below. I think this segfault is caused by the large size of the data.
Using label as value column: use value.var to override.
Aggregation function missing: defaulting to length
*** caught segfault ***
address 0x7fff1e099a90, cause 'memory not mapped'
Traceback:
1: .Call("split_indices", group, as.integer(n))
2: split_indices(.group, .n)
3: vaggregate(.value = value, .group = overall, .fun = fun.aggregate, ..., .default = fill, .n = n)
4: cast(data, formula, fun.aggregate, ..., subset = subset, fill = fill, drop = drop, value.var = value.var)
5: dcast(dat, item ~ worker)
Any idea how to get rid of it?
Solution
You may try this, where df is your data frame:
library(reshape2)
# Name the value column explicitly so dcast does not have to guess it
dcast(df, V1 ~ V2, value.var = "LABEL")
# V1 891552 891553
# 1 42966 C B
# 2 83965 A D
# 3 88599 B D
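The message "Aggregation function missing: defaulting to length" in the question is what reshape2 prints when some row/column pair occurs more than once, in which case the cells would hold counts rather than labels. The sketch below rebuilds the sample data from the question (the name n_dup is only illustrative) and checks for repeated pairs before casting. It is a minimal check under those assumptions and is not claimed to explain the segfault itself.

```r
library(reshape2)

# Sample data copied from the question
df <- data.frame(
  V1 = c(83965L, 88599L, 42966L, 83965L, 88599L, 42966L),
  V2 = c(891552L, 891552L, 891552L, 891553L, 891553L, 891553L),
  LABEL = c("A", "B", "C", "D", "D", "B"),
  stringsAsFactors = FALSE
)

# Number of rows whose (V1, V2) pair already appeared earlier.
# If this is greater than 0, dcast would need an aggregation function.
n_dup <- sum(duplicated(df[, c("V1", "V2")]))

# Cast to wide form, naming the value column explicitly
wide <- dcast(df, V1 ~ V2, value.var = "LABEL")
print(n_dup)
print(wide)
```

If n_dup is non-zero on the real data, one option is to decide how repeated pairs should be combined (for example, keeping the first label) before reshaping.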
OTHER TIPS
Try using the data.table package. You can actually do this kind of reshaping in base R using tapply. This should be fast, as it operates on a data.table:
library(data.table)
DT <- data.table(df)
# Cross-tabulate LABEL with V1 as rows and V2 as columns
tapply(DT$LABEL, list(DT$V1, DT$V2), as.character)
# 891552 891553
#42966 "C" "B"
#83965 "A" "D"
#88599 "B" "D"
Hopefully that will be fast.
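If tapply turns out to be slow on large data, another base R option (not part of the answer above, and only a sketch) is to pre-allocate a character matrix and fill it in one step with matrix indexing. It assumes each (V1, V2) pair occurs at most once; with repeats, the last label written for a pair would win. The names rows, cols and m are illustrative.

```r
# Sample data copied from the question
df <- data.frame(
  V1 = c(83965L, 88599L, 42966L, 83965L, 88599L, 42966L),
  V2 = c(891552L, 891552L, 891552L, 891553L, 891553L, 891553L),
  LABEL = c("A", "B", "C", "D", "D", "B"),
  stringsAsFactors = FALSE
)

# Distinct row and column keys, sorted for a stable layout
rows <- sort(unique(df$V1))
cols <- sort(unique(df$V2))

# Empty character matrix with one cell per (V1, V2) combination
m <- matrix(NA_character_, nrow = length(rows), ncol = length(cols),
            dimnames = list(as.character(rows), as.character(cols)))

# Two-column index matrix: (row position, column position) for every record
m[cbind(match(df$V1, rows), match(df$V2, cols))] <- df$LABEL
print(m)
```

Pairs that never occur stay NA, which makes missing combinations easy to spot.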