R: How to get something like adjacency matrix, but on the intersection value of third column? [closed]

https://stackoverflow.com/questions/18882475

29-06-2022

Question

I have a data frame like this:

      V1     V2     LABEL
    1 83965 891552   A
    2 88599 891552   B
    3 42966 891552   C
    4 83965 891553   D
    5 88599 891553   D
    6 42966 891553   B

How can I convert it to something like an adjacency matrix, where the cell at each row-column intersection holds the value from the third column, like this:

        891552 891553
  42966      C      B
  83965      A      D
  88599      B      D

@Henrik

I got the following error. I think this segfault is caused by the large size of the data.

Using label as value column: use value.var to override.
Aggregation function missing: defaulting to length

 *** caught segfault ***
address 0x7fff1e099a90, cause 'memory not mapped'

Traceback:
 1: .Call("split_indices", group, as.integer(n))
 2: split_indices(.group, .n)
 3: vaggregate(.value = value, .group = overall, .fun = fun.aggregate,     ..., .default = fill, .n = n)
 4: cast(data, formula, fun.aggregate, ..., subset = subset, fill = fill,     drop = drop, value.var = value.var)
 5: dcast(dat, item ~ worker)
Any idea how to get rid of it?

I gave up trying with R and used Python instead, because all of the solutions (tapply, dcast, reshape, cast) performed extremely poorly and caused the whole system to hang for hours.

BUT: if you know of a solution that handles huge data efficiently, let me know.

Solution

You may try this, where df is your data frame:

library(reshape2)
# value.var names the column that fills the cells; without it dcast guesses
dcast(df, V1 ~ V2, value.var = "LABEL")

#      V1 891552 891553
# 1 42966      C      B
# 2 83965      A      D
# 3 88599      B      D
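
The message "Aggregation function missing: defaulting to length" in the question suggests that at least one V1/V2 pair occurs more than once in the real data, in which case dcast counts rows instead of returning labels. A minimal base R sketch for finding such pairs before reshaping is below. The extra row with label "Z" is a hypothetical duplicate added only for illustration, and keeping the first occurrence is an assumption, not something the question specifies.

```r
# Sample data from the question, plus one hypothetical duplicated (V1, V2) pair
df <- data.frame(
  V1    = c(83965, 88599, 42966, 83965, 88599, 42966, 83965),
  V2    = c(891552, 891552, 891552, 891553, 891553, 891553, 891552),
  LABEL = c("A", "B", "C", "D", "D", "B", "Z"),
  stringsAsFactors = FALSE
)

# Flag every row whose (V1, V2) pair appears more than once
key <- df[, c("V1", "V2")]
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
dup_rows <- df[dup, ]
print(dup_rows)

# Assumption: keep only the first occurrence of each pair
df_unique <- df[!duplicated(key), ]
```

If dup_rows is empty, every cell of the reshaped table receives exactly one label and no aggregation function is needed.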

OTHER TIPS

You can also do this kind of reshaping in base R using tapply. The example below wraps the data in a data.table first, but tapply itself is a base R function, so wrapping the data in a data.table is optional and is unlikely to change its speed.

require(data.table)
DT <- data.table(df)
tapply(DT$LABEL, list(DT$V1, DT$V2), as.character)

#      891552 891553
#42966 "C"    "B"   
#83965 "A"    "D"   
#88599 "B"    "D"

Hopefully that will be fast.
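
For large inputs, one alternative that avoids per-group function calls is to preallocate a character matrix and fill it in a single vectorized assignment using a two-column index matrix. This is only a sketch under assumptions: it builds a dense matrix, so memory grows with (number of distinct V1) x (number of distinct V2), and if a V1/V2 pair is repeated the last row wins. The names row_ids, col_ids and m are made up for this example.

```r
# Sample data from the question
df <- data.frame(
  V1    = c(83965, 88599, 42966, 83965, 88599, 42966),
  V2    = c(891552, 891552, 891552, 891553, 891553, 891553),
  LABEL = c("A", "B", "C", "D", "D", "B"),
  stringsAsFactors = FALSE
)

# Distinct row and column identifiers, sorted for a stable layout
row_ids <- sort(unique(df$V1))
col_ids <- sort(unique(df$V2))

# Preallocate the result; cells with no matching row stay NA
m <- matrix(NA_character_,
            nrow = length(row_ids), ncol = length(col_ids),
            dimnames = list(as.character(row_ids), as.character(col_ids)))

# Each row of the index matrix is a (row position, column position) pair
m[cbind(match(df$V1, row_ids), match(df$V2, col_ids))] <- df$LABEL
print(m)
```

Individual cells can then be looked up by name, for example m["83965", "891553"].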

Licensed under: CC-BY-SA with attribution