Question

I am trying to manipulate a data frame. As an example: say I have a dataframe containing customers and the shops they visit:

df = data.frame(customers = c("a", "b", "b", "c", "c"),
                shop_visited = c("X", "X", "Y", "X", "Z"))
customers shop_visited
        a            X
        b            X
        b            Y
        c            X
        c            Z

Summarizing this dataframe:

  • one customer (b) shops at X and also at Y;
  • one customer (b) shops at Y and also at X;
  • one customer (c) shops at X and also at Z;
  • one customer (c) shops at Z and also at X

Or, more succinctly:

relations = data.frame(source = c("X","Y", "X", "Z"), 
                       target = c("Y","X","Z","X"))
 source target
      X      Y
      Y      X
      X      Z
      Z      X

I am looking for a method that will be able to do the transformation df -> relations. The motivation behind this is that I can then use relations as the edges argument in write.gexf. Cheers for any help.

Solution

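# stringsAsFactors = TRUE: the steps below rely on factor codes
# (this was the default before R 4.0.0)
df <- data.frame(customers = c("a", "b", "b", "c", "c"),
                 shop_visited = c("X", "X", "Y", "X", "Z"),
                 stringsAsFactors = TRUE)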

#create an identifier df
dfnames <- data.frame(i = as.numeric(df$shop_visited), 
                      shop_visited = df$shop_visited)

library(tnet)
tdf       <- as.tnet( cbind(df[,2],df[,1]),type =  "binary two-mode tnet" )
relations <- projecting_tm(tdf, method = "sum")

# match original names
relations[["i"]] <- dfnames[ match(relations[['i']], dfnames[['i']] ) , 'shop_visited']
relations[["j"]] <- dfnames[ match(relations[['j']], dfnames[['i']] ) , 'shop_visited']

# clean up names
names(relations) <- c("source" , "target", "weight")


#> relations
#  source target weight
#1      X      Y      1
#2      X      Z      1
#3      Y      X      1
#4      Z      X      1
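
For comparison, the same projection can be sketched in base R without tnet. This is only an illustration, assuming a self-join on customers is acceptable for the size of your data; the intermediate name `pairs` is hypothetical, and `unique(df)` is there so that a customer who visits the same shop twice is counted once.

```r
df <- data.frame(customers = c("a", "b", "b", "c", "c"),
                 shop_visited = c("X", "X", "Y", "X", "Z"),
                 stringsAsFactors = FALSE)

# Self-join on customers: every ordered pair of shops visited by the same customer
# (merge() names the two shop columns shop_visited.x and shop_visited.y)
pairs <- merge(unique(df), unique(df), by = "customers")

# Drop the pairs that link a shop to itself
pairs <- pairs[pairs$shop_visited.x != pairs$shop_visited.y, ]

# Count the customers behind each ordered pair of shops
relations <- aggregate(customers ~ shop_visited.x + shop_visited.y,
                       data = pairs, FUN = length)
names(relations) <- c("source", "target", "weight")
```

Here `weight` is the number of customers shared by the two shops, which for this example is 1 for each of the four rows.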

OTHER TIPS

Take a look at the function edge.list in rgexf (http://www.inside-r.org/packages/cran/rgexf/docs/edge.list). Using your example, it would be something like this:

library(rgexf)

# Your data
df = data.frame(customers = c("a", "b", "b", "c", "c"),
                shop_visited = c("X", "X", "Y", "X", "Z"))

# Getting nodes and edges
df2 <- edge.list(df)

The result looks like this:

> df2
$nodes
  id label
1  1     1
2  2     2
3  3     3

$edges
     [,1] [,2]
[1,]    1    1
[2,]    2    1
[3,]    2    2
[4,]    3    1
[5,]    3    3

Finally, you can use this to write a GEXF graph:

# Building the graph
write.gexf(nodes=df2$nodes, edges=df2$edges)

<?xml version="1.0" encoding="UTF-8"?>
<gexf xmlns="http://www.gexf.net/1.2draft" xmlns:viz="http://www.gexf.net/1.1draft/viz" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.gexf.net/1.2draft http://www.gexf.net/1.2draft/gexf.xsd" version="1.2">
  <meta lastmodifieddate="2013-08-06">
    <creator>NodosChile</creator>
    <description>A graph file writing in R using "rgexf"</description>
    <keywords>gexf graph, NodosChile, R, rgexf</keywords>
  </meta>
  <graph mode="static">
    <nodes>
      <node id="1" label="1"/>
      <node id="2" label="2"/>
      <node id="3" label="3"/>
    </nodes>
    <edges>
      <edge id="0" source="1" target="1" weight="1.0"/>
      <edge id="1" source="2" target="1" weight="1.0"/>
      <edge id="2" source="2" target="2" weight="1.0"/>
      <edge id="3" source="3" target="1" weight="1.0"/>
      <edge id="4" source="3" target="3" weight="1.0"/>
    </edges>
  </graph>
</gexf>
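
If you would rather export the shop-to-shop `relations` from the question, a minimal sketch of the node and edge tables follows. It assumes that write.gexf accepts a nodes data frame with `id` and `label` columns, like the `$nodes` output above, and a two-column edges table of node ids; the names `shops`, `nodes` and `edges` are only illustrative.

```r
# relations as given in the question (shop-to-shop edges)
relations <- data.frame(source = c("X", "Y", "X", "Z"),
                        target = c("Y", "X", "Z", "X"),
                        stringsAsFactors = FALSE)

# One node per distinct shop: a numeric id plus the shop name as label
shops <- sort(unique(c(relations$source, relations$target)))
nodes <- data.frame(id = seq_along(shops), label = shops,
                    stringsAsFactors = FALSE)

# Translate shop names into node ids for the edge list
edges <- data.frame(source = match(relations$source, shops),
                    target = match(relations$target, shops))

# Assumed call, mirroring the one above (requires rgexf):
# library(rgexf)
# write.gexf(nodes = nodes, edges = edges)
```

Keeping the shop names as labels means the exported nodes show X, Y and Z instead of bare numbers.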

Please let me know if you have any questions: george dot vega at nodoschile.org

Best!

George (creator of rgexf)

Licensed under: CC-BY-SA with attribution