Question

Can anyone supply some sample code or hints on how to import a 1 MB CSV of nodes, and another 1 MB CSV of edges, into a Titan graph database running on Cassandra?

I've got small CSV files importing via a Gremlin script, but this approach doesn't seem appropriate for large files.

I've seen that Faunus can do this, but I'd like to avoid spending a couple of days setting it up if possible.

It looks like BatchGraph might be the way to go (https://github.com/tinkerpop/blueprints/wiki/Batch-Implementation), but the example appears to be incomplete.


Solution

My question was answered at https://groups.google.com/forum/#!topic/aureliusgraphs/ew9PJVxa8Xw :

1) The Gremlin script approach is fine for a 1 MB import (Stephen Mallette); see the sketch after this list.

2) BatchGraph code (Daniel Kuppitz), shown below.
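
For 1), a plain (non-batched) Gremlin/Groovy load of the same CSV files created under Prerequisites below might look roughly like this. This is my own sketch rather than Stephen's actual script, and the lookup in the edge loop assumes usernames are unique:

conf = new BaseConfiguration()
conf.setProperty("storage.backend", "inmemory")   // or your Cassandra settings

g = TitanFactory.open(conf)

// vertices: "username,age" per line
new File("/tmp/vertices.csv").eachLine { line ->
  def (username, age) = line.split(",")
  v = g.addVertex(null)
  v.setProperty("username", username)
  v.setProperty("age", age.toInteger())
}

// edges: "source,label,target" per line; look the endpoints up by username
new File("/tmp/edges.csv").eachLine { line ->
  def (source, label, target) = line.split(",")
  v1 = g.V("username", source).next()
  v2 = g.V("username", target).next()
  g.addEdge(null, v1, v2, label)
}

g.commit()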

Prerequisites:

echo "alice,32"         > /tmp/vertices.csv
echo "bob,33"          >> /tmp/vertices.csv
echo "alice,knows,bob"  > /tmp/edges.csv

In the Gremlin REPL:

// open a Titan graph; the answer uses the in-memory backend for the example
config = new BaseConfiguration()
config.setProperty("storage.backend", "inmemory")

g = TitanFactory.open(config)

// wrap the graph in BatchGraph: vertex ids are strings, and an internal
// transaction is committed every 1000 mutations
bg = new BatchGraph(g, VertexIDType.STRING, 1000)
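
// The question asks about Cassandra rather than the in-memory backend. Roughly
// (an assumption on my part; the hostname depends on your setup), the config
// for that would be:
// config.setProperty("storage.backend", "cassandra")
// config.setProperty("storage.hostname", "127.0.0.1")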

new File("/tmp/vertices.csv").each({ line ->
  (username, age) = line.split(",")
  user = bg.addVertex("user::" + username)
  ElementHelper.setProperties(user, ["username":username,"age":age.toInteger()])
})

new File("/tmp/edges.csv").each({ line ->
  (source, label, target) = line.split(",")

  v1 = bg.getVertex("user::" + source)
  v2 = bg.getVertex("user::" + target)

  bg.addEdge(null, v1, v2, label)
})

// flush whatever is left in the buffer
bg.commit()
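
To sanity-check the result in the same session (my addition, not part of the original answer), something like this should work:

g.V.count()                                                // expect 2
g.V.has("username", "alice").out("knows").username.next()  // expect "bob"
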
Licensed under: CC-BY-SA with attribution