Question

Apache Giraph vs Neo4j: Are the traversal algorithms across nodes totally different in these two graph processing systems? If we were to traverse, say, a social graph using Giraph and Neo4j on data stored on a single machine (not distributed), which would perform better, and why?


Solution

Hands down, Neo4j. Giraph's graph computations run as Hadoop jobs because Giraph is meant for large, distributed graphs. The overhead of managing those jobs is too high to be efficient for a small graph on a pseudo-distributed, single-machine cluster.

Not only that, but traversals are Neo4j's specialty. A big reason is that Neo4j stores the relationships adjacent to a node in doubly linked lists in its files on disk. Check out this blog entry:

http://digitalstain.blogspot.nl/2010/10/neo4j-internals-file-storage.html

It explains how Neo4j lays out the graph in storage so that graph operations such as traversals are fast.
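
To make the linked-list idea concrete, here is a rough in-memory sketch of that kind of layout. It is not Neo4j's actual code or file format: `ToyStore`, `NodeRecord`, `RelRecord` and all field names are made up for illustration, and real record stores hold more than this (properties, relationship types, and so on). The only point is that each node points at the first relationship in its chain, and each relationship carries prev/next pointers for both of its endpoints, so walking a node's neighbours is just following IDs rather than looking anything up in an index.

```python
from dataclasses import dataclass
from typing import Iterator, List

NO_ID = -1  # sentinel meaning "no record"


@dataclass
class NodeRecord:
    # Hypothetical field: id of the first relationship in this node's chain.
    first_rel: int = NO_ID


@dataclass
class RelRecord:
    start: int
    end: int
    # Each relationship sits in two doubly linked chains at once:
    # one for its start node and one for its end node.
    start_prev: int = NO_ID
    start_next: int = NO_ID
    end_prev: int = NO_ID
    end_next: int = NO_ID


class ToyStore:
    """Records live in lists, so a record id is simply a list index."""

    def __init__(self) -> None:
        self.nodes: List[NodeRecord] = []
        self.rels: List[RelRecord] = []

    def add_node(self) -> int:
        self.nodes.append(NodeRecord())
        return len(self.nodes) - 1

    def add_rel(self, start: int, end: int) -> int:
        if start == end:
            raise ValueError("self-loops are left out of this sketch")
        self.rels.append(RelRecord(start, end))
        rel_id = len(self.rels) - 1
        for node_id in (start, end):
            self._link_at_head(node_id, rel_id)
        return rel_id

    def _link_at_head(self, node_id: int, rel_id: int) -> None:
        # Push the new relationship onto the front of the node's chain.
        rel = self.rels[rel_id]
        old_head = self.nodes[node_id].first_rel
        if rel.start == node_id:
            rel.start_next = old_head
        else:
            rel.end_next = old_head
        if old_head != NO_ID:
            old = self.rels[old_head]
            if old.start == node_id:
                old.start_prev = rel_id
            else:
                old.end_prev = rel_id
        self.nodes[node_id].first_rel = rel_id

    def neighbors(self, node_id: int) -> Iterator[int]:
        # Traversal is pointer chasing: follow this node's chain to the end.
        rel_id = self.nodes[node_id].first_rel
        while rel_id != NO_ID:
            rel = self.rels[rel_id]
            if rel.start == node_id:
                yield rel.end
                rel_id = rel.start_next
            else:
                yield rel.start
                rel_id = rel.end_next


store = ToyStore()
alice, bob, carol = (store.add_node() for _ in range(3))
store.add_rel(alice, bob)
store.add_rel(carol, alice)
store.add_rel(bob, carol)
print(list(store.neighbors(alice)))  # ids of alice's neighbours, newest relationship first
```

In this sketch the cost of listing a node's neighbours depends only on how many relationships that node has, not on the size of the whole graph. The prev pointers are not needed for the walk itself; in a doubly linked chain they are what lets a relationship be unlinked without rescanning the chain.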

Licensed under: CC-BY-SA with attribution