Question

Apache Giraph vs Neo4j: are the traversal algorithms across nodes totally different in these two graph processing systems? If we were to traverse, say, a social graph using Giraph and Neo4j on data stored on a single machine (not distributed), which would perform better, and why?

Solution

Hands down, Neo4j. Giraph's graph computations run as Hadoop jobs because Giraph is designed for large, distributed graphs. On a small graph running on a pseudo-distributed, single-machine cluster, the overhead of scheduling and coordinating those jobs is too large for it to be efficient.
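To make the difference in traversal style concrete: Giraph follows the Pregel model, in which every vertex runs the same compute step in rounds ("supersteps") separated by a global barrier, and vertices communicate only by sending messages to neighbours. The following is a minimal plain-Python simulation of a breadth-first traversal in that style. It is not Giraph's Java API; the function name `pregel_bfs` and the directed edge-list input are assumptions made for illustration.

```python
from collections import defaultdict

INF = float("inf")

def pregel_bfs(edges, source):
    """Simulate a vertex-centric (Pregel-style) BFS over directed edges.

    Hypothetical helper: returns (hop distance per vertex, supersteps run).
    """
    adj = defaultdict(list)
    vertices = {source}
    for u, v in edges:
        adj[u].append(v)
        vertices.update((u, v))
    dist = {v: INF for v in vertices}

    # Superstep 0: only the source has an incoming "message" (distance 0).
    inbox = {source: [0]}
    supersteps = 0
    while inbox:  # the computation halts once no messages are in flight
        outbox = defaultdict(list)
        # Each vertex with messages runs its compute step independently; in a
        # real BSP system these run in parallel, then all workers synchronize.
        for vertex, messages in inbox.items():
            candidate = min(messages)
            if candidate < dist[vertex]:
                dist[vertex] = candidate
                for neighbor in adj[vertex]:
                    outbox[neighbor].append(candidate + 1)
            # a vertex that learns nothing new sends nothing ("votes to halt")
        inbox = outbox
        supersteps += 1
    return dist, supersteps
```

Because each BFS frontier costs one full superstep, a traversal of depth d needs roughly d + 1 globally synchronized rounds. On a real cluster that barrier is what makes the model scale out; on a single machine it is mostly overhead, on top of the cost of launching the Hadoop job itself.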

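Not only that, but traversals are Neo4j's specialty. A big reason for this is that Neo4j stores each node's relationships as a doubly linked list in its on-disk store files, so moving from a node to its neighbours means following record pointers rather than looking anything up in an index. Check out this blog entry: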

http://digitalstain.blogspot.nl/2010/10/neo4j-internals-file-storage.html

It explains how Neo4j lays out its storage files to make graph operations such as traversals fast.
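As a rough illustration of that idea, here is a toy in-memory record store in Python, loosely modelled on the layout described in the linked post: a node record points to its first relationship, and each relationship record holds previous/next pointers for both of its endpoint nodes. Everything here (the class names, inserting new relationships at the head of a chain, list indices standing in for fixed-size file offsets, no properties or relationship types, no self-loops) is a simplifying assumption of this sketch, not Neo4j's actual byte format.

```python
from dataclasses import dataclass

NIL = -1  # "no record", standing in for a null pointer in the store files

@dataclass
class NodeRecord:
    first_rel: int = NIL  # id of the first relationship in this node's chain

@dataclass
class RelRecord:
    first_node: int
    second_node: int
    first_prev: int = NIL   # neighbours in first_node's relationship chain
    first_next: int = NIL
    second_prev: int = NIL  # neighbours in second_node's relationship chain
    second_next: int = NIL

class GraphStore:
    """Toy record store (hypothetical): a record's id is its list index."""

    def __init__(self):
        self.nodes: list[NodeRecord] = []
        self.rels: list[RelRecord] = []

    def create_node(self) -> int:
        self.nodes.append(NodeRecord())
        return len(self.nodes) - 1

    def _link_at_head(self, node_id: int, rel_id: int) -> None:
        # Prepend rel_id to node_id's doubly linked relationship chain.
        node = self.nodes[node_id]
        old_head = node.first_rel
        rel = self.rels[rel_id]
        if rel.first_node == node_id:
            rel.first_next = old_head
        else:
            rel.second_next = old_head
        if old_head != NIL:
            head = self.rels[old_head]
            if head.first_node == node_id:
                head.first_prev = rel_id
            else:
                head.second_prev = rel_id
        node.first_rel = rel_id

    def create_relationship(self, a: int, b: int) -> int:
        if a == b:
            raise ValueError("self-loops are not handled in this sketch")
        self.rels.append(RelRecord(first_node=a, second_node=b))
        rel_id = len(self.rels) - 1
        self._link_at_head(a, rel_id)
        self._link_at_head(b, rel_id)
        return rel_id

    def neighbours(self, node_id: int) -> list[int]:
        # Pointer-chase the chain: every hop is a direct lookup by record id.
        result = []
        rel_id = self.nodes[node_id].first_rel
        while rel_id != NIL:
            rel = self.rels[rel_id]
            if rel.first_node == node_id:
                result.append(rel.second_node)
                rel_id = rel.first_next
            else:
                result.append(rel.first_node)
                rel_id = rel.second_next
        return result
```

Note that `neighbours` never scans the full relationship store: the cost of expanding a node depends only on that node's degree, which is why this layout suits hop-by-hop traversals on a single machine.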

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow