Question

I have to deal with graph traversals in BFS fashion, upto 3 degrees away. The problem is in the size of the graph, I have almost a million nodes and a few billion edges in total. The times taken by my naive implementation of SQL is far too slow, is there something specific that I could implement? Or is it better if I switch to a graph database like Neo4j? Does anyone have benchmarks of such scale?

Was it helpful?

Solution

The answer to your question really depends upon the 'Connectivity' of your database. You can traverse ~1 million edges a second. So if your 3 degree hop needs to traverse most of your data it will be very slow, however if it needs to traverse only a few nodes it will be quick.

Think of it like sitting in a football stadium. If your 3 hops are 'who' is sitting within 3 seats of you, you don't have many nodes to traverse. However if your query is 'which Liverpool supporters, have a scarf and are dressed in red?', it will take a long time.

Also neo4j at the moment has a 32 bit address space for its nodes and edges, so you might run out of space with your billions of nodes.

Hope this helps

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top