Question

We know that we can use an adjacency list or an adjacency matrix for graph algorithms. That is easy and straightforward for small graphs, but when the graph is very large, for example a social network graph, what data structure is best for implementing traditional algorithms like shortest-path finding? An adjacency matrix or list won't work because of the high memory requirements, right? What approach do social network engines use?
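To make the memory concern concrete, here is a back-of-the-envelope comparison, assuming hypothetical social-network-scale figures of roughly one billion vertices and one hundred billion edges (these numbers are illustrative, not taken from any particular site):

```python
# Hypothetical sizes for a social-network-scale graph.
n = 1_000_000_000          # vertices
m = 100_000_000_000        # edges

# Adjacency matrix: one bit per vertex pair.
matrix_bytes = n * n / 8
print(f"matrix: {matrix_bytes / 1e15:.0f} PB")   # ~125,000 TB, i.e. 125 PB

# Adjacency list: roughly 8 bytes (one 64-bit vertex id) per directed edge.
list_bytes = m * 8
print(f"list:   {list_bytes / 1e12:.1f} TB")     # ~0.8 TB
```

The matrix is quadratic in the vertex count regardless of how sparse the graph is, while the list grows only with the edge count, which is why sparse representations are the starting point at this scale.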


Solution

Adjacency lists are what the sources I have found use. For very large data sizes you might end up either holding the data on disk or using multiple machines to solve the problem, so I suggest adding keywords such as "external memory" or Hadoop to your search. Adding Hadoop turned up some material on solving single-source shortest path via parallel breadth-first search:

- http://www.cs.kent.edu/~jin/Cloud12Spring/GraphAlgorithms.pptx
- http://courses.cs.washington.edu/courses/cse490h/08au/lectures/algorithms.pdf
- "Hadoop MapReduce implementation of shortest path in a graph, not just the distance"
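The idea behind the MapReduce approach in those slides can be sketched on a single machine: each round, every vertex with a known distance emits `distance + 1` to its neighbours (the map phase), and each vertex keeps the minimum distance it received (the reduce phase); one round corresponds to one MapReduce job. The graph and function names below are illustrative, not from the sources:

```python
from collections import defaultdict

INF = float("inf")

def mapreduce_bfs(adj, source):
    """Parallel-BFS shortest path, simulated as repeated map/reduce rounds."""
    dist = {v: INF for v in adj}
    dist[source] = 0
    changed = True
    while changed:                       # one MapReduce job per round
        # Map phase: emit (neighbour, candidate distance) pairs.
        emitted = defaultdict(list)
        for v, d in dist.items():
            if d < INF:
                for w in adj[v]:
                    emitted[w].append(d + 1)
        # Reduce phase: keep the minimum candidate per vertex.
        changed = False
        for w, candidates in emitted.items():
            best = min(candidates)
            if best < dist[w]:
                dist[w] = best
                changed = True
    return dist

adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(mapreduce_bfs(adj, "a"))   # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```

In a real cluster the `dist` table and the adjacency lists live in files on HDFS and each round is a full job, which is why the number of rounds (the graph diameter for BFS) matters so much for performance.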

In addition, http://researcher.watson.ibm.com/researcher/files/us-heq/Large%20Scale%20Graph%20Processing%20with%20Apache%20Giraph.pdf does not cover shortest path, but it is an interesting example of solving connected components using Giraph, a layer on top of Hadoop that may make life easier.
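The connected-components algorithm that Giraph-style (Pregel) systems typically run is label propagation: every vertex starts with its own id as its label and, in each superstep, adopts the smallest label among itself and its neighbours, halting when nothing changes. A minimal single-machine sketch (illustrative only, not Giraph's API):

```python
def connected_components(adj):
    """Label propagation: vertices in the same component converge to
    the minimum vertex id of that component."""
    label = {v: v for v in adj}
    changed = True
    while changed:                        # one superstep per iteration
        changed = False
        for v, neighbours in adj.items():
            best = min([label[v]] + [label[w] for w in neighbours])
            if best < label[v]:
                label[v] = best
                changed = True
    return label

# Two components: {0, 1, 2} and {3, 4}.
adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
print(connected_components(adj))   # {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}
```

Frameworks like Giraph handle the partitioning, message passing, and superstep barriers for you, so the per-vertex logic above is essentially all the code you write.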

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow