Question

I'm currently working on a directed graph data structure in C++ (no Boost GL for this project). The primary application will be identifying connected components and sinks. The graphs are expected to be sparse (E ~ 4V upper limit on num edges) and will all be uniform weight. I'm trying to decide between adjacency list, incidence list or possibly some other representation that I haven't heard of yet (adj. matrix not an option bc of sparsity). The bottleneck is likely going to be space overall and speed of graph initialization: Graphs will be initialized from potentially huge arrays such that each element in the array will end up being a vertex with a directed edge to one of its neighboring elements. To get the edges for each vertex, all its neighboring elements must be compared first.

My questions are: (1) Which representation is typically faster to initialize and also fast for BFS traversal, (2) What algorithms (other than vanilla BFS) are there for finding connected components? I know it's O(V+E) using BFS (which is optimal, I think) but I'm worried about the size of the intermediate queue as the graph width grows exponentially with height.

Don't have too much experience with graph implementations, so I'd be grateful for any suggestions.

Was it helpful?

Solution

Consider a layout as follows:

enter image description here

An adjacency list can be implemented as an array of [Nx4] (n being 3 in this case, and 4 because you are saying that 4 is the maximum number of edges in your case) in the following form:

2  3  0  0
3  0  0  0
0  0  0  0

the above representation assumes that the number of vertices are in sorted order where first index into the array is given by (v-1).

Incidence list on the other hand, requires you to define a vertex list, an edge list and connection elements in between (incidence list - graph).

Both are good in terms of space usage compared to an adjacency matrix since your graph is very sparse, as you stated.

My suggestion would be to go with the adjacency list, which you can initialize as an [Nx4] contiguous array in the memory (since you are saying that you will have at most 4 edges for one vertex). This representation will be faster to initialize. (Also, this representation will perform better in terms of cache efficiency.)

However, if you expect the size of your graph changing dynamically and frequently, incidence lists might be better since they are generally implemented as lists which are non contiguous spaces (see the link above). De-allocation and allocation of the adjacency array might be undesirable in that case.

OTHER TIPS

The most efficient way to implement a graph for your purposes probably is a combination of an adjacency list for each vertex and additionally a hashing structure that maps pairs of vertices to edges, if those exist. This'll require O(|V|+ |E|) space for the adjacency list, O(|E|) for the hashing structure and give you on expected O(1) containsEdge(vertex v, vertex w), insertEdge(vertex v, vertex w) and removeEdge(vertex v, vertex w) by using the mapping to get the pointers required to quickly modify the adjacency lists of the vertices.

The Adjacency matrix is good for dense graphs, they prove bad choice for large sparse graphs. Here are time and space complexities of simple operations for sparse graphs.

Sparse Graph Big O Time and Space Complexities

Consider the average number of edges less than 4 for a graph of 1 million nodes. Most modern computers are inadequate to hold the adjacency matrix, given their space complexity of O(n*n) - i.e. RAM requirement of several TB. Whereas, it can fit easily in an inexpensive laptop with basic configuration with RAM requirement of several MB.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top