Find all connected nodes in adjacency list

Question 1

As you mentioned a DAG you can use following algorithm to get all connected components to given vertex :-

Do a Topological Sort of the all nodes in graph

Start in decreasing order of the sorted nodes.

Maintain a Set of all connected nodes for each node.

Sort adjacency list of vertex u in toposort order

For vertex u and each edge(u,v) and !Set(u).contains(v) do Set(u) = Set(u) union Set(v) union v

Do this for all node in decreasing order of toposort.

Time Complexity :-

Toposort : O(E)

Caculating Set : O(V^2*logV)

Total: O(V^2*logV)

Example:-

1 -> 3 -> 4  
2 -> 4  
3 -> 2    
4 -> 5  
5 -> NULL  

TopoSort: 1,3,2,4,5

Visiting in descending order :-
1. Set(5) = {null}
2. Set(4) = Set(5) + 5 = {5}
3. Set(2) = Set(4) + 4 = {4,5}
4. Set(3) = Set(2) + 2 = {2,4,5}
5. Set(1) = Set(3) + 1 = {1,2,4,5}

Question 2

The most straightforward way to do it is by recursively descending through the graph. Pseudocode:

function get_connected_nodes(node){
    nodes = set();
    foreach(child in node.children){
        nodes.add(child);
        nodes = nodes.union(get_connected_nodes(child));
    }
    return nodes;
}

Sample Python implementation:

adjacencies = {
    1: [3],
    2: [4],
    3: [2, 4],
    4: [5],
    5: []
}

def get_connected_nodes(node):
    nodes = set()
    for child in adjacencies[node]:
        nodes.add(child)
        nodes = nodes | get_connected_nodes(child)
    return nodes

for i in range(1, 6):
    print i, ":", get_connected_nodes(i)

#output:
#1 : set([2, 3, 4, 5])
#2 : set([4, 5])
#3 : set([2, 4, 5])
#4 : set([5])
#5 : set([])

Edit: For improved performance, storing previously computed results can save you from traversing a node more than once. Pseudocode:

connected_results = dict();
function get_connected_nodes(node){
    if(!connected_results.has_key(node)){
        nodes = set();
        foreach(child in node.children){
            nodes.add(child);
            nodes = nodes.union(get_connected_nodes(child));
        }
        connected_results[node] = nodes;
    }
    return connected_results[node];
}

Sample Python implementation:

def memoize(fn):
    answers = {}
    def memoized_fn(*args):
        if args not in answers:
            answers[args] = fn(*args)
        return answers[args]
    return memoized_fn

adjacencies = {
    1: [3],
    2: [4],
    3: [2, 4],
    4: [5],
    5: []
}

@memoize
def get_connected_nodes(node):
    nodes = set()
    for child in adjacencies[node]:
        nodes.add(child)
        nodes = nodes | get_connected_nodes(child)
    return nodes

for i in range(1, 6):
    print i, ":", get_connected_nodes(i)

Question 3

What you're looking for seems to be the adjacency lists of the transitive closure of your DAG.

Since you're going for efficiency: you've got a DAG, so it doesn't have cycles, which means you can decide reachability for all vertices from a given start vertex s in O(m+n) via DFS/BFS. Do this once for each of the n vertices, and you end up with O(n*(m+n)) which is O(n*m) for graphs where you have Omega(n) edges.

If you've got gigantic dense DAGs, the matrix multiplication approach (take the adjacency matrix, square it, you've got an edge (i,j) whenever adjacencyMatrix[i][j] > 0) with a fast matrix muliplication may be faster; make sure to benchmark if relevant, since despite the asymptotic superiority of the matrix multiplication approach, it does tend to not outperform the above approach in practice for most applications, apparently.

And if you're really dealing with dense graphs (i.e. Omega(n^2) edges) you may want to consider using Ford-Warshall as well, even thought it won't outperform the BFS approach, purely because you'll probably find an implementation of it lying around somewhere (if not, google helps).

Question 4

You are given a directed acyclic graph (DAG) and you intend to find all the nodes reachable from a given node. Consider the following recurrence solution, where f(u) denotes the set of nodes reachable from u.

This simply means that the nodes you can reach from u is the union of the nodes you can reach from all its out-neighbors (adjacent nodes). This can easily be achieved with DFS (depth-first search) along with memoization of previously computed f(u) sets. This approach has a time-complexity of O(|E| + |V| + |E||V|log(|V|)) and an additional space-complexity of O(|V|^2). Note that this algorithm doesn't scale well to very large graphs due to the |V|^2 memory requirement.

The following C++ code demonstrates this approach :

#include <iostream>
#include<algorithm>
#include<cstdio>
#include<cstring>
#include<cmath>
#include<map>
#include<vector>
#include<set>

using namespace std;

#define FOR(i,n) for(int i=0;i<n;++i)
#define FORS(i,s,n) for(int i=s;i<n;++i)
#define MAX 1000000

#define WHITE 0
#define GRAY 1
#define BLACK 2

void dfs_visit(vector<int> adj[], int u, int color[], set<int> cache[])
{
 color[u]=GRAY;
 int len = adj[u].size();

 FOR(i,len)
 {
  int v = adj[u][i];
  if(color[v]==WHITE)
  {
   dfs_visit(adj,v,color,cache);
  }          
 }  

 // Before coloring a node black, set its f(u) to be the union of the f(v) of its neighbours for all v
 FOR(i,len)
 {
    int v = adj[u][i];
    cache[u].insert(v);
    // This will take time logarithmic in the size of f(u) since each insertion requires comparison for uniqueness
    for(set<int>::iterator it = cache[v].begin(); it!=cache[v].end(); ++it)
    {
        cache[u].insert(*it);
    }
 }

    color[u]=BLACK;   
}

void dfs(vector<int> adj[], set<int> cache[], int n)
{
 int color[n];
 memset(color, WHITE, sizeof(color));

 FOR(i,n)
 {
    if(color[i]==WHITE)
    {
     dfs_visit(adj, i, color, cache);
    }                      
 }      
}

int main()
{
    int n, m;
    scanf("%d%d", &n, &m);

    // To handle 0-based index, initialize as n+1
    set<int> cache[n+1];
    vector<int> adj[n+1];

    FOR(i,m)
    {
        int u,v;
        scanf("%d%d", &u, &v);
        adj[u].push_back(v);
    }

    dfs(adj, cache, n);

    FORS(i,1,n)
    {
        set<int> fu = cache[i];
        printf("%d->{", i); 

        for(set<int>::iterator it=fu.begin(); it!=fu.end(); ++it)
        {
            printf("%d,", *it);     
        }

        printf("}\n");
    }

    return 0;
}

Input (N=Number of nodes, M=Number of edges, M lines follow with edge list):

Output:

1->{2,3,4,5,}
2->{4,5,}
3->{2,4,5,}
4->{5,}