- Find Strongly Connected Components (SCC) using Tarjan's algorithm (O(V+E)), and create the SCC graph.
- Topologically sort the resulting SCC graph (it is a DAG).
- From last to first, find the number of nodes reachable from each component.
- Choose a node which is in a SCC that can reach maximal number of nodes.
Step 3 - elaboration:
(For clarification reasons I will denote a vertex in the original graph as 'node', and a vertex in the SCC graph as 'vertex').
In step 3 you want to find the number of nodes that are reachable from each vertex of your SCC. This can be done by explicitly finding this set, or by finding only the number of nodes:
- Explicitly finding the set of nodes reachable from each vertex:
This is pretty much straight forward, each vertex has an associated set of nodes, and you need to find the set associated to each vertex by doing a union on all edges on your SCC graph leading from the current vertex. - Using inclusion/exclusion to find the number of nodes reachable:
Inclusion/exclusion is a technique used to count size of union of sets where the sets might have repeats in them. For example, if you have 2 sets, the size of their union is|A|+|B|- |A[intersection]B|
.
For 3 sets A,B,C:|A|+|B|+|C|-|A[intersrction]B| - |A[intersection]C| - |B[intersection]C + |A[intersection]B[intersection]C|
(and so on)
Using inclusion/exclusion - the sets are the previous nodes, and the intersections are based on 2 different vertices that will later link themselves to the same vertex.