A fast way to find connected component in a 1-NN graph?

Question 1

The fastest algorithm for finding connected components given an edge list is the union-find algorithm: for each node, hold the pointer to a node in the same set, with all edges converging to the same node, if you find a path of length at least 2, reconnect the bottom node upwards.

This will definitely run in linear time:

- push all edges into a union-find structure: O(n)
- store each node in its set (the union-find root)
    and update the set of non-empty sets: O(n)
- return the set of non-empty sets (graph components).

Since the list of edges already almost forms a union-find tree, it is possible to skip the first step:

for each node
- if the node is not marked as collected
-- walk along the edges until you find an order-1 or order-2 loop, 
   collecting nodes en-route
-- reconnect all nodes to the end of the path and consider it a root for the set.
-- store all nodes in the set for the root.
-- update the set of non-empty sets.
-- mark all nodes as collected.
return the set of non-empty sets

The second algorithm is linear as well, but only a benchmark will tell if it's actually faster. The strength of the union-find algorithm is its optimization. This delays the optimization to the second step but removes the first step completely.

You can probably squeeze out a little more performance if you join the union step with the nearest neighbor calculation, then collect the sets in the second pass.

Question 2

Since each node has only one outgoing edge, you can just traverse the graph one edge at a time until you get to a vertex you've already visited. An out-degree of 1 means any further traversal at this point will only take you where you've already been. The traversed vertices in that path are all in the same component.

In your example:

0->1->2->3->2, so [0,1,2,3] is a component
4->2, so update the component to [0,1,2,3,4]
5->6->7->6, so [5,6,7] is a component
8->6, so update the compoent to [5,6,7,8]
9->8, so update the compoent to [5,6,7,8,9]

You can visit each node exactly once, so time is O(n). Space is O(n) since all you need is a component id for each node, and a list of component ids.

Question 3

If you want to do it sequencially you can do it using weighted quick union and path compression .Complexity O(N+Mlog(log(N))).check this link . Here is the pseudocode .honoring @pycho 's words

`

 public class QuickUnion
    {
    private int[] id;

    public QuickUnion(int N)
    {
         id = new int[N];
         for (int i = 0; i < N; i++) id[i] = i;
    }

    public int root(int i)
    {
         while (i != id[i])
         {
             id[i] = id[id[i]];
             i = id[i];
         }
         return i;
    }

    public boolean find(int p, int q)
    {
        return root(p) == root(q);
    }

    public void unite(int p, int q)
    {
        int i = root(p);
        int j = root(q);
        id[i] = j;
    }

    }

` @reference https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf

If you want to find connected components parallely, the asymptotic complexity can be reduced to O(log(log(N)) time using pointer jumping and weighted quick union with path compression. Check this link

https://vishwasshanbhog.wordpress.com/2016/05/04/efficient-parallel-algorithm-to-find-the-connected-components-of-the-graphs/