DBSCAN algorithm (recursion logic)

https://stackoverflow.com/questions/17261878

01-06-2022
|

Question

DBSCAN(D, eps, MinPts)
   C = 0
   for each unvisited point P in dataset D
      mark P as visited
      NeighborPts = regionQuery(P, eps)
      if sizeof(NeighborPts) < MinPts
         mark P as NOISE
      else
         C = next cluster
         expandCluster(P, NeighborPts, C, eps, MinPts)

expandCluster(P, NeighborPts, C, eps, MinPts)
   add P to cluster C
   for each point P' in NeighborPts 
      if P' is not visited
         mark P' as visited
         NeighborPts' = regionQuery(P', eps)
         if sizeof(NeighborPts') >= MinPts
            NeighborPts = NeighborPts joined with NeighborPts'
      if P' is not yet member of any cluster
         add P' to cluster C

regionQuery(P, eps)
   return all points within P's eps-neighborhood

Above is. as you can see, the algorithm of DBSCAN according to Wikipedia.

I want to ask about this exact part.

      NeighborPts = NeighborPts joined with NeighborPts'

My understanding was that if a core point from a neighbor of core point is visited, it will be joined to the currently examined cluster, right? But how does the recursion happen here? Because we has defined the loop of:

   for each point P' in NeighborPts

before the process of the joining, so any additional point from NeighborPts' won't be examined by the expandCluster function and if the new NeighborPts' actually has a point that is an another core point to the same cluster, how does the algorithm proceed?

I have a code with the implementation of 'expandCluster' method in Java:

public void expand(Vector<Integer> region, Group c, double dist, int minPts){
    for(int i = 0; i < region.size(); i++){
        int idx = region.get(i);
        if(labels[idx] == 0){                         // check if point is visited
            labels[idx] = 1;                          // mark as visited
            Vector<Integer> v = region(idx, dist);    // check for neighboring point
            if (v.size() >= minPts){                  // check if core point
                region.addAll(v);                     // join the NeighborPts 
            }
        }
        if(clustered[idx] == 0){
            c.elements.add(patterns.get(idx));
            clustered[idx] = clusters.size()+1;
        }
    }
}

Will the data collection region going to be revisited after the modification of the data collection through this code region.addAll(v);?

Solution

My understanding was that if a core point from a neighbor of core point is visited, it will be joined to the currently examined cluster, right?

Yes you're right and you can safely remove the line

if P' is not visited

However, this is not efficient.

If a point P' has been visited there is no need to compute its neighborhood and join it with the neighborhood of P.

It is visited means that: it's a noise point, it is already in a cluster or it's a border point. If it's already in a cluster and if it's a core point, this means that its neighbors have been processed already. If it's a border point then its neighbors must not be joined.

But how does the recursion happen here?

In the line

for each point P' in NeighborPts

you must consider NeighborPts as a dynamic container of points. The first time you enter the for loop NeighborPts contains X points. If the joining adds Y points to NeighborPts then the for loop will visit both X and Y sets. This will then repeat for the sets X and Y and this is how the recursion happens.

Will the data collection region going to be revisited after the modification of the data collection through this code region.addAll(v);?

Yes, every time you call region.addAll(v), region.size() increases and this confirms the recursion behavior that confuses you.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow