OPTICS Clustering algorithm. How to get the best epsilon

https://stackoverflow.com/questions/10884822

12-06-2021
|

Question

I am implementing a project which needs to cluster geographical points. OPTICS algorithm seems to be a very nice solution. It needs just 2 parameters as input(MinPts and Epsilon), which are, respectively, the minimum number of points needed to consider them as a cluster, and the distance value used to compare if two points are in can be placed in same cluster.

My problem is that, due to the extreme variety of the points, I can't set a fixed epsilon. Just look at the image below.

The same points structure but in a different scale would result very different. Suppose to set MinPts=2 and epsilon = 1Km. On the left, the algorithm would create 2 clusters(red and blue), but on the right it would create one single cluster containing all of the points(red), but I would like to obtain 2 clusters even on the right.

So my question is: is there any kind of way to calculate dynamically the epsilon value to get this result?

EDIT 05 June 2012 3.15pm: I thought I was using the OPTICS algorithm implementation from the javaml library, but it seems it is actually a DBSCAN algorithm implementation. So the question now is: does anybody know a java based implementation of OPTICS algorithm?

Thank you very much and excuse my for my poor english.

Marco

Solution

The epsilon value in OPTICS is solely to limit the runtime complexity when using index structures. If you do not have an index for acceleration, you can set it to infinity.

To quote Wikipedia on OPTICS

The parameter \varepsilon is strictly speaking not necessary. It can be set to a maximum value. When a spatial index is available, it does however play a practical role when it comes to complexity.

What you seem to have looks much more like DBSCAN than OPTICS. In OPTICS, you should not need to choose epsilon (it should have been called max-epsilon by the authors!), but your cluster extraction method will take care of that. Are you using the Xi extraction proposed in the OPTICS paper?

minPts is much more important. You should try a value of at least 5 or 10, not 2. With 2, you are essentially performing single-linkage clustering!

The example you gave above should work fine once you increase minPts!

Re: edit: As you can even see in the Wikipedia article, ELKI has a proper OPTICS implementation and it's in Java.

OTHER TIPS

You'd can try to scale epsilon by the total size of the enclosing rectangle. For example, your left data is about 4km x 6km (using my Mark I eyeball to measure) and the right is about 2km x 2km. So, epsilon on the right should be about 2.5 times smaller.

Of course, this doesn't work reliably. If, on your right hand data, there were an additional single point 4km to the right and 2km down, that would make the enclosing rectangle for the right the same as on the left, and you'd get similar (wrong) results.

You can try a minimum spanning tree and then remove the longest edge. The remaining spanning tree and the center of them is the best center for OPTICS and you can count the numbers of points around it.

In your explanation above, it is the change in scale which creates the uncertainty. When your scale gets bigger, your epsilon should change accordingly. Because they are at two very different scales, the two images you've presented are NOT the same set of points. They will not respond identically to your OPTICS algorithm without changing the parameters.

In short, no. there is no way to dynamically calculate epsilon to get this result. Clustering like this is already NP-Hard, and these clustering algorithims (optics, k-means, veroni) can only approximate the optimal solution.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow