Question

I have seen many DBSCAN algorithm implemented using a formula to estimate the neighborhood radius (Eps) based on the given minimum points within a cluster (k).

[full code] http://toolz.googlecode.com/svn/trunk/CWT/dbscan.py

% Analytical calculation of rad if not given

function [Eps] = epsilon(x,k) 

[m,n] = size(x);

Eps = ((prod(max(x)-min(x))*k*gamma(.5*n+1))/(m*sqrt(pi.^n))).^(1/n);

I have searched extensively to understand how this analytical formula was derived but been unsuccessful.

Was it helpful?

Solution 2

The estimation of a suboptimal radius is described in the OPTICS paper

Looking for Natural Patterns in Analytical Data. 2. Tracing Local Density with OPTICS

As outline in the paper there are assumptions to make this formulation useful.

To sum up, citing the article, the density of the objects of a dataset can be compared with the density of the same number of objects uniformly distributed in the same volume as the data set. If the data set has a uniform distribution, then the neighborhood radius eps , containing k points can be estimated.

OTHER TIPS

Does it come with a scientific reference, or is this just something someone made up himself?

The formula looks like the volume formula of n-balls.

So it may be based on the idea that if the data were uniformly distributed on a cube, and all edges had the same length, this L2-sphere would be expected to have this number of points, not taking boundary effects into account.

However, if your data would look like this, you wouldn't need to run clustering. These assumptions are much too strong to make sense in practical applications.

I don't think it is advisable to use this formula!

In particular, if you cannot find a proof or explanation in literature.

I also would suggest to not use this code either. His "OPTICS" implementation was anything, but the OPTICS algorithm... there are better, proper implementations out there. For best results, you will also want to have index support.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top