Frage

The default settings for clustering seems to work fine- specifically the EuclideanDistanceFunction. However, I want to run the clustering with spatial data in the form of lng/lat and when I change the distance function elki crashes on me:

Running: -dbc.in /tmp/test_data_lnglat-test.dat -db.index tree.spatial.rstarvariants.deliclu.DeLiCluTreeFactory -algorithm clustering.DeLiClu -algorithm.distancefunction geo.LngLatDistanceFunction -deliclu.minpts 4
Task failed
java.lang.UnsupportedOperationException: MBR to MBR mindist is not yet implemented.
at de.lmu.ifi.dbs.elki.distance.distancefunction.geo.LngLatDistanceFunction.doubleMinDist(Unknown Source)
at de.lmu.ifi.dbs.elki.algorithm.KNNJoin.processDataPagesDouble(Unknown Source)
at de.lmu.ifi.dbs.elki.algorithm.KNNJoin.processDataPagesOptimize(Unknown Source)
at de.lmu.ifi.dbs.elki.algorithm.KNNJoin.initHeaps(Unknown Source)
at de.lmu.ifi.dbs.elki.algorithm.KNNJoin.run(Unknown Source)
at de.lmu.ifi.dbs.elki.algorithm.clustering.DeLiClu.run(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm.run(Unknown Source)
at de.lmu.ifi.dbs.elki.workflow.AlgorithmStep.runAlgorithms(Unknown Source)
at […]

Its not clear (to me) what this error means. Is it possible the clustering functions do not work with geo spatial data?

Is there a simple workaround for this? Would it be difficult to implement the required function (mindset)?

War es hilfreich?

Lösung

As the error pretty clearly states:

MBR to MBR mindist is not yet implemented.

However, the algorithm you tried to use - DeLiClu - needs to compute the minimum distance between two rectangles. In geodetic coordinates, not in the 2d plane.

You are welcome to contribute the adequate formulas. Spherical geometry is not trivial, so be aware that computing the minimal rectangle-to-rectangle distance is non trivial. It is not sufficient to look at the four corners. So far, we have only solved this for the point-to-rectangle case. It's doable - as the rectangles are axis aligned - but nobody so far bothered to sit down and do the math, and then sit down some more and optimize the formulas to require as little trigonometric functions as possible.

The simplest workaround probably is to use OPTICS with a regular R-tree (use bulk loads with STR!) instead of DeLiClu, because this algorithm will yield an almost identical result, but does not need the rectangle-to-rectangle minimum distance. In theory, DeLiClu is faster; in practise this does not necessarily hold, because of the much more complex (and thus harder to optimize) code of KNN joins on R-Trees.

Lizenziert unter: CC-BY-SA mit Zuschreibung
Nicht verbunden mit StackOverflow
scroll top