Question

I have done K-mean++ clustering and obtained the centroids of the clusters, using Python 2.7, following the method given in http://datasciencelab.wordpress.com/2014/01/15/improved-seeding-for-clustering-with-k-means/

In my problem, there is a further constraint that the distances between any centroid to any node should be larger than a constant. What is the best way to achieve?

It is possible that one centroid is too close to several nodes.

Any suggestions on how to displace the centroids a bit?

Many thanks.

For example, the nodes to be clustered are MyNodes = [[469500, 5802610], [468764, 5803422], [467991, 5804202], [470260, 5799949], [469486, 5800730], [468713, 5801510], [467939, 5802291], [467166, 5803072], [467966, 5800204], [467193, 5800985], [466420, 5801766], [466457, 5799700], [465678, 5800488], [464950, 5799229], [470615, 5796600], [469842, 5797405], [470320, 5794955], [469547, 5795735], [468773, 5796516], [467990, 5797297], [470062, 5793215], [469289, 5793996], [468515, 5794776], [467742, 5795557], [466969, 5796338], [466195, 5797119], [469976, 5791334], [469202, 5792115], [468429, 5792896], [467656, 5793676], [466882, 5794457], [466109, 5795238], [465336, 5796050], [464600, 5796840], [470160, 5789250], [469354, 5789972], [468581, 5790753], [467808, 5791534], [467034, 5792315], [466261, 5793096], [465488, 5793877], [464714, 5794658], [463941, 5795499], [463150, 5796210], [469500, 5787920], [468698, 5788614], [467925, 5789395], [467152, 5790176]]

Centroids = [[ 467839.6, 5793224.1], [ 467617.22222222, 5800489.94444444]] Centroid[0] would be too close to node[29], Centroid1 would be too close to node[8].enter image description here

enter image description here

Was it helpful?

Solution

If I understand your question correctly, and I am not at all sure whether I do, then the solution is already apparent from your drawings:

You want the point that is closest to a given centroid point; and has a minimum distance from a set of node points.

  • Draw a circle around each node point, with your minimum distance as the radius
  • Intersect each circle with each other circle, note the intersection points
  • discard any intersection point that is closer than the minimum distance to a node point.
  • from the remaining intersection points, take the one closest to the centroid point. That is your new displaced centroid.

Runtime for the naive implementation should be O(number_of_nodes^2), though you can optimize that by using some fast nearest-neighbour lookup data structure, such as a kd-tree, when you intersect the circles and discard the intersection points that are too close.

This should be the optimal solution; no matter which algorithm you use, you cannot find a point closer to the original centroid that fits the minimum distance constraint; and this algorithm should always find that optimal point.

Though since the centroid is generally surrounded by node points, the new point may be quite some distance away if the node points are densely packed.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top