Question

I use the Python module scipy-cluster to plot a lot of data points. Is there a way/function that gives the representative of each cluster, given either the threshold or the number of representatives I want? Ideally, each representative should be the point closest to the center of the cluster it belongs to.

Edit: I'm looking for the data point closest to the centroid in each cluster.


Solution

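Scipy-cluster provides coordinates for each centroid and identifies which points are in each cluster. Once you have that, I believe scipy.cluster.vq.py_vq (or the faster scipy.cluster.vq.vq) will give you, for each observation, the index of its nearest centroid and the distance to it. The point with the smallest distance in each cluster is that cluster's representative.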
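A minimal sketch of that idea, assuming the clustering was done with k-means from scipy.cluster.vq. The helper name closest_to_centroids and the toy data are made up for illustration; the initial centroids are passed explicitly only to keep the run deterministic.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq


def closest_to_centroids(data, centroids):
    """Return {cluster index: row index of the member nearest its centroid}.

    Hypothetical helper, not part of SciPy.
    """
    # vq assigns each observation to its nearest centroid and
    # returns the Euclidean distance to that centroid.
    labels, dists = vq(data, centroids)
    reps = {}
    for k in range(len(centroids)):
        members = np.flatnonzero(labels == k)
        if members.size == 0:
            continue  # a centroid with no assigned points has no representative
        reps[k] = int(members[np.argmin(dists[members])])
    return reps


# Toy data (assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(5.0, 0.5, size=(50, 2)),
])

# Seed k-means with one point from each blob so the result is reproducible.
centroids, _ = kmeans(data, data[[0, -1]])
reps = closest_to_centroids(data, centroids)

for k, idx in reps.items():
    print(k, idx, data[idx])
```

Each value in reps is a row index into data, so data[idx] is the actual representative point.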

OTHER TIPS

I don't really know my way around scipy-cluster, but it sounds like it gives you the centroid coordinates. Given that information and the knowledge of which points are in the cluster, it should be straightforward to calculate the distance from the centroid for each point in the cluster and pick the point with the smallest one. Just make sure your calculation is based on the same distance metric you used for clustering (probably Euclidean distance).
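The same calculation can be done by hand when all you have is cluster labels, for example from hierarchical clustering cut at a distance threshold. This is a rough sketch under assumptions: the helper name representatives, the sample points, the average linkage method, and the threshold of 2.0 are all illustrative, and the centroid is taken to be the mean of each cluster's members with Euclidean distance.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def representatives(data, labels):
    """Return {cluster label: row index of the member nearest the cluster mean}.

    Hypothetical helper, not part of SciPy. Uses Euclidean distance.
    """
    reps = {}
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        centroid = data[members].mean(axis=0)
        # Distance from every member of this cluster to the cluster centroid.
        d = np.linalg.norm(data[members] - centroid, axis=1)
        reps[int(label)] = int(members[np.argmin(d)])
    return reps


# Illustrative points (assumption): two tight groups far apart.
points = np.array([
    [0.0, 0.0], [0.4, 0.0], [0.1, 0.3],
    [5.0, 5.0], [5.2, 5.1], [4.9, 5.3],
])

Z = linkage(points, method="average")

# Cut the tree at a distance threshold ...
labels = fcluster(Z, t=2.0, criterion="distance")
reps = representatives(points, labels)

# ... or ask for a fixed number of clusters instead.
labels_k = fcluster(Z, t=2, criterion="maxclust")
reps_k = representatives(points, labels_k)
```

If the clustering used a different metric, the np.linalg.norm line would need to be replaced with that metric.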

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow