How to select the representative closest to the center of each cluster in scipy-cluster?
15-06-2021
Question
So basically, I use the Python module scipy-cluster to plot a lot of data points. Is there a way or function that gives the representative of each cluster, given either a threshold or the number of representatives I want? Ideally, each representative should be the point with the smallest distance to the center of the cluster it belongs to.
Edit: I'm looking for the data point closest to the centroid in each cluster.
Solution
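Scipy-cluster provides the coordinates of each centroid and identifies which points belong to each cluster. Once you have that, I believe scipy.cluster.vq.py_vq will give you the distance between each observation and its nearest centroid. The point with the smallest distance in each cluster is then that cluster's representative.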
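A minimal sketch of that idea, assuming k-means clustering from scipy.cluster.vq. It uses vq rather than py_vq (both return the nearest-centroid code and distance for each observation). The toy data, the choice of two clusters, and the initial centroids are assumptions made only for illustration.

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

# Hypothetical toy data: two well separated 2-D blobs of 50 points each.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(5.0, 0.5, size=(50, 2)),
])

# Start from one point of each blob so the run is deterministic.
initial_centroids = data[[0, 50]]
centroids, _ = kmeans2(data, initial_centroids, minit="matrix")

# labels[i] is the index of the centroid nearest to data[i];
# dists[i] is the Euclidean distance from data[i] to that centroid.
labels, dists = vq(data, centroids)

# For each cluster, keep the index of the member with the smallest distance.
representatives = {}
for k in range(len(centroids)):
    members = np.flatnonzero(labels == k)
    if members.size == 0:
        continue  # an empty cluster has no representative
    representatives[k] = int(members[np.argmin(dists[members])])

for k, idx in representatives.items():
    print(k, idx, data[idx])
```

Each value in representatives is a row index into data, so data[idx] is the actual observation closest to the centroid of cluster k.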
OTHER TIPS
I don't really know my way around scipy-cluster, but it sounds like it gives you the centroid coordinates. Given that information and the knowledge of which points are in each cluster, it should be trivial to calculate the distance from the centroid for each point in the cluster and pick the point with the smallest one. Just make sure your calculation is based on the same distance metric you used for clustering (probably Euclidean distance).
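Here is a rough sketch of that manual calculation for the threshold case, assuming the clusters come from scipy.cluster.hierarchy (linkage plus fcluster), which returns labels but no centroids. The helper closest_to_centroid is a hypothetical name, not part of SciPy, and taking the mean of the members as the centroid is an assumption that fits Euclidean distance best.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import cdist


def closest_to_centroid(points, labels, metric="euclidean"):
    """Hypothetical helper: map each cluster label to the index of the
    member nearest to the cluster mean, using the given metric."""
    reps = {}
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        # Assumption: the centroid is the arithmetic mean of the members.
        centroid = points[members].mean(axis=0, keepdims=True)
        d = cdist(points[members], centroid, metric=metric).ravel()
        reps[int(label)] = int(members[np.argmin(d)])
    return reps


# Toy data: two small groups far apart.
points = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.4, 0.4],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [10.4, 10.4],
])

# Cut the dendrogram at a distance threshold, using the same metric throughout.
Z = linkage(points, method="average", metric="euclidean")
labels = fcluster(Z, t=3.0, criterion="distance")

reps = closest_to_centroid(points, labels, metric="euclidean")
print(reps)
```

If you want a fixed number of representatives instead of a threshold, fcluster also accepts criterion="maxclust" with t set to the number of clusters.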