Question

Suppose we have four observations and the return value of scipy.cluster.hierarchy.linkage is:

[[ 1.          3.          0.08        2.        ]
 [ 2.          4.          0.28813559  3.        ]
 [ 0.          5.          1.          4.        ]]

This return value means: first, observations 1 and 3 are merged into a new cluster 4; then observation 2 is added to it, forming another new cluster 5; finally, observation 0 is merged in. Since I want two clusters, {1,3,2} and {0}, I expect a return value of [2,1,1,1] with a threshold of 0.4, meaning that element 0 belongs to cluster 2 and the rest are grouped into cluster 1. But scipy.cluster.hierarchy.fcluster actually returns [3, 1, 2, 1]. Of course I could write Python code to analyse the 2-D array returned by linkage myself, but I think fcluster can return what I want if I set the threshold to 0.4. However, I don't know which parameters to pass to it. Could you provide some example code that performs hierarchical clustering with linkage and produces the final result with fcluster, with the observations in each cluster represented as a set? Thank you.


Solution

By default, fcluster uses criterion='inconsistent', which is why a threshold of 0.4 does not behave like a distance cut-off. Pass criterion='distance' instead, so that the threshold is compared against the cophenetic distances stored in the third column of the linkage matrix, Z[:,2]. Alternatively, use criterion='maxclust' if you want to specify the number of clusters directly. If you are clustering with single linkage, some clusters are likely to be singletons (outliers). help(fcluster) explains how to use the function, and so do the docs.
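As a minimal sketch of the above, the snippet below feeds the linkage matrix from the question straight into fcluster with both criteria. The helper clusters_as_sets is a hypothetical name introduced here for illustration, not part of SciPy; it converts the flat label array into a list of sets. In real use, Z would come from linkage called on your own observations, which the question does not show.

```python
from collections import defaultdict

import numpy as np
from scipy.cluster.hierarchy import fcluster

# Linkage matrix copied from the question (normally: Z = linkage(X, ...)).
Z = np.array([
    [1.0, 3.0, 0.08,       2.0],
    [2.0, 4.0, 0.28813559, 3.0],
    [0.0, 5.0, 1.0,        4.0],
])


def clusters_as_sets(labels):
    """Hypothetical helper: group observation indices by their flat cluster label."""
    groups = defaultdict(set)
    for obs, label in enumerate(labels):
        groups[int(label)].add(obs)
    # Sort by smallest member so the order does not depend on label numbering.
    return sorted(groups.values(), key=min)


# Cut the tree where the cophenetic distance exceeds 0.4.
by_distance = fcluster(Z, t=0.4, criterion="distance")

# Or ask for at most two clusters, without choosing a distance.
by_count = fcluster(Z, t=2, criterion="maxclust")

print(clusters_as_sets(by_distance))  # [{0}, {1, 2, 3}]
print(clusters_as_sets(by_count))     # [{0}, {1, 2, 3}]
```

The label numbers themselves are arbitrary identifiers, so grouping by label is more robust than relying on which cluster happens to be numbered 1 or 2.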

Licensed under: CC-BY-SA with attribution