Question

Good day!

I have been looking all over the Internet for how to compute the silhouette coefficient, cohesion and separation. Unfortunately, despite the resources, I just can't understand the formulas posted. I know that there are implementations in some tools, but I want to know how to compute them manually, especially given a vector space model.

Assuming that I have the following clusters:

Cluster 1 ={{1,0},{1,1}}
Cluster 2 ={{1,2},{2,3},{2,2},{1,2}},
Cluster 3 ={{3,1},{3,3},{2,1}}

The way I understood it according to [1] is that I have to get the average of the points per cluster:

C1 X = 1; Y = .5
C2 X = 1.5; Y = 2.25
C3 X = 2.67; Y = 1.67

Given the means, I compute the cohesion as the sum of squared errors (SSE):

Cohesion(C1) = (1-1)^2 + (1-1)^2 + (0-.5)^2 + (0-.5)^2 = 0.5
Cohesion(C2) = (1-1.5)^2 + (2-1.5)^2 + (2-1.5)^2 + (1-1.5)^2 + (2-2.5)^2 + (3-2.5)^2 + (2-2.5)^2 +(2-2.5)^2 = 2
Cohesion(C3) = (3-2.67)^2 + (3-2.67)^2 + (2-2.67)^2 + (1-1.67)^2 + (3-1.67)^2 + (1-1.67)^2 = 3.3334

Cluster(C) = 0.5 + 2 + 3.3334 = 5.8334

My questions are:
1. Did I compute cohesion correctly?
2. How do I compute separation?
3. How do I compute the silhouette coefficient?

Thank you.


References:
[1] http://www.cs.kent.edu/~jin/DM08/ClusterValidation.pdf


Solution 2

Computation of Silhouette is straightforward, but it does not involve the centroids.

So don't try to compute it from what you did for cohesion; compute it from your original data.
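For reference, the usual per-point definition can be written as follows. The symbols a(i), b(i), s(i), d and C_i are the common textbook notation, not something taken from the question: a(i) is the mean distance from point i to the other points of its own cluster C_i, and b(i) is the smallest mean distance from i to the points of any other cluster.

```latex
a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\ j \neq i} d(i, j), \qquad
b(i) = \min_{C \neq C_i} \frac{1}{|C|} \sum_{j \in C} d(i, j), \qquad
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\ b(i)\}}
```

When a(i) < b(i), this reduces to s(i) = 1 - a(i)/b(i). No centroid appears anywhere in it.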

OTHER TIPS

Cluster 1 ={{1,0},{1,1}} 
Cluster 2 ={{1,2},{2,3},{2,2},{1,2}}, 
Cluster 3 ={{3,1},{3,3},{2,1}}

Take a point {1,0} in cluster 1

Calculate its average distance to all other points in its cluster, i.e. cluster 1. There is only one other point, {1,1}, so the average is just that single distance.

So a1 =√( (1-1)^2 + (0-1)^2) =√(0+1)=√1=1

Now for the object {1,0} in cluster 1 calculate its average distance from all the objects in cluster 2 and cluster 3. Of these take the minimum average distance.

So for cluster 2

{1,0} ----> {1,2} = distance = √((1-1)^2 + (0-2)^2) =√(0+4)=√4=2
{1,0} ----> {2,3} = distance = √((1-2)^2 + (0-3)^2) =√(1+9)=√10=3.16
{1,0} ----> {2,2} = distance = √((1-2)^2 + (0-2)^2) =√(1+4)=√5=2.24
{1,0} ----> {1,2} = distance = √((1-1)^2 + (0-2)^2) =√(0+4)=√4=2

Therefore, the average distance of point {1,0} in cluster 1 to all the points in cluster 2 =

(2+3.16+2.24+2)/4 = 2.35

Similarly, for cluster 3

{1,0} ----> {3,1} = distance = √((1-3)^2 + (0-1)^2) =√(4+1)=√5=2.24
{1,0} ----> {3,3} = distance = √((1-3)^2 + (0-3)^2) =√(4+9)=√13=3.61
{1,0} ----> {2,1} = distance = √((1-2)^2 + (0-1)^2) =√(1+1)=√2=1.41

Therefore, the average distance of point {1,0} in cluster 1 to all the points in cluster 3 =

(2.24+3.61+1.41)/3 = 2.42

Now, the minimum average distance of the point {1,0} in cluster 1 to the other clusters 2 and 3 is,

b1 =2.35 (2.35 < 2.42)

So the silhouette coefficient of the point {1,0} in cluster 1 is

s1= 1-(a1/b1) = 1- (1/2.35)=1-0.4255=0.5745

In a similar fashion, calculate the silhouette coefficient for every point in every cluster by repeating the steps above. The silhouette coefficient of a cluster is the average over its points, and the coefficient of the whole clustering is the average over all points. Of these, the cluster with the greatest silhouette coefficient is the best as per this evaluation.
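The steps above can be sketched in a few lines of Python. This is only a minimal sketch, assuming Euclidean distance and the three clusters from the question. The function names `mean_distance` and `silhouette` and the integer cluster labels are made up for illustration, and returning 0 for a single-point cluster is an assumed convention. It uses the general form (b - a) / max(a, b), which equals 1 - a/b whenever a < b.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

# Clusters copied from the question; the integer labels are illustrative.
clusters = {
    1: [(1, 0), (1, 1)],
    2: [(1, 2), (2, 3), (2, 2), (1, 2)],
    3: [(3, 1), (3, 3), (2, 1)],
}

def mean_distance(p, points):
    """Average Euclidean distance from p to every point in points."""
    return sum(dist(p, q) for q in points) / len(points)

def silhouette(label, index):
    """Silhouette coefficient of the point clusters[label][index]."""
    own = clusters[label]
    p = own[index]
    # Exclude the point itself by position, so duplicates such as (1, 2) still count.
    others = own[:index] + own[index + 1:]
    if not others:
        return 0.0  # assumed convention for a single-point cluster
    a = mean_distance(p, others)
    b = min(mean_distance(p, pts) for lbl, pts in clusters.items() if lbl != label)
    return (b - a) / max(a, b)

# The point {1,0} of cluster 1, i.e. the one worked through by hand.
s_first = silhouette(1, 0)

# Average over the points of each cluster, then over all points.
per_cluster = {
    label: sum(silhouette(label, i) for i in range(len(pts))) / len(pts)
    for label, pts in clusters.items()
}
overall = sum(
    silhouette(label, i) for label, pts in clusters.items() for i in range(len(pts))
) / sum(len(pts) for pts in clusters.values())

print(s_first, per_cluster, overall)
```

Working from unrounded distances, the printed values may differ in the last digits from a hand calculation that rounds each distance to two decimals.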

Note: The distance here is the Euclidean Distance! You can also have a look at this video for further explanation:

https://www.coursera.org/learn/cluster-analysis/lecture/RJJfM/6-2-clustering-evaluation-measuring-clustering-quality

There is a mistake in your calculation of the cohesion of C1: the last term should be (1 - .5)^2, not (0 - .5)^2. The total happens to be the same:

Cohesion(C1) = (1 - 1) ^ 2 + (1 - 1) ^ 2 + (0 - .5) ^ 2 + (1 - .5) ^ 2 = 0.5 

This is the Prototype-Based (Centroid in this case) Cohesion calculation.
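A minimal sketch of this centroid-based cohesion in Python, assuming the clusters from the question. The helper names `centroid` and `cohesion_sse` are hypothetical, chosen only for this example.

```python
# Clusters copied from the question.
clusters = {
    "C1": [(1, 0), (1, 1)],
    "C2": [(1, 2), (2, 3), (2, 2), (1, 2)],
    "C3": [(3, 1), (3, 3), (2, 1)],
}

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def cohesion_sse(points):
    """Sum of squared differences between each point and the cluster centroid."""
    c = centroid(points)
    return sum((x - m) ** 2 for p in points for x, m in zip(p, c))

cohesion = {name: cohesion_sse(pts) for name, pts in clusters.items()}
total = sum(cohesion.values())

for name, value in cohesion.items():
    print(name, centroid(clusters[name]), round(value, 4))
print("total", round(total, 4))
```

Under these assumptions the sketch gives 0.5 for C1, 1.75 for C2 and about 3.33 for C3. The 1.75 for C2 differs from the 2 in the question because the question's Y terms subtract 2.5, whereas the Y value of the C2 centroid is 2.25.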

To calculate separation between each pair of clusters, i.e. (C1,C2), (C1,C3) and (C2,C3):

Separation(C1,C2) = SSE(Centroid(C1), Centroid(C2))
= (1 - 1.5) ^ 2 + (0.5 - 2.25) ^ 2 = 0.25 + 3.0625 = 3.3125
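The same calculation for all three pairs can be sketched in Python. This assumes, as in the formula above, that separation is the squared Euclidean distance between two centroids. The names `centroid` and `squared_distance` are hypothetical helpers for this example.

```python
from itertools import combinations

# Clusters copied from the question.
clusters = {
    "C1": [(1, 0), (1, 1)],
    "C2": [(1, 2), (2, 3), (2, 2), (1, 2)],
    "C3": [(3, 1), (3, 3), (2, 1)],
}

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def squared_distance(u, v):
    """Sum of squared coordinate differences between two points."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

centroids = {name: centroid(pts) for name, pts in clusters.items()}

# One value per unordered pair: (C1, C2), (C1, C3), (C2, C3).
separation = {
    (a, b): squared_distance(centroids[a], centroids[b])
    for a, b in combinations(centroids, 2)
}

for pair, value in separation.items():
    print(pair, round(value, 4))
```

Larger values mean the two centroids are further apart, so the pair is better separated.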

Silhouette Coefficient: Combines both the Cohesion and Separation.

Refer to https://cs.fit.edu/~pkc/classes/ml-internet/silhouette.pdf

Thanks for your answer.

'Calculate its average distance to all other points in its cluster, i.e. cluster 1' --> This part has to be corrected.

So

a1 =√( (1-1)^2 + (0-1)^2) =√(0+1)=√1=1

The distance {1,0} ----> {2,1} = √((1-2)^2 + (0-1)^2) =√(1+1)=√2 was originally given as 2.24.

This is an error, because the square root of 2 is approximately 1.41.

Licensed under: CC-BY-SA with attribution