Question

Given a binary attribute, I want every cluster produced by k-means to contain the same number of data points for which that attribute equals 1.

That sentence is wordy, so here is an example.

Suppose I have an attribute "Asian" with 40 out of my 100 data points having the value of "Asian" = 1. For k = 10, I want each cluster to have exactly 4 points with "Asian" = 1.

Is there a simple way of achieving this? I have racked my brains but have not been able to come up with one. Please note that I am a beginner when it comes to clustering problems.


Solution

Here is a tutorial on how to perform such a k-means modification:

http://elki.dbs.ifi.lmu.de/wiki/Tutorial/SameSizeKMeans

It is not exactly what you need, but it covers a closely related k-means variant (one that forces all clusters to the same size) that can be adapted to your needs fairly easily. It is also a step-by-step walkthrough.
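If you would rather not work in ELKI, here is a minimal Python sketch of one possible adaptation. It is not the tutorial's algorithm: it is an ordinary Lloyd-style loop in which the points with the attribute set to 1 are assigned by solving a balanced assignment problem (each centroid offers the same number of "slots"), while all other points simply go to their nearest centroid. The function name balanced_flag_kmeans and its parameters are made up for illustration, and the sketch assumes the number of flagged points is divisible by k.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def balanced_flag_kmeans(X, flag, k, n_iter=50, seed=0):
    """Lloyd-style k-means in which every cluster receives the same number
    of points with flag == 1 (hypothetical helper, illustration only)."""
    X = np.asarray(X, dtype=float)
    flag = np.asarray(flag, dtype=bool)
    n_flagged = int(flag.sum())
    if n_flagged % k != 0:
        raise ValueError("number of flagged points must be divisible by k")
    per_cluster = n_flagged // k
    flagged_idx = np.flatnonzero(flag)

    rng = np.random.default_rng(seed)
    # Start from k distinct data points as initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)

    for _ in range(n_iter):
        # Squared distance from every point to every centroid, shape (n, k).
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

        # Unconstrained step: nearest centroid for every point.
        new_labels = dist.argmin(axis=1)

        # Constrained step: each centroid offers `per_cluster` slots, and the
        # flagged points are matched to slots at minimum total cost.
        if per_cluster > 0:
            slot_cost = np.repeat(dist[flag], per_cluster, axis=1)
            rows, cols = linear_sum_assignment(slot_cost)
            new_labels[flagged_idx[rows]] = cols // per_cluster

        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)

    return labels, centroids


# Usage with the numbers from the question: 100 points, 40 flagged, k = 10.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
flag = np.zeros(100, dtype=int)
flag[:40] = 1
labels, centroids = balanced_flag_kmeans(X, flag, k=10)
counts = np.bincount(labels[flag == 1], minlength=10)
print(counts)  # every entry is 4
```

The matching step builds an n_flagged x n_flagged cost matrix, which is fine for 40 points but will get slow for many thousands of flagged points. Like plain k-means, the result depends on the initial centroids, so it is worth trying a few seeds.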

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow