Question

I'm using Weka to develop a classifier for a medical problem. The dataset is class-imbalanced, and I want to know whether it also has a class overlap problem. Each record has 30 attributes. How can I detect class overlap using Weka features?


Solution

Class overlap occurs when samples from different classes have very similar attribute values. One way to check for it is to cluster the data and compare the clusters with the class labels:

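  1. Cluster your data set, ignoring the class attribute.
  2. Instances that fall into the same cluster are very similar to each other.
  3. Compute the error rate of the clustering against the actual class membership.
  4. Instances that share a cluster but have different classes are the overlapping ones you are asking about.

In the Weka Explorer, the "Classes to clusters evaluation" mode on the Cluster tab does this: it ignores the class attribute while clustering and then reports how many instances were incorrectly clustered with respect to their class.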
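The steps above can also be sketched outside Weka. The following is a minimal illustration in Python, not Weka's own implementation: it assumes purely numeric attributes on comparable scales, uses a plain k-means written for the example, and the function names (`kmeans`, `cluster_class_table`, `overlap_rate`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on lists of floats; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def cluster_class_table(assignment, labels):
    """Count how often each class label occurs inside each cluster."""
    table = defaultdict(Counter)
    for cluster, label in zip(assignment, labels):
        table[cluster][label] += 1
    return table


def overlap_rate(table):
    """Fraction of instances outside the majority class of their cluster."""
    total = sum(sum(counts.values()) for counts in table.values())
    minority = sum(
        sum(counts.values()) - max(counts.values()) for counts in table.values()
    )
    return minority / total


# Toy data (hypothetical): two tight groups, with one "sick" record
# sitting in the middle of the "healthy" group.
points = [
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.15, 0.15],
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1], [5.15, 5.05],
]
labels = ["healthy", "healthy", "healthy", "sick", "sick", "sick", "sick", "sick"]

assignment = kmeans(points, k=2)
table = cluster_class_table(assignment, labels)
rate = overlap_rate(table)

for cluster, counts in sorted(table.items()):
    print(cluster, dict(counts))
print("overlap rate:", rate)
```

A cluster whose class counts are mixed points to a region where the classes overlap; a rate near zero suggests the classes are well separated, at least as far as this clustering can tell.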

Other tips

To address the class imbalance, you can use SMOTE, which Weka provides as a supervised instance filter (weka.filters.supervised.instance.SMOTE). But can you explain what you mean by class overlapping?
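As a rough illustration of what SMOTE does, the sketch below creates synthetic minority samples by interpolating between a minority instance and one of its nearest minority neighbours. It is a simplified version for numeric attributes only, not Weka's filter; the function name `smote`, its parameters, and the sample points are assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class records with two numeric attributes.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=3)
print(new_points)
```

Because every synthetic point lies between two existing minority instances, oversampling in a region where the classes already overlap can make that overlap denser, which is one reason to check for overlap first.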

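I think that by 'class overlapping' you mean that similar instances exist that belong to different classes. You can simply remove them. In awk, the following keeps the first occurrence of each line and drops later repeats (blank lines are always kept). Note that it compares whole lines, so it removes exact duplicates only: two rows with the same attribute values but different class labels are not treated as duplicates.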

awk '!NF || !seen[$0]++' inputFile > outputFile
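To list rows that share every attribute value but carry different class labels, something along these lines could be used. It is a minimal sketch in Python that assumes each row is a sequence of fields with the class label in the last position; the function name `conflicting_groups` and the sample rows are hypothetical.

```python
from collections import defaultdict


def conflicting_groups(rows, class_index=-1):
    """Group rows by attribute values and keep groups with more than one class."""
    classes_by_attrs = defaultdict(set)
    for row in rows:
        fields = list(row)
        label = fields.pop(class_index)  # separate the class label from the attributes
        classes_by_attrs[tuple(fields)].add(label)
    return {
        attrs: labels
        for attrs, labels in classes_by_attrs.items()
        if len(labels) > 1
    }


# Hypothetical records: (age, blood_pressure, class)
rows = [
    ("37", "high", "yes"),
    ("37", "high", "no"),   # same attributes as the row above, different class
    ("52", "low", "no"),
    ("52", "low", "no"),    # exact duplicate, not a conflict
]

conflicts = conflicting_groups(rows)
for attrs, labels in conflicts.items():
    print(attrs, sorted(labels))
```

This only catches identical attribute vectors. Instances that are merely close to each other need a distance-based check such as clustering or nearest neighbours.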

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow