How to deal with frequent classes?

https://stackoverflow.com/questions/17722227

03-06-2022

Question

I'm working on a classification task in Weka and have run into a problem: one value of the class I want to predict is very frequent (about 85% of the instances). As a result, many learning algorithms simply predict this frequent value for every new instance.

How can I deal with this? Does it simply mean that I haven't found features good enough to predict anything better, or is there something specific I can do about it?

I guess this is a pretty common problem, but I was not able to find a solution to it here.

Solution

You need to "SMOTE" your data, i.e. oversample the minority class with synthetic instances (SMOTE: Synthetic Minority Over-sampling Technique). First, figure out how many more minority-class instances you need. In my case I wanted roughly a 50/50 ratio, so I needed to oversample by 1300 percent. This tutorial will help if you are using the GUI: http://www.youtube.com/watch?v=w14ha2Fmg6U

If you are running Weka from the command line, the following command will get you going:

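# Weka 3.7.7
# -c last: the class is the last attribute; -K: number of nearest neighbors;
# -P: percentage of synthetic minority instances to create; -S: random seed
java weka.Run -no-scan weka.filters.supervised.instance.SMOTE \
  -c last -K 25 -P 1300.0 -S 1 -i input.arff -o output.arff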
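
The -P value follows from the class counts. Weka's SMOTE creates roughly P/100 synthetic instances per minority instance, so the minority class grows to about n_minority * (1 + P/100); solving for a target size gives P = (target / n_minority - 1) * 100. A minimal Python sketch of that arithmetic follows; the function name smote_percentage is hypothetical and not part of Weka.

```python
def smote_percentage(n_majority: int, n_minority: int, target_ratio: float = 1.0) -> float:
    """Return the SMOTE -P value that brings minority/majority close to target_ratio.

    Assumes SMOTE adds about n_minority * P / 100 synthetic minority instances,
    so the minority class ends up with about n_minority * (1 + P / 100) instances.
    """
    if n_minority <= 0:
        raise ValueError("need at least one minority instance")
    wanted_minority = n_majority * target_ratio      # 1.0 means a 50/50 split
    percentage = (wanted_minority / n_minority - 1.0) * 100.0
    return max(0.0, percentage)                      # already balanced: nothing to add


# The question's 85/15 split, e.g. 850 majority vs. 150 minority instances:
print(round(smote_percentage(850, 150), 1))  # 466.7
# A 14:1 imbalance gives the 1300% used in the command above:
print(smote_percentage(1400, 100))           # 1300.0
```

So for the question's 85/15 split, roughly -P 467 would give about a 50/50 ratio; the 1300% in the command corresponds to a majority class about 14 times the size of the minority class.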

The -K option sets the number of nearest neighbors considered when generating the synthetic instances. The default is 5, but 25 worked best for my dataset.
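
To show what -K controls, here is a simplified Python sketch of the interpolation step at the heart of SMOTE. It is not Weka's implementation: it assumes purely numeric attributes, uses Euclidean distance, and cycles through the minority instances instead of following Weka's exact sampling scheme; the names k_nearest and smote are hypothetical. Each synthetic instance lies on the line segment between a minority instance and one of its K nearest minority neighbors, so a larger K can spread the synthetic points further across the minority region.

```python
import math
import random
from typing import List, Sequence

Instance = Sequence[float]


def k_nearest(base: Instance, minority: Sequence[Instance], k: int) -> List[Instance]:
    """Return the k minority instances closest to base (Euclidean), excluding base itself."""
    others = [m for m in minority if m is not base]
    others.sort(key=lambda m: math.dist(base, m))
    return others[:k]


def smote(minority: Sequence[Instance], percentage: float, k: int = 5, seed: int = 1) -> List[List[float]]:
    """Create about len(minority) * percentage / 100 synthetic minority instances."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority instances")
    rng = random.Random(seed)  # fixed seed, similar in spirit to Weka's -S option
    n_new = round(len(minority) * percentage / 100.0)
    synthetic = []
    for i in range(n_new):
        base = minority[i % len(minority)]                # cycle through the minority instances
        neighbor = rng.choice(k_nearest(base, minority, k))
        gap = rng.random()                                 # random position in [0, 1)
        # new point = base + gap * (neighbor - base), attribute by attribute
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for point in smote(minority, percentage=200.0, k=2):
    print(point)
```

The generated rows would then be given the minority class label and appended to the training data.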

License: CC-BY-SA with attribution
