Question

I'm experimenting with sklearn.svm.SVC on some text classification tasks. I understand that performing feature selection prior to modelling with an SVM is a somewhat questionable endeavour, as performance usually peaks when the full set of features is used. It is still interesting from an academic perspective to see how different feature selection methods rank features differently.

After some digging around, I found that only a very limited selection of feature selection metrics is available in sklearn, i.e. chi-squared (chi2). I'm wondering whether other commonly used metrics, such as information gain (IG) and bi-normal separation (BNS), have been implemented in sklearn (or elsewhere) so that I can use them directly as the score function in sklearn.feature_selection.SelectKBest()?

Thanks in advance for your kind advice.

Answer

InfoGain is not implemented yet, but I think @larsmans wants to get it included at some point in the future. I don't know about BNS.

Please feel free to contribute it if you wish. Here is the contribution guide:

http://scikit-learn.org/dev/developers/index.html
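
In the meantime, SelectKBest accepts any callable that takes (X, y) and returns an array of per-feature scores, so both metrics can be written by hand. The following is only a rough sketch, not part of sklearn: the function names (information_gain, bi_normal_separation, _presence_counts, _entropy) are made up for illustration, features are reduced to presence/absence, the BNS version assumes a binary problem with the larger label treated as the positive class, and the clipping value eps=0.0005 is an assumption you may want to change.

```python
import numpy as np
from scipy.sparse import issparse
from scipy.stats import norm
from sklearn.feature_selection import SelectKBest


def _presence_counts(X, y):
    """Count, per feature and class, the documents where the feature is non-zero."""
    classes, y_idx = np.unique(np.asarray(y).ravel(), return_inverse=True)
    n_samples = y_idx.shape[0]
    # One-hot class indicator matrix, shape (n_samples, n_classes)
    Y = np.zeros((n_samples, classes.shape[0]))
    Y[np.arange(n_samples), y_idx] = 1.0
    # Reduce term weights (counts, tf-idf, ...) to presence/absence
    if issparse(X):
        present = X.astype(bool).astype(float)
    else:
        present = (np.asarray(X) != 0).astype(float)
    n_tc = np.asarray(present.T @ Y)  # shape (n_features, n_classes)
    return n_tc, Y.sum(axis=0)


def _entropy(p):
    """Shannon entropy in bits along the last axis, with 0 * log(0) taken as 0."""
    logp = np.log2(p, out=np.zeros_like(p), where=p > 0)
    return -(p * logp).sum(axis=-1)


def information_gain(X, y):
    """IG(t) = H(C) - P(t) H(C | t present) - P(not t) H(C | t absent)."""
    n_tc, class_counts = _presence_counts(X, y)
    n = class_counts.sum()
    n_t = n_tc.sum(axis=1)          # documents containing each feature
    n_not_tc = class_counts - n_tc  # documents lacking each feature, per class
    n_not_t = n - n_t

    def cond_entropy(counts, totals):
        totals = totals[:, None]
        # Features that never (or always) occur get an all-zero distribution
        p = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
        return _entropy(p)

    h_class = _entropy(class_counts / n)
    return (h_class
            - (n_t / n) * cond_entropy(n_tc, n_t)
            - (n_not_t / n) * cond_entropy(n_not_tc, n_not_t))


def bi_normal_separation(X, y, eps=0.0005):
    """BNS(t) = |F^-1(tpr) - F^-1(fpr)|, F^-1 being the inverse normal CDF."""
    n_tc, class_counts = _presence_counts(X, y)
    if class_counts.shape[0] != 2:
        raise ValueError("this sketch of BNS handles binary problems only")
    # Assumption: the second label in sorted order is the positive class.
    # Rates are clipped away from 0 and 1, where the inverse CDF is infinite.
    fpr = np.clip(n_tc[:, 0] / class_counts[0], eps, 1 - eps)
    tpr = np.clip(n_tc[:, 1] / class_counts[1], eps, 1 - eps)
    return np.abs(norm.ppf(tpr) - norm.ppf(fpr))


# Toy data: features 0 and 1 separate the classes, feature 2 occurs everywhere
X_demo = np.array([[1, 0, 1], [1, 0, 1], [0, 1, 1], [0, 1, 1]])
y_demo = np.array([0, 0, 1, 1])
selector = SelectKBest(score_func=information_gain, k=2).fit(X_demo, y_demo)
print(selector.scores_, selector.get_support())
```

Passing score_func=bi_normal_separation works the same way. Because these functions return scores only, selector.pvalues_ will be None.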

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow