Question

I built a classifier with 13 features (none of them binary) and normalized each sample individually using scikit-learn (Normalizer().transform).
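For context, the normalization step looks roughly like this (the feature matrix below is a random placeholder, not my real data):

    import numpy as np
    from sklearn.preprocessing import Normalizer

    # Random placeholder for the real data: 800 samples, 13 features.
    X_train = np.random.rand(800, 13)

    # Normalizer rescales each sample (row) to unit norm on its own,
    # so it needs no statistics from the rest of the training set.
    X_train_norm = Normalizer().fit_transform(X_train)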

When I make predictions, it predicts all training samples as positive and all test samples as negative (regardless of whether they are actually positive or negative).

What anomalies should I focus on in my classifier, features, or data?

Notes: 1) I normalize the test and training sets (individually for each sample) separately.

2) I tried cross-validation, but the performance is the same.

3) I used SVMs with both linear and RBF kernels (see the sketch after this list).

4) I also tried without normalizing, with the same poor results.

5) I have the same number of positive and negative training samples (400 each), while the test set has 34 positive and 1000+ negative samples.
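For reference, a minimal sketch of the setup described in the notes, again with placeholder data in place of my real features and labels:

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import Normalizer
    from sklearn.svm import SVC

    # Random placeholders for the real 400 positive / 400 negative
    # training samples with 13 features each.
    X = np.random.rand(800, 13)
    y = np.array([1] * 400 + [0] * 400)

    for kernel in ("linear", "rbf"):
        # The pipeline applies the same per-sample normalization
        # inside every cross-validation fold.
        clf = make_pipeline(Normalizer(), SVC(kernel=kernel))
        scores = cross_val_score(clf, X, y, cv=5)
        print(kernel, scores.mean())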


Solution

If you're training on balanced data, the fact that the model predicts all training samples as positive is probably enough on its own to conclude that something has gone wrong.

Try building something very simple (e.g. a linear SVM with one or two features) and inspect the model as well as a visualization of your training data; follow the scikit-learn example: http://scikit-learn.org/stable/auto_examples/svm/plot_iris.html
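A minimal two-feature sketch in the spirit of that example might look like this (the synthetic data is only for illustration; substitute two of your 13 features):

    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.svm import SVC

    # Two synthetic, well-separated features stand in for real columns.
    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(50, 2) + 2, rng.randn(50, 2) - 2])
    y = np.array([1] * 50 + [0] * 50)

    clf = SVC(kernel="linear").fit(X, y)

    # Evaluate the classifier on a grid and draw the decision regions,
    # as in the linked plot_iris example.
    xx, yy = np.meshgrid(
        np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
        np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
    plt.show()

If a model this simple also collapses to one class, the problem is more likely in the data than in the SVM configuration.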

There's also a possibility that your input data has many large outliers affecting the normalization: Normalizer rescales each sample to unit norm, so a single extreme feature value can dominate an entire row.
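One quick sanity check is to compare each feature's maximum against a high percentile; if outliers do show up, a robust scaler fitted on the training data only is one possible alternative (RobustScaler is an illustrative choice, not something specific to your setup; the arrays are placeholders):

    import numpy as np
    from sklearn.preprocessing import RobustScaler

    # Random placeholders for the real training and test matrices.
    X_train = np.random.rand(800, 13)
    X_test = np.random.rand(1034, 13)

    # A per-feature maximum far above the 99th percentile hints at
    # outliers that will dominate each sample's norm.
    print(np.percentile(X_train, 99, axis=0))
    print(X_train.max(axis=0))

    # RobustScaler centers on medians and scales by IQRs, so outliers
    # influence it less; fit on training data only, reuse on test data.
    scaler = RobustScaler().fit(X_train)
    X_train_s = scaler.transform(X_train)
    X_test_s = scaler.transform(X_test)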

Other tips

Try doing feature selection on the training data only (separately from your test/validation data). Feature selection on your whole dataset can easily lead to overfitting.
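One way to guarantee that is to put the selector inside a pipeline so cross-validation re-fits it on each training fold (SelectKBest with k=5 is just an illustrative choice, and the data is again a placeholder):

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    # Random placeholder data in place of the real 13-feature matrix.
    X = np.random.rand(800, 13)
    y = np.array([1] * 400 + [0] * 400)

    # SelectKBest is re-fitted inside each fold, so the held-out fold
    # never influences which features are kept.
    clf = make_pipeline(SelectKBest(f_classif, k=5), SVC(kernel="linear"))
    print(cross_val_score(clf, X, y, cv=5).mean())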

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow