Question

I built a classifier with 13 features (none of them binary) and normalized each sample individually using scikit-learn's Normalizer().transform.

When I make predictions, the model predicts every training sample as positive and every test sample as negative, regardless of the actual label.

What anomalies should I focus on in my classifier, features, or data?

Notes: 1) I normalize the test and training sets separately (individually for each sample); a sketch of this step follows these notes.

2) I tried cross-validation, but the performance is the same.

3) I used SVM with both linear and RBF kernels.

4) I also tried without normalizing, with the same poor results.

5) My training data is balanced (400 positive and 400 negative samples), while my test set has 34 positive and 1000+ negative samples.
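For reference, a minimal sketch of the normalization step as described above (the feature values are made-up stand-ins; my real data has 13 non-binary features):

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Made-up stand-ins for the real 13-feature training and test matrices.
X_train = np.random.rand(800, 13)    # 400 positive + 400 negative training samples
X_test = np.random.rand(1034, 13)    # 34 positive + 1000+ negative test samples

# Each sample (row) is rescaled to unit norm, train and test separately.
normalizer = Normalizer()
X_train_norm = normalizer.transform(X_train)
X_test_norm = normalizer.transform(X_test)
```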


Solution

If you're training on balanced data, the fact that the model predicts every training sample as positive is probably enough to conclude that something has gone wrong.

Try building something very simple (e.g. a linear SVM with one or two features), inspect the resulting model, and visualize your training data; the scikit-learn example is a good template: http://scikit-learn.org/stable/auto_examples/svm/plot_iris.html
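A minimal sketch of that kind of sanity check, with random values standing in for two of your real features (the class offsets and parameters here are arbitrary assumptions, not a prescription):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC

# Stand-in for two features of the real training data (hypothetical values).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(400, 2) + [1, 1],    # positives
               rng.randn(400, 2) - [1, 1]])   # negatives
y = np.array([1] * 400 + [0] * 400)

# Fit a simple linear SVM and look at what it learned.
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
print("weights:", clf.coef_, "bias:", clf.intercept_)

# Visualize the decision boundary over a grid, as in the scikit-learn example.
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
plt.show()
```

If the boundary looks reasonable on a couple of features but the full 13-feature model still collapses to one class, the problem is more likely in the data or preprocessing than in the SVM itself.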

There's also a possibility that your input data contains many large outliers that distort the transform step.
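If that turns out to be the issue, one option (a sketch under the assumption that outliers are the culprit, not a guaranteed fix) is to scale per feature with outlier-robust statistics, fit on the training data only:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Stand-ins for the real 13-feature matrices from the question.
X_train = np.random.rand(800, 13)
X_test = np.random.rand(1034, 13)

# RobustScaler uses the per-feature median and IQR, so large outliers have
# much less influence than with Normalizer or StandardScaler.
scaler = RobustScaler().fit(X_train)      # fit on the training data only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuse the same statistics for the test set
```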

OTHER TIPS

Try doing feature selection on the training data only (separately from your test/validation data), as in the sketch below. Feature selection on your whole dataset can easily lead to overfitting.
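One way to keep the selection inside the training data is to put it in a pipeline, so it is re-fit on each training fold during cross-validation. A sketch with made-up data and an arbitrary choice of SelectKBest with k=5 (both assumptions, adjust as needed):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-ins for the real 13-feature training data and labels.
X = np.random.rand(800, 13)
y = np.array([1] * 400 + [0] * 400)

# Because feature selection lives inside the pipeline, it is refit on each
# training fold; the held-out fold never influences which features are kept.
pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=5)),
    ("svm", SVC(kernel="linear")),
])
print(cross_val_score(pipe, X, y, cv=5))
```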

Licensed under: CC-BY-SA with attribution