I built a neural network to classify a certain kind of data (biological sequences). It has 32 features, of which 12 are measured in particular units and 20 are plain (positive) integers. My positive set has 648 samples and my negative set has 9000+ samples.
To train my network I took 500 samples of each class, and the rest were used for testing. When I trained and tested with 3-fold cross-validation, it gave 100% accuracy in every case, provided I normalised the input data before partitioning it into training and testing sets. Precision and recall were both 100%.
When I don't normalise, the accuracy falls to 65-70% for the same experiment, and precision and recall are 5% and 80% respectively.
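For concreteness, here is roughly how the partitioning works in MATLAB (a sketch with illustrative variable names, not my actual script; I am assuming a random selection of the 500 training samples per class):

```matlab
% pos (32 x 648) and neg (32 x 9014) are the feature matrices,
% already normalised for model 1 and left raw for model 2.
idxPos = randperm(648);
idxNeg = randperm(9014);

trainX = [pos(:, idxPos(1:500)), neg(:, idxNeg(1:500))];
trainY = [ones(1, 500), zeros(1, 500)];

% The remaining 148 positives and 8514 negatives form the test set.
testX = [pos(:, idxPos(501:end)), neg(:, idxNeg(501:end))];
testY = [ones(1, 148), zeros(1, 8514)];
```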
The case becomes even more peculiar. When I use the first model (the normalised one) to test several random samples that were already present in the training set, without normalising them (since real-world data cannot be normalised, as we deal with single instances), it predicts all samples as 1, i.e. positive; it is completely biased towards positives.
When I use the second model (the unnormalised one), it produces more false negatives.
If 'outp' is the network's output on the training-set positives and 'outn' its output on the training-set negatives, I calculated the threshold for my network as:
[ (mean(outp) - std_dev(outp)) + (mean(outn) + std_dev(outn)) ] / 2
I got 0.5 for the first model and 0.489 for the second.
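In MATLAB terms, that computation is simply (outp and outn being row vectors of network outputs):

```matlab
% outp: network outputs on the training-set positives
% outn: network outputs on the training-set negatives
threshold = ((mean(outp) - std(outp)) + (mean(outn) + std(outn))) / 2;
```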
1) Where is the problem? Can someone explain this to me?
2) When we train, it is recommended to normalise the data, but doesn't that mean the classifier will misinterpret the input values provided by a user of the prediction tool, since a single sample cannot be normalised?
3) Also, what is the best method for finding a threshold in such problems, or in classification problems in general?
4) I don't know what other information I should provide; please let me know that too.
I am providing links to the error-vs-epoch plots:
https://www.dropbox.com/s/1gideuvbeje2lip/model2_unnormalised.jpg
https://www.dropbox.com/s/nb4zyt3h370pk8m/model1_normalised.jpg
One more thing I would like to mention: to normalise, I used MATLAB's built-in normr function. My positive matrix is 32 features by 648 samples (i.e. 32 x 648) and my negative matrix is 32 features by 9014 samples (i.e. 32 x 9014). Both were normalised with normr before any partitioning into training, test, or validation sets.
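For reference, the normalisation step was essentially the following (note that normr rescales each row of its argument to unit Euclidean length):

```matlab
% pos is 32 x 648 and neg is 32 x 9014, with features as rows.
% normr scales each ROW to unit length, i.e. each feature is
% normalised across ALL samples jointly, so the scaled value of
% any single sample depends on every other sample in the matrix.
posN = normr(pos);
negN = normr(neg);
% Partitioning into training / test / validation sets was done
% only after this step.
```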