MLPClassifier threshold factor to eliminate test samples that do not match the train data
31-10-2019
Question
I am using the MLPClassifier example from scikit-learn.
The code for training:
from sklearn.neural_network import MLPClassifier
X = [[0., 0.], [1., 1.]]
y = [0, 1]
clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(X, y)
At the predict step, we pass the test data [2., 2.] and [-1., -2.]:
clf.predict([[2., 2.], [-1., -2.]])
The output of this call is:
array([1, 0])
As we can observe, the test sample [2., 2.] is not in the training dataset we passed. Still, we get the closest match as label 1.
What I am trying to achieve is: if a test sample I supply is not in the training dataset, I should print a message to the user that the data is not valid, instead of giving them the wrong label 1.
For instance, in kNN classification I have the kneighbors function, which tells me the distance of the closest neighbours to the test sample I supplied, on a 0 to 1 scale. So I can easily eliminate test samples that are highly distant from my training samples by keeping the threshold at 0.6 or 0.7.
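As a minimal sketch of that kneighbors check, reusing the toy training data from above (the choice of n_neighbors=1 is an assumption; note that kneighbors returns raw Euclidean distances by default, so a 0-to-1 scale only holds if the features themselves are normalised):

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0., 0.], [1., 1.]]  # same toy training data as in the question
y = [0, 1]

knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# kneighbors returns (distances, indices) of the nearest training samples
dist, idx = knn.kneighbors([[2., 2.], [-1., -2.]])
print(dist)  # raw Euclidean distance of each test point to its nearest training sample
```

Here [2., 2.] is sqrt(2) ≈ 1.41 away from its nearest training sample [1., 1.], and [-1., -2.] is sqrt(5) ≈ 2.24 away from [0., 0.], so both would be rejected by a 0.6-0.7 cut-off on normalised features.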
Is there any criterion/threshold like this I could use with MLPClassifier, or with any of the incremental classifiers mentioned here, which would reject my test samples if they are not represented in the train dataset?
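MLPClassifier itself has no such built-in distance check, but the same idea from the kNN case can be bolted on: fit a NearestNeighbors model on the training data and only let the MLP predict samples whose nearest-neighbour distance is below a cut-off. This is only a sketch of one possible workaround, not a scikit-learn feature; the THRESHOLD value and the extra NearestNeighbors model are assumptions:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import NearestNeighbors

X = [[0., 0.], [1., 1.]]  # toy training data from the question
y = [0, 1]

clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(5, 2), random_state=1).fit(X, y)
nn = NearestNeighbors(n_neighbors=1).fit(X)

THRESHOLD = 0.7  # hypothetical cut-off, as in the kNN example

X_test = [[2., 2.], [-1., -2.], [0.9, 1.1]]
dist, _ = nn.kneighbors(X_test)  # distance to the nearest training sample

for sample, d in zip(X_test, dist[:, 0]):
    if d > THRESHOLD:
        print(sample, "-> data is not valid (distance %.2f from train set)" % d)
    else:
        print(sample, "-> predicted label:", clf.predict([sample])[0])
```

With this gate, [2., 2.] and [-1., -2.] are rejected before the MLP ever labels them, while [0.9, 1.1] (close to the training sample [1., 1.]) is passed through to clf.predict.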
Question migrated from SO
No correct solution