Question

I am using the MLPClassifier example from scikit-learn.

The code for training:

from sklearn.neural_network import MLPClassifier
X = [[0., 0.], [1., 1.]]
y = [0, 1]
clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(5, 2), random_state=1)

clf.fit(X, y)

At the predict step, we pass the test data [2., 2.] and [-1., -2.] to clf.predict([[2., 2.], [-1., -2.]]). The output of this call is array([1, 0]).

As we can see, the test point [2., 2.] is not in the training data we passed, yet we still get label 1 as the closest match.

What I am trying to do is detect when the test data I supply is not covered by the training data, so that I can print a message telling the user the data is not valid instead of giving them the wrong label 1.

For instance, in KNN classification I have the kneighbors function, which gives me the distance of the closest neighbours to the test data I supplied on a 0-to-1 scale. So I can easily reject test samples that are highly distant from my training samples by keeping a threshold of 0.6 or 0.7, as in the sketch below.
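
Here is a minimal sketch of that distance check, reusing the toy data from above; the n_neighbors=1 setting and the 0.7 threshold are just illustrative assumptions:

from sklearn.neighbors import KNeighborsClassifier

X = [[0., 0.], [1., 1.]]
y = [0, 1]

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)

X_test = [[2., 2.], [-1., -2.]]
# kneighbors returns the distances to (and indices of) the nearest
# training samples for each test sample
distances, _ = knn.kneighbors(X_test, n_neighbors=1)

threshold = 0.7  # illustrative cut-off
for sample, dist in zip(X_test, distances[:, 0]):
    if dist > threshold:
        print(f"{sample}: data is not valid (nearest neighbour at distance {dist:.2f})")
    else:
        print(f"{sample}: predicted label {knn.predict([sample])[0]}")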

Is there any criterion or threshold like this that I could use with MLPClassifier, or with any of the incremental classifiers mentioned here, to reject test samples that are not represented in the training data?
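
For illustration, the only built-in signal I am aware MLPClassifier exposes is predict_proba. Thresholding the maximum class probability (the 0.9 cut-off below is an arbitrary assumption) is a confidence check rather than a distance-to-training-data check, so I am not sure it actually solves the problem:

from sklearn.neural_network import MLPClassifier

X = [[0., 0.], [1., 1.]]
y = [0, 1]
clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(X, y)

X_test = [[2., 2.], [-1., -2.]]
probabilities = clf.predict_proba(X_test)   # shape (n_samples, n_classes)

threshold = 0.9  # assumed confidence cut-off, would need tuning
for sample, probs in zip(X_test, probabilities):
    if probs.max() < threshold:
        print(f"{sample}: data is not valid (max probability {probs.max():.2f})")
    else:
        print(f"{sample}: predicted label {clf.classes_[probs.argmax()]}")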

Question migrated from SO

No correct solution
