Question

You are given a pre-trained binary ML classification model with 99% accuracy on the test-set (assume the customer required 95% and that the test-set is balanced). We would like to deploy our model in production. What could go wrong? How would you check it?

My answer was that the model could be biased towards our test set and could fail to generalize to yet-unseen data. We can check this by running the model on multiple unrelated test sets that it hasn't seen before.
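
As a rough sketch of that check (assuming a fitted scikit-learn-style classifier `model` and a genuinely unseen holdout set `X_holdout`, `y_holdout`; all names are placeholders):

```python
from sklearn.metrics import accuracy_score, classification_report

# `model` is the already-trained classifier; `X_holdout`, `y_holdout` are an
# independent dataset that played no part in training or model selection.
y_pred = model.predict(X_holdout)
print("Holdout accuracy:", accuracy_score(y_holdout, y_pred))
print(classification_report(y_holdout, y_pred))
```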

Is this the right angle?

No correct solution

OTHER TIPS

In data science there are many ways to make mistakes that yield a suspiciously high test score. Here are a few examples:

  • Your test set is also in your train set: imagine you have data from 2010 to 2020 and use 2019 as the test set. If you trained on all of 2010-2020 without excluding 2019, then you are testing on data the model already knows, since it was used to train it. Moreover, if the model tends to overfit (i.e. it fits the training set "too perfectly and precisely"), you could easily reach 99% accuracy. A quick way to check for this overlap is sketched right after this list.

  • Data leakage: a phenomenon in which your test set contains information you would not have for genuinely new cases. Example: you are using the Titanic dataset, predicting who dies and who survives. Imagine the dataset has an attribute called "Hour of death", empty if the person survived and filled in if they died. Your model will simply learn "if this attribute is empty the person survived, otherwise they died". On your test set you then apply the model using information you would never have when predicting truly new cases. The second sketch after this list shows one way to flag such a feature.
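
To illustrate the first point, a minimal sketch of an overlap check, assuming the data lives in two pandas DataFrames `train_df` and `test_df` with an `id` column identifying each record and a `year` column for the temporal split (all names are placeholders):

```python
# `train_df` and `test_df` are placeholder DataFrames with an `id` column
# identifying each record and a `year` column for the temporal split.

# Check 1: do any test records also appear in the training data?
overlap = set(test_df["id"]) & set(train_df["id"])
print(f"{len(overlap)} test rows also appear in the training set")

# Check 2: if training was supposed to stop at 2018 and 2019 is the test year,
# make sure 2019 is really absent from the training data.
assert 2019 not in set(train_df["year"]), "test year leaked into the training set"
```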

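For the second point, a crude but useful way to flag a leaking feature is to check whether any single column predicts the target almost perfectly on its own; the DataFrame `df` and target column `survived` below are placeholders:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# `df` is a placeholder DataFrame and `survived` its target column.
y = df["survived"]
for col in df.columns.drop("survived"):
    # Crude single-feature encoding; missing values become the code -1,
    # so "missingness" itself can show up as the leak.
    X = pd.DataFrame({col: df[col].astype("category").cat.codes})
    score = cross_val_score(DecisionTreeClassifier(max_depth=2), X, y, cv=5).mean()
    if score > 0.95:
        print(f"Suspiciously predictive feature: {col} (CV accuracy {score:.2f})")
```
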
Whatever the cause, a 99% accuracy has to make you wonder and look for errors; it is almost impossible to achieve unless your problem is REALLY easy (and might not need a data science model at all). A simple baseline comparison, sketched below, is a good first sanity check.
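
A minimal sketch of that baseline check, assuming placeholder training/test splits `X_train`, `y_train`, `X_test`, `y_test` and the fitted `model`; if a dummy model already scores very high, the problem (or the metric) is telling you something:

```python
from sklearn.dummy import DummyClassifier

# Baseline that ignores the features and always predicts the majority class.
# `X_train`, `y_train`, `X_test`, `y_test`, and `model` are placeholders.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("Baseline accuracy:", baseline.score(X_test, y_test))
print("Model accuracy:   ", model.score(X_test, y_test))
```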

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange