Why is it bad to use the same test dataset over and over again?
31-10-2019
Question
I am following Google's Machine Learning Crash Course series.
In the chapter on generalisation, they make the following statement:
Good performance on the test set is a useful indicator of good performance on the new data in general, assuming that:
- The test set is large enough.
- You don't cheat by using the same test set over and over.
Why exactly is the second point a problem? As long as the test set is never used during training, why is it bad to keep using the same test set to measure a model's performance? It's not as if the model becomes biased by doing so: the test set does not update any of the model's parameters.
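One way to see the concern in the quoted statement is that the leak happens through the practitioner's decisions rather than through the model's parameters: if you repeatedly evaluate candidate models on one fixed test set and keep whichever scores best, the reported score inflates even when every candidate is pure noise. A minimal simulation sketching this (all sizes and counts here are illustrative choices, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)
n_test = 100
y_test = rng.integers(0, 2, n_test)  # labels of one fixed, reused test set

# Simulate 1000 "model variants" that are all pure noise:
# each predicts labels at random, so every variant's true accuracy is 50%.
best_acc, best_preds = 0.0, None
for _ in range(1000):
    preds = rng.integers(0, 2, n_test)
    acc = (preds == y_test).mean()
    if acc > best_acc:  # keep the variant that looks best on the reused set
        best_acc, best_preds = acc, preds

# The "winning" variant looks well above chance on the reused test set...
print(f"accuracy on reused test set: {best_acc:.2f}")

# ...but on a fresh test set it falls back to roughly 50%.
y_fresh = rng.integers(0, 2, n_test)
print(f"accuracy on fresh test set:  {(best_preds == y_fresh).mean():.2f}")
```

No individual evaluation updates any parameters, yet the act of selecting based on repeated test-set scores fits the selection process to that particular test set, which is why a fresh (or held-out) test set gives an honest estimate.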
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange