Question

When selecting a probability threshold to maximize the F1 score prior to deploying a model (based on the precision-recall curve), should the threshold be selected based on the training or holdout dataset?

Solution

Ideally, the threshold should be selected on your training set. The holdout set is only there to confirm that whatever worked on your training set generalizes to data outside of it.

This is the same reason hyperparameter-tuning utilities such as GridSearchCV and RandomizedSearchCV in scikit-learn expose a cv parameter: they cross-validate across folds of the training set rather than letting you choose the best parameters based on metrics measured on the holdout set.
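The same cross-validation idea applies to threshold selection: use out-of-fold probabilities on the training set to pick the F1-maximizing threshold, then touch the holdout set only once to confirm it. A minimal sketch with scikit-learn, using a synthetic dataset and a logistic regression purely as illustrative assumptions:

```python
# Sketch: choosing the F1-maximizing threshold via cross-validation
# on the training set. The dataset and model here are illustrative
# assumptions, not from the original answer.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.metrics import precision_recall_curve, f1_score

X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000)

# Out-of-fold probabilities on the TRAINING set, so the threshold is
# never tuned on data the model was fitted on.
proba = cross_val_predict(clf, X_train, y_train, cv=5, method="predict_proba")[:, 1]

precision, recall, thresholds = precision_recall_curve(y_train, proba)
# thresholds has one fewer entry than precision/recall, hence f1[:-1].
f1 = 2 * precision * recall / (precision + recall + 1e-12)
best_threshold = thresholds[np.argmax(f1[:-1])]

# Fit on the full training set; use the holdout only to confirm that
# the chosen threshold generalizes.
clf.fit(X_train, y_train)
hold_pred = clf.predict_proba(X_hold)[:, 1] >= best_threshold
print(f"chosen threshold: {best_threshold:.3f}")
print(f"holdout F1: {f1_score(y_hold, hold_pred):.3f}")
```

The holdout F1 should land close to the cross-validated F1; a large gap would suggest the threshold (or the model) is overfit to the training data.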

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange