Question

For hyperparameter optimization I see two approaches:

  1. Splitting the dataset into train, validation and test sets, optimizing the hyperparameters by training on the train set and evaluating on the validation set, and leaving the test set untouched for the final performance estimate.

  2. Splitting the dataset into train and test sets, optimizing the hyperparameters with cross-validation on the train set, and leaving the test set untouched for the final performance estimate.

So which approach is better?


Solution

The $k$-fold cross-validation (CV) process (method 2) does the same thing as method 1, but it repeats the train/validate step $k$ times, each time using a different fold as the validation set. With CV the performance is averaged across the $k$ runs before selecting the best hyperparameter values, which makes the performance estimate and the value selection more reliable in general, since there is less risk of obtaining the best result by chance. However, it takes roughly $k$ times longer, so if a single training run is slow, CV is not always practical.
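A minimal sketch of both approaches with scikit-learn. The estimator (an SVC), the parameter grid, the split ratios, and the toy dataset are all illustrative assumptions, not part of the question:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10]}  # hypothetical grid for illustration

# Hold out a test set; it stays untouched until the very end in both methods.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Method 1: a single train/validation split (here 60/20/20 overall).
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0)
best_score, best_C = -1.0, None
for C in param_grid["C"]:
    score = SVC(C=C).fit(X_tr, y_tr).score(X_val, y_val)
    if score > best_score:
        best_score, best_C = score, C

# Method 2: k-fold CV on the train set (k=5); scores are averaged over folds.
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X_train, y_train)

# Final performance estimate on the untouched test set, either way.
print("method 1 test accuracy:", SVC(C=best_C).fit(X_train, y_train).score(X_test, y_test))
print("method 2 test accuracy:", search.score(X_test, y_test))
```

Note that method 2 fits the model $k$ times per candidate value (plus one refit on the full train set), which is where the extra cost comes from.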

Licensed under: CC-BY-SA with attribution