Question

Should I include the validation set in training once tuning is finished (e.g. after searching for hyperparameters using the validation file)?

Solution

It depends on how the train, validation, and holdout/test sets are distributed relative to each other.

There are several possible cases (essentially every combination of which sets match and which differ). In general, any difference in distribution between the sets (covariate shift) is bad, and you should correct it. In that case, whether to include the validation set is the least of your problems: you should include it so the correction can use it, but the covariate shift itself is what you need to worry about.
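One rough way to check for a distribution difference on a single numeric feature is to compare the empirical distributions of two splits. The sketch below is only an illustration, not a method from the answer: it uses a hand-written two-sample Kolmogorov-Smirnov statistic, and the synthetic splits (`train`, `valid_same`, `valid_shifted`) are made-up data standing in for your own feature columns.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The largest gap always occurs at one of the observed values.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Assumption: synthetic one-feature splits, purely for illustration.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(2000)]
valid_same = [rng.gauss(0.0, 1.0) for _ in range(2000)]     # same distribution
valid_shifted = [rng.gauss(1.0, 1.0) for _ in range(2000)]  # mean shifted by 1

ks_same = ks_statistic(train, valid_same)
ks_shifted = ks_statistic(train, valid_shifted)
print(f"same distribution: {ks_same:.3f}, shifted: {ks_shifted:.3f}")
```

A value near 0 suggests the feature looks alike in both splits, while a large value hints at a shift worth investigating. This looks at one feature at a time, so it can miss shifts that only show up jointly across features.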

If the sets share the same distribution, adding the validation set (the one used for hyperparameter tuning) to the training data will not hurt and can only help.
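The workflow described above could look roughly like the sketch below: tune on the validation split, freeze the chosen hyperparameter, then refit on train plus validation. Everything here is an assumption made for illustration: the data is synthetic with all splits drawn from one distribution, and a toy one-parameter ridge regression (`fit_ridge`, with penalty `lam`) stands in for whatever model you are actually tuning.

```python
import random

rng = random.Random(0)
TRUE_SLOPE = 2.0  # assumption: synthetic data, every split shares one distribution


def make_split(n):
    """Draw n points from y = TRUE_SLOPE * x + Gaussian noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [TRUE_SLOPE * x + rng.gauss(0.0, 0.2) for x in xs]
    return xs, ys


def fit_ridge(xs, ys, lam):
    """Closed-form slope of a no-intercept ridge regression with penalty lam."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


x_train, y_train = make_split(200)
x_valid, y_valid = make_split(100)
x_test, y_test = make_split(500)

# Step 1: pick the hyperparameter using the validation split only.
candidates = [0.0, 0.1, 1.0, 10.0]
best_lam = min(
    candidates,
    key=lambda lam: mse(fit_ridge(x_train, y_train, lam), x_valid, y_valid),
)

# Step 2: keep that hyperparameter fixed and refit on train + validation.
w_final = fit_ridge(x_train + x_valid, y_train + y_valid, best_lam)

# Step 3: score the refitted model on the untouched test split.
test_mse = mse(w_final, x_test, y_test)
print(f"best_lam={best_lam}, w_final={w_final:.3f}, test_mse={test_mse:.4f}")
```

Once the validation data has been merged into training, a score computed on it is no longer an independent estimate, so the holdout/test set is the only split left for an honest evaluation.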

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange