While downsampling training data, should we also downsample the validation data or keep the validation split as it is?

datascience.stackexchange https://datascience.stackexchange.com/questions/74386

11-12-2020

Question

I am dealing with a class imbalance problem. To address it, I am downsampling the majority class in the training set.

Of the training, validation and test splits, only the training split has its majority class downsampled; the test split is kept as it is. Should the validation split be downsampled in the same way as the training set, or should it also be kept as it is?

I ask because the validation set influences the training process.
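A minimal NumPy sketch of the setup described in the question, assuming integer class labels and a 60/20/20 split (both chosen only for illustration). Each class is split separately so that all three splits keep the original proportions, and random undersampling is then applied to the training indices only. The helper names `stratified_split` and `downsample_to_minority` are hypothetical.

```python
import numpy as np


def stratified_split(y, train_frac=0.6, val_frac=0.2, seed=0):
    """Return (train, val, test) index arrays, splitting each class separately
    so that every split keeps the original class proportions."""
    rng = np.random.default_rng(seed)
    parts = ([], [], [])
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_train = int(round(train_frac * len(idx)))
        n_val = int(round(val_frac * len(idx)))
        parts[0].append(idx[:n_train])
        parts[1].append(idx[n_train:n_train + n_val])
        parts[2].append(idx[n_train + n_val:])  # remainder goes to test
    return tuple(np.sort(np.concatenate(p)) for p in parts)


def downsample_to_minority(idx, y, seed=0):
    """Randomly drop samples of the larger classes in `idx` (without
    replacement) until every class has as many samples as the smallest one."""
    rng = np.random.default_rng(seed)
    labels = y[idx]
    classes, counts = np.unique(labels, return_counts=True)
    n_min = counts.min()
    kept = [rng.choice(idx[labels == cls], size=n_min, replace=False) for cls in classes]
    return np.sort(np.concatenate(kept))


# Toy labels: 950 majority (0) vs 50 minority (1).
y = np.array([0] * 950 + [1] * 50)
train_idx, val_idx, test_idx = stratified_split(y)
train_idx = downsample_to_minority(train_idx, y)  # only the training split changes

for name, idx in [("train", train_idx), ("val", val_idx), ("test", test_idx)]:
    print(name, np.bincount(y[idx]))
```

The printed counts should show a balanced training split, while validation and test keep the original 19:1 ratio.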


Solution

I would recommend not downsampling the validation set. In the end, you care about performance on the test set, which has the skewed class distribution. Therefore, in my opinion, your validation set (used for hyperparameter selection, early stopping, etc.) should have the same distribution.
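To illustrate why the validation distribution matters, consider a metric that depends on the class ratio. Assuming (hypothetically) a classifier with a fixed true-positive rate of 0.9 and false-positive rate of 0.1, its expected precision can be computed for a balanced, downsampled validation set and for one where the minority class makes up 5% of the data. The function name `precision_at_prevalence` is illustrative only.

```python
def precision_at_prevalence(tpr, fpr, prevalence):
    """Expected precision of a classifier with fixed true/false positive rates
    when the positive (minority) class makes up `prevalence` of the data."""
    true_pos = tpr * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: catches 90% of positives, flags 10% of negatives.
balanced = precision_at_prevalence(0.9, 0.1, prevalence=0.5)  # downsampled validation set
skewed = precision_at_prevalence(0.9, 0.1, prevalence=0.05)   # original class distribution
print(f"balanced: {balanced:.3f}, skewed: {skewed:.3f}")  # balanced: 0.900, skewed: 0.321
```

The same model looks far more precise on the balanced set than it would on data with the original distribution, so hyperparameters or early-stopping points chosen on a downsampled validation set may not transfer to the test set.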

Have you considered upsampling the minority class instead? By downsampling, you lose training data that might contain valuable information, which can harm the learning process.
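A minimal sketch of random oversampling (duplicating minority samples, drawn with replacement), assuming NumPy feature and label arrays; `upsample_minority` is a hypothetical helper, not a library function. As with downsampling, it would be applied to the training split only, leaving validation and test untouched.

```python
import numpy as np


def upsample_minority(X, y, seed=0):
    """Random oversampling: duplicate samples of the smaller classes (drawn with
    replacement) until every class matches the largest one. Nothing is discarded."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    parts = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        extra = rng.choice(idx, size=n_max - count, replace=True)  # empty for the largest class
        parts.append(np.concatenate([idx, extra]))
    order = rng.permutation(np.concatenate(parts))  # shuffle so classes are interleaved
    return X[order], y[order]


# Toy training split: 570 majority (0) vs 30 minority (1), two features each.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(600, 2))
y_train = np.array([0] * 570 + [1] * 30)
X_bal, y_bal = upsample_minority(X_train, y_train)
print(np.bincount(y_bal))  # [570 570]
```

Because the added samples are exact copies, the model can overfit to the repeated minority examples, so this is a trade-off rather than a guaranteed improvement over downsampling.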

Licensed under: CC-BY-SA with attribution