Question

I am using sklearn to train some models (random forest, decision tree). For the training I am using RandomizedSearchCV with stratified k-fold as cross-validation. Then I make predictions on the test set and calculate the test score.

However, I would like to compare the test score with the training score. I assumed I could use the mean_train_score from the cv_results_ report of RandomizedSearchCV as the training score for the model, because I thought it would show the validation against the held-out fold from the k folds. However, I am not sure about this, because there is also a mean_test_score.

I was looking for an explanation of mean_train_score and mean_test_score. I know these scores also exist for the single folds. But how are these scores calculated? And is one of them my training score, which shows how my model performed during training?

I found one attempt at an explanation, but it is too superficial for me: GridSearch mean_test_score vs mean_train_score


Solution

The mean_test_score is actually the mean score of the validation step for each fold. The word "test" is arguably not well chosen by sklearn here, if you want to keep the distinction between validation and test.
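Here is a minimal sketch of where those two entries live (assuming you already have a feature matrix X_train and labels y_train; those names are placeholders). The key detail is return_train_score=True, since sklearn does not compute the training-fold scores by default:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Hypothetical search space, just for illustration.
param_distributions = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=5,
    cv=StratifiedKFold(n_splits=5),
    return_train_score=True,  # training-fold scores are not computed by default
    random_state=0,
)
search.fit(X_train, y_train)

# For each candidate: average score over the k-1 fitting folds
# vs. average score over the k held-out (validation) folds.
print(search.cv_results_["mean_train_score"])
print(search.cv_results_["mean_test_score"])
```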

However, I don't think you are totally done here, and you should therefore not compare mean_train_score with the test score.

Indeed, the cross-validation phase gives you the best set of hyperparameters (the candidate with the maximal "test" score, which is actually a validation score), but you should not keep the corresponding model, especially (but not only) if the training set has few observations or there are few folds.
You should instead re-train your model one last time, with this set of hyperparameters, but over the entire training set (not only the $\frac{k-1}{k}$ fraction of cases used in each cross-validation round), as sketched below. This final model gives you the training score (over the whole training set) and the test score (over the test set).
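A sketch of that final step, reusing the hypothetical X_train/y_train plus a held-out X_test/y_test. Note that with refit=True (the default), RandomizedSearchCV already performs this re-fit for you and exposes the result as best_estimator_:

```python
from sklearn.base import clone

# best_estimator_ is already re-fit on the whole training set when refit=True;
# cloning and re-fitting here just makes the step explicit.
final_model = clone(search.best_estimator_)
final_model.fit(X_train, y_train)

train_score = final_model.score(X_train, y_train)  # training score (whole training set)
test_score = final_model.score(X_test, y_test)     # test score (held-out test set)
```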

OTHER TIPS

The mystery lies in the inner workings of k-fold cross-validation, which divides the data into training and validation parts k times in a specific ratio. The mean score calculated on the training splits is mean_train_score, and the mean score on the held-out ("test") splits is mean_test_score.
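As a rough sketch of that splitting (again assuming the hypothetical X_train and y_train), each of the k rounds fits on k-1 folds and scores on the remaining one:

```python
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5)
for fold, (fit_idx, val_idx) in enumerate(skf.split(X_train, y_train)):
    # the search fits each candidate on fit_idx and scores it on val_idx
    print(f"fold {fold}: fit on {len(fit_idx)} rows, validate on {len(val_idx)} rows")
```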

According to your question, your training score should be: mean_train_score

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange