Question

I'm using random forests to predict labels in my dataset. My question is: does it make sense to do 10-fold cross-validation with a random forest? Intuitively I would say that random forests already do something like cross-validation on their own (each tree is trained on a bootstrap sample, and the out-of-bag samples provide an error estimate), so would there be any benefit to doing cross-validation and building a random forest classifier in each split?


Solution

In fact, you do cross-validation to assess your choice of model (e.g. to compare two random forests with different hyperparameter settings). That is not the same thing as what a random forest does internally, which is learning each tree on a bootstrap sample of your training set.
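A minimal sketch of that distinction, assuming scikit-learn and stand-in data generated with make_classification: 10-fold cross-validation is used to compare two hyperparameter settings, while the out-of-bag score is the forest's own internal estimate.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in data; replace with your own feature matrix X and label vector y.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

for n_trees in (100, 500):
    rf = RandomForestClassifier(n_estimators=n_trees, oob_score=True, random_state=0)
    # 10-fold CV: the external estimate used to compare the two settings.
    cv_scores = cross_val_score(rf, X, y, cv=10)
    # Fit on the full set to get the forest's internal out-of-bag estimate.
    rf.fit(X, y)
    print(f"{n_trees} trees: 10-fold CV accuracy = {cv_scores.mean():.3f}, "
          f"OOB accuracy = {rf.oob_score_:.3f}")
```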

In practice, you would mainly resort to k-fold CV when your training set is small and you cannot afford to split it into separate training and validation sets.

If your dataset is small, assessing the model with k-fold CV is a good idea; otherwise I would just tune the parameters (to avoid overfitting and get better accuracy) on a separate validation set, something like 20% of your learning set.
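A sketch of that second approach, again assuming scikit-learn and stand-in data: hold out roughly 20% of the learning set as a validation set and keep whichever setting (here max_depth, chosen purely for illustration) scores best on it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data; replace with your own feature matrix X and label vector y.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

best_depth, best_score = None, -1.0
for depth in (None, 5, 10):
    rf = RandomForestClassifier(n_estimators=200, max_depth=depth, random_state=0)
    rf.fit(X_train, y_train)
    score = rf.score(X_val, y_val)  # accuracy on the held-out 20%
    if score > best_score:
        best_depth, best_score = depth, score

print(f"best max_depth = {best_depth}, validation accuracy = {best_score:.3f}")
```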

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow