Question

  1. I want to run k-fold cross-validation in R for the C5.0 algorithm.

The following is the code I am using. Can anyone suggest how I can include k-fold cross-validation here? (One possible approach is sketched after the questions.)

Classifi_C5.0 <- C5.0(TARGET ~ ., data = training_data_SMOTED, trials = 500,
                      control = C5.0Control(minCases = mincases_count,
                                            noGlobalPruning = FALSE))

  2. Is it necessary to do k-fold cross-validation for a random forest?
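
For illustration only, one way to wrap the same C5.0 call in a manual k-fold loop might look like the sketch below. It assumes the C50 package, the asker's training_data_SMOTED data frame with a factor column TARGET, and a defined mincases_count; k = 10 and plain accuracy are arbitrary choices.

library(C50)

k <- 10                                   # number of folds; arbitrary choice
set.seed(42)                              # reproducible fold assignment
folds <- sample(rep(1:k, length.out = nrow(training_data_SMOTED)))

accuracies <- numeric(k)
for (i in 1:k) {
  # Split into k-1 training folds and 1 held-out fold
  train_fold <- training_data_SMOTED[folds != i, ]
  test_fold  <- training_data_SMOTED[folds == i, ]

  model <- C5.0(TARGET ~ ., data = train_fold, trials = 500,
                control = C5.0Control(minCases = mincases_count,
                                      noGlobalPruning = FALSE))

  # Evaluate on the fold that was not used for training
  preds <- predict(model, newdata = test_fold, type = "class")
  accuracies[i] <- mean(preds == test_fold$TARGET)
}

mean(accuracies)   # average accuracy across the k held-out folds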

Solution

I would say cross-validation is unnecessary here, since the repeated partitioning of the data and variables is already implicit in random forests: each tree is grown on a bootstrap sample with a random subset of predictors, and the out-of-bag error gives a built-in performance estimate. It is still good practice, though, to hold out a test set that is distinct from the training set. This is mostly because you may tweak your random forest to improve its performance on the test sets, thereby introducing the very bias that random forests are designed to overcome. So if you withhold a portion of your data and judge the final performance of the RF on that withheld set only, at the predict step, then it's fine.
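
A minimal sketch of that hold-out practice, assuming the randomForest package and a data frame dat with a factor column TARGET (both names are placeholders):

library(randomForest)

set.seed(42)
test_idx  <- sample(nrow(dat), size = floor(0.2 * nrow(dat)))  # withhold 20%
train_set <- dat[-test_idx, ]
test_set  <- dat[test_idx, ]

rf <- randomForest(TARGET ~ ., data = train_set)

## The model's own out-of-bag (OOB) error already reflects the internal
## bootstrap partitioning that makes explicit k-fold CV largely redundant.
print(rf)

## Judge final performance only on the withheld set, at the predict step.
preds <- predict(rf, newdata = test_set)
mean(preds == test_set$TARGET)             # hold-out accuracy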

Licensed under: CC-BY-SA with attribution