Question

  1. I want to try K-fold cross-validation in R for the C5.0 algorithm.

The following is the code I use. Can someone suggest how I can include k-fold cross-validation as well?

Classifi_C5.0 <- C5.0(TARGET ~ ., data = training_data_SMOTED, trials = 500, control = C5.0Control(minCases = mincases_count, noGlobalPruning = FALSE))
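
As a side note, one common way to wrap k-fold cross-validation around C5.0 in R is the caret package. The following is only a minimal sketch, assuming the training_data_SMOTED data frame and TARGET factor from the snippet above; the fold count and tuning-grid values are illustrative, not recommendations.

library(caret)
library(C50)

set.seed(42)

# 10-fold cross-validation
cv_control <- trainControl(method = "cv", number = 10)

# caret tunes C5.0 over trials (boosting iterations), model type, and winnowing
c50_grid <- expand.grid(trials = c(10, 50, 100), model = "tree", winnow = FALSE)

c50_cv <- train(TARGET ~ .,
                data      = training_data_SMOTED,
                method    = "C5.0",
                trControl = cv_control,
                tuneGrid  = c50_grid)

print(c50_cv)  # cross-validated accuracy and kappa for each candidate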

  2. Is it required to do k-fold cross-validation for random forests?

Solution

I would say cross-validation is unnecessary here, since repeated partitioning of the data and variables is already implicit in random forests: each tree is grown on a bootstrap sample, and the out-of-bag observations provide a built-in error estimate. It is still good practice to hold out a test set that is distinct from the training set. This is mostly because you may keep tweaking your random forest to improve its performance on the test sets, thereby reintroducing the very bias that random forests are designed to overcome. So if you withhold a portion of your data and judge the final performance of the RF on that withheld set only, in a single predict step, then it's fine.
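
For illustration, a minimal sketch of that advice, assuming the same training_data_SMOTED data frame and a factor TARGET column; the 70/30 split and ntree = 500 are arbitrary choices, not part of the original answer.

library(randomForest)

set.seed(42)
n <- nrow(training_data_SMOTED)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train_set <- training_data_SMOTED[train_idx, ]
test_set  <- training_data_SMOTED[-train_idx, ]

# Each tree is grown on a bootstrap sample, so an out-of-bag error
# estimate is reported automatically during training
rf_model <- randomForest(TARGET ~ ., data = train_set, ntree = 500)
print(rf_model)

# Judge final performance on the withheld set only, in a single predict step
test_pred <- predict(rf_model, newdata = test_set)
print(table(predicted = test_pred, actual = test_set$TARGET))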

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange