How do I know the best pruning criteria for decision trees?
09-12-2020
Question
I am currently working with decision trees in Python. How do I know which pruning criterion would be best for my data?
Solution
Experimentally: set aside a test set, then use cross-validation on the training data (or a subset of it) to measure the performance of every option you want to consider. Select the best-performing option, train the final model with it, and evaluate that model once on the held-out test set.
// different settings for hyper-parameters,
// for instance different pruning criteria:
hpSet = { hp1, hp2, ... }
trainSet, testSet = split(data)
for each hp in hpSet:
    // run k-fold cross-validation over 'trainSet' using hyper-parameter 'hp'
    // and store the resulting performance
    perf[hp] = runCV(k, trainSet, hp)
// pick the hyper-parameter with the best cross-validated performance
bestHP = hp with maximum perf[hp]
model = train(trainSet, bestHP)
// evaluate once on the held-out test set
testPerf = test(model, testSet)
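Since the question is about Python, the procedure above could be sketched with scikit-learn roughly as follows. This is only an illustration under assumptions the original answer does not make: it assumes scikit-learn is the library in use, it treats `ccp_alpha`, `max_depth` and `min_samples_leaf` as the pruning settings to compare, the candidate values are arbitrary, and the built-in breast cancer dataset stands in for your own data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (assumption); replace with your own features X and labels y.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set that is never touched during the search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Candidate pruning settings (illustrative values, not recommendations).
param_grid = {
    "ccp_alpha": [0.0, 0.001, 0.005, 0.01, 0.05],
    "max_depth": [None, 3, 5, 10],
    "min_samples_leaf": [1, 5, 20],
}

# 5-fold cross-validation over the training set for every combination.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_
# With refit=True (the default), best_estimator_ has already been
# retrained on all of X_train using the best setting.
test_accuracy = search.best_estimator_.score(X_test, y_test)

print("best setting:", best_params)
print("CV accuracy:", round(search.best_score_, 3))
print("test accuracy:", round(test_accuracy, 3))
```

Here `GridSearchCV` plays the role of the loop over `hpSet`, and the final `score` call corresponds to the single evaluation on `testSet`. A different metric can be chosen through `scoring` if accuracy is not appropriate for your data.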
Licensed under: CC-BY-SA with attribution