Question

I am currently working on decision trees in Python. How do I know which pruning criterion would be best for my data?


Solution

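Experimentally: hold out a test set, then use cross-validation on the remaining training data to compute the performance of every option that you want to consider. Select the option with the best cross-validated performance, train the final model on the whole training set using this option, and evaluate that model once on the held-out test set.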


// different settings for hyper-parameters,
// for instance different pruning criteria:
hpSet = { hp1, hp2, ... }

trainSet, testSet = split(data)

for each hp in hpSet:
    // run k-fold cross-validation over 'trainSet' using hyper-parameter 'hp'
    // and store the resulting performance
    cvPerf[hp] = runCV(k, trainSet, hp)

// choose the setting whose cross-validated performance is highest
bestHP = the hp with the maximum value in 'cvPerf'
model = train(trainSet, bestHP)
testPerf = test(model, testSet)
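
For a concrete version of the procedure above, here is a minimal sketch in Python. It assumes scikit-learn is the library in use (the question does not say), takes the bundled breast cancer dataset as stand-in data, and uses accuracy as the performance measure. The candidate values in `hp_set` are illustrative only, not recommendations; `max_depth` and `min_samples_leaf` limit tree growth, while `ccp_alpha` controls cost-complexity pruning.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Example data only (an assumption); substitute your own features X and labels y.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set that plays no part in choosing the settings.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate pruning settings (illustrative values, not recommendations).
hp_set = [
    {"max_depth": 3},
    {"max_depth": 5},
    {"min_samples_leaf": 10},
    {"ccp_alpha": 0.001},
    {"ccp_alpha": 0.01},
]

# Mean 5-fold cross-validated accuracy on the training set for each candidate.
perf = []
for hp in hp_set:
    tree = DecisionTreeClassifier(random_state=0, **hp)
    scores = cross_val_score(tree, X_train, y_train, cv=5)
    perf.append(scores.mean())

# Pick the candidate with the highest mean cross-validation score.
best_index = max(range(len(hp_set)), key=lambda i: perf[i])
best_hp = hp_set[best_index]

# Refit on the whole training set, then evaluate once on the test set.
model = DecisionTreeClassifier(random_state=0, **best_hp).fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print("best setting:", best_hp)
print("CV accuracy: %.3f, test accuracy: %.3f" % (perf[best_index], test_accuracy))
```

The test set is touched only once, after the setting has been chosen, so the final score is not biased by the selection step. scikit-learn's `GridSearchCV` automates the same loop if you prefer not to write it by hand.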
Licensed under: CC-BY-SA with attribution