Decision tree, how to understand or calculate the probability/confidence of prediction result

datascience.stackexchange https://datascience.stackexchange.com/questions/11171

  •  16-10-2019

Question

Consider a drug prediction problem solved with a decision tree. I trained the decision tree model and would now like to make predictions on new data.

For example:

patient, Attr1, Attr2, Attr3, .., Label
002      90.0   8.0    98.0 ...   ? ===> predict drug A

How can I calculate the confidence or probability of the prediction result of drug A?

Solution

What data mining package do you use?

In sklearn, the DecisionTreeClassifier can give you probabilities (via predict_proba), but you have to use something like max_depth to truncate the tree. The probability it returns is $P=n_A/(n_A+n_B)$, that is, the number of training observations of class A "captured" by the leaf divided by the total number of training observations captured by that leaf. But again, you must prune or truncate your decision tree, because otherwise it grows until every leaf is pure (in the extreme, $n=1$ in each leaf), and so $P=1$.
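To make the leaf-fraction idea concrete, here is a minimal sketch. It assumes a synthetic two-class dataset from make_classification as a stand-in for the patient data (the real attributes and labels are not known), and the variable names are only illustrative. It compares a fully grown tree with one truncated by max_depth, and recomputes the class fractions of one leaf by hand.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the patient data (an assumption, not the asker's dataset).
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, flip_y=0.1, random_state=0)
X_train, y_train = X[:400], y[:400]
X_new = X[400:]  # plays the role of new, unlabeled patients

# Fully grown tree: leaves end up pure, so probabilities are only 0 or 1.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
full_proba = full_tree.predict_proba(X_new)

# Truncated tree: leaves keep a mix of classes, so probabilities are fractions.
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
shallow_proba = shallow_tree.predict_proba(X_new)

# Recompute the probability for the first new sample by hand:
# find its leaf, then take the class fractions of the training rows in that leaf.
leaf_id = shallow_tree.apply(X_new[:1])[0]
in_leaf = shallow_tree.apply(X_train) == leaf_id
manual = np.array([(y_train[in_leaf] == c).mean() for c in shallow_tree.classes_])

print("full tree, distinct probabilities:", np.unique(full_proba))
print("shallow tree, first sample:", shallow_proba[0])
print("hand-computed leaf fractions:", manual)
```

The columns of predict_proba follow the order of the fitted classes_ attribute, so the probability of "drug A" would be read from the column whose class label is A.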

That being said, I think you want to use something like a random forest. In a random forest, multiple decision trees are trained on different bootstrap resamples of your data. In the end, the probability of each class can be estimated as the proportion of trees that vote for it (sklearn's RandomForestClassifier averages the per-tree class probabilities instead, which gives the same result when every leaf is pure). I think this is a much more robust way to estimate probabilities than using an individual decision tree.
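As a rough illustration of the forest-based estimate, the sketch below again assumes a synthetic dataset in place of the real patient data. It uses a depth-limited forest (max_depth=3 is an arbitrary choice) so that the two ways of combining trees are visibly different: the averaged per-tree probabilities that predict_proba returns, and the plain fraction of trees voting for class 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the patient data (an assumption, not the asker's dataset).
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, flip_y=0.1, random_state=0)
X_train, y_train = X[:400], y[:400]
X_new = X[400:]

forest = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=0)
forest.fit(X_train, y_train)

# What sklearn reports: the mean of the per-tree class probabilities.
forest_proba = forest.predict_proba(X_new)
averaged = np.mean([tree.predict_proba(X_new) for tree in forest.estimators_], axis=0)

# The "proportion of votes" view: each tree casts one hard vote per sample.
# The labels here are 0/1, so a vote for class 1 is simply a prediction of 1.
votes = np.stack([tree.predict(X_new) for tree in forest.estimators_])
vote_fraction = (votes == 1).mean(axis=0)

print("predict_proba, class 1:", forest_proba[:5, 1])
print("vote fraction, class 1:", vote_fraction[:5])
```

The two estimates usually sit close together; the averaged version is smoother because each tree contributes a fraction rather than a 0 or 1.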

But random forests are not easily interpretable, so if interpretability is a requirement, use the decision tree as I described. You can use grid search over hyperparameters such as the maximum depth, maximizing the ROC AUC score, to find the decision tree that gives the most reliable probabilities.
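A grid search of this kind could look roughly like the following. The dataset is synthetic, and the parameter grid (max_depth and min_samples_leaf values) is only an illustrative assumption; sensible ranges depend on the real data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the patient data (an assumption, not the asker's dataset).
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, flip_y=0.1, random_state=0)

# Illustrative grid: both parameters limit how far the tree can grow.
param_grid = {
    "max_depth": [2, 3, 4, 6, None],
    "min_samples_leaf": [1, 5, 20],
}

# scoring="roc_auc" ranks candidates by cross-validated ROC AUC,
# which is computed from the predicted probabilities.
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, scoring="roc_auc", cv=5)
search.fit(X, y)

best_tree = search.best_estimator_  # refit on all of X, y with the best parameters
proba = best_tree.predict_proba(X[:5])

print("best parameters:", search.best_params_)
print("cross-validated ROC AUC:", round(search.best_score_, 3))
```

The selected tree is still a single, inspectable decision tree, so it keeps the interpretability that motivated using it in the first place.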

EDIT: In case I wasn't clear enough, I think it's awful to use a single decision tree to predict probabilities. I have extended my answer in a blog post.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange