Question

I'm implementing a decision tree based on the CART algorithm and I have a question. I can already classify data, but my task is not only to classify data: I also want the probability of a correct classification in the end (leaf) nodes. For example, I have a dataset that contains data of classes A and B. When I pass an instance to my tree, I want to see the probability with which the instance belongs to class A and to class B. How can I do that? How can I extend CART to have a probability distribution in the leaf nodes?


Solution

When you train your tree on the training set, every split sends some instances to the left node and the rest to the right node, so each node ends up with a certain proportion of instances from class A and class B. The proportion of class A (or class B) instances in a leaf can be interpreted as the probability that an instance reaching that leaf belongs to that class.

For example, assume your training set contains 50 instances of class A and 50 instances of class B, and you build a one-level tree by splitting the data once. Suppose that after the split the left node holds 40 instances of class A and 10 of class B, and the right node holds 10 of class A and 40 of class B. The probability of class A is then 40/(40+10) = 80% in the left node and 10/(10+40) = 20% in the right node (and, conversely, 20% and 80% for class B).
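The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the helper name `class_proportions` is hypothetical and simply counts the labels that ended up in one node.

```python
from collections import Counter

def class_proportions(labels):
    """Return the fraction of each class label among the instances in one node."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

# Node contents from the example: 40 A + 10 B on the left, 10 A + 40 B on the right.
left = ["A"] * 40 + ["B"] * 10
right = ["A"] * 10 + ["B"] * 40

print(class_proportions(left))   # {'A': 0.8, 'B': 0.2}
print(class_proportions(right))  # {'A': 0.2, 'B': 0.8}
```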

The same applies to deeper trees: in each leaf, count the training instances of each class and divide by the total number of instances in that leaf.
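Below is a minimal sketch of how a CART-style tree could store these proportions in its leaves and return them at prediction time. It assumes numeric features, Gini impurity as the split criterion and a simple depth limit. All names (`Node`, `gini`, `best_split`, `build`, `predict_proba`, `max_depth`) are hypothetical and not taken from the question's implementation; the exhaustive threshold search is chosen for clarity, not speed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Internal nodes use feature/threshold/left/right; leaves use `proba`.
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    proba: Optional[dict] = None

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return the (feature, threshold) pair that lowers weighted Gini the most, or None."""
    best, best_score, n = None, gini(y), len(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lbl for row, lbl in zip(X, y) if row[f] <= t]
            right = [lbl for row, lbl in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(X, y, depth=0, max_depth=3):
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        # Leaf: keep the class proportions of the training instances that reached it.
        return Node(proba={cls: n / len(y) for cls, n in Counter(y).items()})
    f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return Node(feature=f, threshold=t,
                left=build([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
                right=build([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))

def predict_proba(node, x):
    """Descend to the leaf for instance x and return its class distribution."""
    while node.proba is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.proba

# Toy data reproducing the one-split example: feature value 0 -> left, 1 -> right.
X = [[0] for _ in range(50)] + [[1] for _ in range(50)]
y = ["A"] * 40 + ["B"] * 10 + ["A"] * 10 + ["B"] * 40
tree = build(X, y, max_depth=1)
print(predict_proba(tree, [0]))  # {'A': 0.8, 'B': 0.2}
print(predict_proba(tree, [1]))  # {'A': 0.2, 'B': 0.8}
```

Here `predict_proba` returns the full class distribution of the leaf an instance falls into instead of only the majority class; the usual hard prediction is just the key with the largest value.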

License: CC-BY-SA with attribution