Question

I analyzed a small dataset with three features, so I set the decision tree's max_depth to 3. In doing so I found something interesting: one leaf node contained an equal number of samples from both classes, yet the decision tree still chose one class. How is the class decided in such a scenario? Is it random, or is some other criterion used? I have attached an image of the decision tree model to illustrate my scenario.


Solution

This is an implementation detail, and I wouldn't necessarily rely on this behavior, but presently in sklearn, it will choose the "first" class.

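The predict method computes the class probabilities for each sample and then takes the argmax, which in case of ties returns the first maximum, i.e. the class that comes first in the classifier's classes_ array (the labels in sorted order):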
https://github.com/scikit-learn/scikit-learn/blob/fd237278e/sklearn/tree/_classes.py#L403
https://numpy.org/doc/stable/reference/generated/numpy.argmax.html
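
The tie-breaking can be reproduced with NumPy alone. The snippet below is only a minimal sketch that mimics the argmax step described above; it is not sklearn's actual code, and the labels and per-leaf sample counts are made up for illustration.

```python
import numpy as np

# Hypothetical class labels, in the sorted order sklearn keeps in `classes_`
classes = np.array(["no", "yes"])

# Hypothetical per-class sample counts for three leaves (illustrative only)
leaf_counts = np.array([
    [5, 5],  # tied leaf: equal samples of both classes
    [2, 8],
    [7, 3],
])

# Turn counts into class proportions, one row per leaf
proba = leaf_counts / leaf_counts.sum(axis=1, keepdims=True)

# np.argmax returns the index of the first maximum when there is a tie
winner_idx = np.argmax(proba, axis=1)

# Map the winning indices back to class labels
predicted = classes.take(winner_idx, axis=0)
print(predicted)  # ['no' 'yes' 'no']
```

For the tied leaf the result is "no" simply because it sits at index 0, so which class wins depends on label ordering rather than on anything in the data.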

Licensed under: CC-BY-SA with attribution