Question

I have a population that is very close to random, which I'm trying to split using a binary decision tree.

Class probabilities:
TRUE  51%
FALSE 49%

So the entropy is 1 (rounded to three decimal places). So for any feature the entropy after the split will also be 1 (the same), and thus there is no information gain.
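A quick check of that number (a minimal sketch, assuming the usual base-2 entropy):

```python
import math

p_true, p_false = 0.51, 0.49
h = -(p_true * math.log2(p_true) + p_false * math.log2(p_false))
print(h)  # 0.99971..., i.e. 1.000 when rounded to three decimal places
```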

Am I doing this right? In my attempts to learn this, I haven't come across anything saying that entropy is useless for two classes.


Solution

Entropy and information gain don't depend so much on the distribution of the classes as on the information contained in the features used to characterise the instances in your data set. If, for example, you had a feature that was always 1 for the TRUE class and always 2 for the FALSE class, it would have the highest possible information gain, because it allows you to separate the two classes perfectly.
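A minimal sketch of that idea (the data and feature values here are hypothetical, invented purely for illustration):

```python
import math

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(labels, feature):
    """Parent entropy minus the weighted entropy of the subsets
    obtained by splitting on each distinct feature value."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(feature):
        subset = [lab for lab, f in zip(labels, feature) if f == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Hypothetical toy data: 51% TRUE / 49% FALSE, with a feature that is
# always 1 for the TRUE class and always 2 for the FALSE class.
labels = [True] * 51 + [False] * 49
feature = [1] * 51 + [2] * 49
print(information_gain(labels, feature))  # ~0.9997, the entire parent entropy
```

The gain here equals the full parent entropy (~0.9997) because each branch of the split is pure, so the weighted entropy after the split is 0.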

If the information gain you're getting is very small, it indicates that the features carry little information for separating your classes. In that case, you need to find more informative features.
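Conversely, reusing the `entropy`/`information_gain` helpers and the `labels` list from the sketch above, a feature whose value is unrelated to the class yields a gain close to zero:

```python
import random

random.seed(0)
noise = [random.choice([1, 2]) for _ in labels]  # value unrelated to the class
print(information_gain(labels, noise))           # close to 0: an uninformative split
```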

Licensed under: CC-BY-SA with attribution