Question

I am trying to implement a classifier using decision trees (more precisely, the ID3 algorithm). My training data contains the attribute age, which is a continuous value. I am implementing the BestSplit() method, where I need to split the data into k partitions, with k being the number of possible values of the feature. I am stuck, though, because splitting on every distinct age is impractical. That is why I need to group the ages into ranges. But how do I decide which grouping to use among all the possible ones?

Solution

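The solution is to compute the information gain of each candidate split and pick the one with the highest gain. To do this you will also need to compute the entropy. For a continuous attribute such as age, a common approach is to sort the values and treat the points between adjacent distinct values as the candidate thresholds, so each candidate is a binary split (age <= t versus age > t). The following answer explains well how entropy and gain work: What is "entropy and information gain"?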
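As a rough illustration of picking the split with the highest gain, here is a minimal sketch in Python. It assumes a binary threshold split on one numeric attribute, which is the way C4.5-style trees usually handle continuous values rather than something the original ID3 defines. The names `entropy` and `best_threshold` and the toy data are made up for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the binary split `value <= threshold`
    with the highest information gain, or (None, 0.0) if no split helps."""
    pairs = sorted(zip(values, labels))
    total = len(pairs)
    base = entropy([label for _, label in pairs])
    best_t, best_gain = None, 0.0
    for i in range(1, total):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Weighted average entropy of the two partitions
        remainder = (len(left) / total) * entropy(left) \
            + (len(right) / total) * entropy(right)
        gain = base - remainder
        if gain > best_gain:
            # Hypothetical choice: use the midpoint between the two neighbours
            best_t, best_gain = (lo + hi) / 2, gain
    return best_t, best_gain


# Toy data, invented for illustration only
ages = [22, 25, 30, 35, 40, 52, 60]
bought = ["no", "no", "no", "yes", "yes", "yes", "yes"]
threshold, gain = best_threshold(ages, bought)
print(threshold, round(gain, 4))
```

On this toy data the classes separate cleanly between 30 and 35, so the sketch picks the midpoint 32.5. To get more than two age groups, the same search can be applied again inside each resulting partition.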

OTHER TIPS

The ID3 algorithm is based on Occam's razor, which is an important principle in many fields. Entropy and information gain are the typical way to choose the best feature on which to split a dataset. You can see an example and some analysis in this blog: My blog
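For reference, these are the standard textbook definitions of the two quantities. The symbols are chosen here for illustration: S is the set of training examples, p_c is the fraction of S belonging to class c, and S_v is the subset of S for which attribute A takes value v (or falls into partition v).

```latex
H(S) = -\sum_{c} p_c \log_2 p_c
\qquad
\mathrm{Gain}(S, A) = H(S) - \sum_{v} \frac{|S_v|}{|S|} \, H(S_v)
```

The feature, or for a continuous attribute the threshold, with the largest Gain(S, A) is the one chosen for the split.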

Licensed under: CC-BY-SA with attribution