Question

I am trying to implement a classifier using decision trees (more precisely, the ID3 algorithm). My training data contains the attribute age, which is a continuous value. I am trying to implement the BestSplit() method, where I need to split the data into k partitions, with k being the number of possible values of the feature. I am stuck, though, because it would be impractical to create one partition for every distinct age. This is why I need to create age groups. But how do I decide which grouping to use among all the possible sets of age groups?

Solution

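The solution is to compute the information gain for each candidate split and pick the one with the highest gain. For a continuous attribute such as age, the candidate splits are typically thresholds (age <= t versus age > t) placed between consecutive sorted values. To do this you will also need to compute the entropy. The following answer explains how it works: What is "entropy and information gain"?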
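As a rough illustration of the idea above, here is a minimal sketch of a threshold search for one continuous attribute. It assumes a binary split (value <= threshold versus value > threshold), which is the approach commonly associated with C4.5 rather than the k-way split of plain ID3. The function names `entropy` and `best_threshold` are hypothetical and not taken from the question's code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def best_threshold(values, labels):
    """Return (threshold, gain) for the binary split `value <= threshold`
    with the highest information gain, or (None, 0.0) if no split helps."""
    # Sort the examples by attribute value only (labels need not be comparable).
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    sorted_labels = [label for _, label in pairs]
    total = len(pairs)
    base = entropy(sorted_labels)

    best_t, best_gain = None, 0.0
    for i in range(1, total):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold fits between two equal values
        left, right = sorted_labels[:i], sorted_labels[i:]
        # Weighted entropy of the two partitions after the split.
        remainder = (len(left) / total) * entropy(left) \
            + (len(right) / total) * entropy(right)
        gain = base - remainder
        if gain > best_gain:
            # Assumption: use the midpoint between neighbours as the threshold.
            best_t, best_gain = (low + high) / 2, gain
    return best_t, best_gain


if __name__ == "__main__":
    ages = [22, 25, 30, 35, 40, 52]
    classes = ["no", "no", "no", "yes", "yes", "yes"]
    print(best_threshold(ages, classes))
```

The gain returned for age can then be compared with the gain of the other attributes to decide which one to split on. Applying the search again inside each resulting partition yields further age groups if they are useful.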

Other tips

The ID3 algorithm is based on Occam's razor, which is an important principle in many fields. Entropy and information gain are the typical criteria for choosing the best feature on which to split a dataset. You can see an example and some analysis in this blog: My blog
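For reference, these are the standard definitions behind that criterion, written for a dataset S in which class c occurs with proportion p_c, and an attribute A whose values v partition S into subsets S_v (the symbol names are chosen here for illustration):

```latex
H(S) = -\sum_{c} p_c \log_2 p_c
\qquad
\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \, H(S_v)
```

At each node, ID3 picks the attribute with the largest gain.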

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow