Question

Let's say I do a Decision Tree analysis.

But the performance characteristics are nothing great (e.g. ROC is nothing great).

Is there anything I can do with this "not so great" tree? Or do I typically need to trash it and try something else (either a new data set or a new analysis on the same data)?


Solution

Decision trees have one big quality and one big drawback. Their big quality is that they are what is known as glass-box models: they expose what they learn in a very clear and intuitive way (the name comes from the fact that you can see through a glass box). Because of that, decision trees are very useful for analysis; they are a nice support for understanding the relations between variables, which variables are important, and in which way they are important. Even when they do not provide crystal-clear information, they can give you ideas in that direction. This can be very helpful, especially if you have domain expert knowledge and can put things together in a meaningful manner.
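
As a minimal illustration of the glass-box idea (assuming scikit-learn is available; the iris dataset here is purely an example), you can print the learned rules of a fitted tree and read off which variables drive the splits:

    # Print a fitted tree's rules as nested if/else conditions.
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

    # export_text exposes which features the tree uses and where it splits them.
    print(export_text(tree, feature_names=iris.feature_names))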

Their main drawback is their high variance. This problem is mainly caused by their greedy approach: each decision at the top-level nodes shapes the rest of the tree. You can go even further and see that a single additional data point is often enough to produce a totally different tree, especially if the sample is small or the data is noisy.
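
You can see this instability directly with a small sketch (again assuming scikit-learn; dataset and seeds are illustrative only): fit trees on two bootstrap resamples of the same data and compare their root splits.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    rng = np.random.default_rng(0)

    for i in range(2):
        # Resample the same data with replacement and refit.
        idx = rng.choice(len(X), size=len(X), replace=True)
        t = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
        # The root split (and everything below it) often changes between runs.
        print(f"run {i}: root splits on feature {t.tree_.feature[0]} "
              f"at {t.tree_.threshold[0]:.2f}, depth {t.get_depth()}")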

There are two types of approaches to this issue. The first tries to improve the single tree you built; this kind of approach is known as pruning. A simple example is reduced error pruning, which is simple and produces good results. You train your tree on one sample of data. Then you take another sample, run it through the tree, and re-evaluate each node from the perspective of the new data. If a non-leaf node would have at most the same error without its split as with it, you can cut its child nodes and turn that node into a leaf. There are, however, stronger pruning strategies, often based on cross-validation or other, mostly statistical, criteria. Notice that for reduced error pruning you need additional data, or you have to split your original sample in two: one part for training, the other for pruning. If you go further and want to estimate the prediction error, you need a third sample.
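
scikit-learn does not ship reduced error pruning as such, but its cost-complexity pruning can be driven by a held-out sample in the same spirit: grow the tree on one split, then pick the pruning level that scores best on the other. A hedged sketch (dataset and split are illustrative):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_prune, y_train, y_prune = train_test_split(X, y, random_state=0)

    # Enumerate the candidate pruning levels of a fully grown tree.
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

    # Refit at each level and keep the tree that does best on the prune set.
    best = max(
        (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
         for a in path.ccp_alphas),
        key=lambda t: t.score(X_prune, y_prune),
    )
    print(f"kept {best.tree_.node_count} nodes, "
          f"prune-set accuracy {best.score(X_prune, y_prune):.3f}")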

The second approach is either to build several trees and choose among them based on cross-validation, bootstrapping, or whatever method you like, or to use tree ensembles such as bagging or boosting algorithms. Note that with boosting and bagging you lose the glass-box property.
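
A quick sketch of that trade (hyperparameters and dataset are illustrative only): score a single tree against a bagging ensemble and a boosting ensemble with cross-validation.

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    models = {
        "single tree": DecisionTreeClassifier(random_state=0),
        "bagging (random forest)": RandomForestClassifier(n_estimators=200, random_state=0),
        "boosting (AdaBoost)": AdaBoostClassifier(n_estimators=200, random_state=0),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
        print(f"{name}: mean ROC AUC {scores.mean():.3f}")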

Ultimately you have to choose between understanding and performance, with pruning as a decent compromise.

OTHER TIPS

Two things can be done.

  1. Apply ensemble methods such as random forests or AdaBoost.
  2. Optimise your decision tree and find the best parameters using techniques like GridSearchCV, as sketched below.
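
For the second point, a minimal GridSearchCV sketch (the parameter grid here is an illustration, not a recommendation):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    param_grid = {
        "max_depth": [3, 5, 10, None],
        "min_samples_leaf": [1, 5, 20],
        "ccp_alpha": [0.0, 0.001, 0.01],
    }
    # Exhaustively cross-validate every combination in the grid.
    search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid,
                          cv=5, scoring="roc_auc")
    search.fit(X, y)
    print(search.best_params_, f"ROC AUC {search.best_score_:.3f}")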

Depending on the software / language you are using, you should be able to implement some form of:

  • Boosting (Wikipedia), e.g. AdaBoost - reweights the incorrectly predicted examples and fits the next tree on the reweighted data.
  • Bagging (Wikipedia) - builds multiple decision trees by repeatedly resampling the training data with replacement, then takes a vote across the trees for a consensus prediction (a hand-rolled sketch follows after this list).
  • Pruning (Wikipedia) - decision tree algorithms can create overly complex trees, with too many branches and nodes, that do not generalize well; this is known as overfitting. Pruning removes the extra nodes that do not provide any valuable information.
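
To make the bagging bullet concrete, here is a hand-rolled sketch of resample-and-vote (illustrative dataset; in practice you would use a library implementation such as sklearn's BaggingClassifier or RandomForestClassifier):

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    rng = np.random.default_rng(0)

    preds = []
    for _ in range(50):
        # Resample the training data with replacement; fit one tree per resample.
        idx = rng.choice(len(X_train), size=len(X_train), replace=True)
        t = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
        preds.append(t.predict(X_test))

    # Majority vote across the 50 trees (labels are 0/1 here).
    vote = (np.mean(preds, axis=0) > 0.5).astype(int)
    print(f"bagged accuracy: {(vote == y_test).mean():.3f}")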

To be short: yes, trash it. Any ML model you use has one job - to predict. If it can't do that job, then why on earth would you ever use it? The whole impetus for using ML algorithms is to predict better than guessing randomly. Ideally, much, much better than guessing randomly.

Understanding why a model makes a decision has some value, sure, but only insofar as that decision has value to the end user/solves a business problem.
