Question

I'm trying to figure out which decision tree method from scikit-learn package will better suit my needs for performing classification task.

However, I found that there are two decision tree models available there:

  • the standard DecisionTreeClassifier, based on an optimized CART algorithm, from the sklearn.tree package.
  • the ensemble method ExtraTreeClassifier from the sklearn.ensemble package.

Can anyone explain the advantages and disadvantages of using each of these models?


Solution

ExtraTreeClassifier (which actually lives in sklearn.tree, not sklearn.ensemble) is an extremely randomized version of DecisionTreeClassifier, meant to be used internally as a building block of the ExtraTreesClassifier ensemble in sklearn.ensemble.
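To make the naming concrete, here is a quick sanity check of where the classes live in scikit-learn (this reflects the standard sklearn package layout; exact private submodule names may vary across versions):

```python
# ExtraTreeClassifier is the single-tree building block in sklearn.tree;
# ExtraTreesClassifier (plural) is the averaging ensemble in sklearn.ensemble.
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier
from sklearn.ensemble import ExtraTreesClassifier

print(ExtraTreeClassifier.__module__)   # a submodule of sklearn.tree
print(ExtraTreesClassifier.__module__)  # a submodule of sklearn.ensemble
```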

Averaging ensembles such as RandomForestClassifier and ExtraTreesClassifier are meant to tackle the variance problem (lack of robustness with respect to small changes in the training set) of individual DecisionTreeClassifier instances.

If your main goal is maximizing prediction accuracy, you should almost always use an ensemble of decision trees such as ExtraTreesClassifier (or alternatively a boosting ensemble) instead of training individual decision trees.
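A minimal sketch of that comparison, using a synthetic dataset purely for illustration (the size of the gap on real data depends on the problem, and the hyperparameters here are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data, used only to illustrate the comparison.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
ensemble = ExtraTreesClassifier(n_estimators=100, random_state=0)

# Averaging many randomized trees typically reduces variance
# compared with a single tree, at the cost of more compute.
tree_score = cross_val_score(tree, X, y, cv=5).mean()
ens_score = cross_val_score(ensemble, X, y, cv=5).mean()
print(f"single tree: {tree_score:.3f}")
print(f"extra trees: {ens_score:.3f}")
```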

Have a look at the original Extra-Trees paper (Geurts et al., "Extremely randomized trees", 2006) for more details.

Licensed under: CC-BY-SA with attribution