Question

I am currently testing a few different ensemble methods on my dataset. I've heard that support vector machines can also be used as base learners in boosting and bagging methods, but I am not sure which methods allow this. For example, with XGBClassifier I tried both trees and SVMs as base learners and got exactly the same results on five different performance metrics, which made me question the results and suspect that the option only accepts trees as base learners. I didn't find much information in the documentation, or at least not in all of the documentation. I am interested in AdaBoostClassifier(), BaggingClassifier() and XGBClassifier(). Does anybody know the details, and whether or not I can use SVMs as base learners here?


Solution

In short: Yes.

Conceptually, bagging and boosting are model-agnostic techniques, meaning that they work regardless of the learner.

Bagging is essentially the following (see the sketch after the list):

  • create multiple predictors (they can even be hard-coded!)
  • gather the learners' predictions and aggregate them into a single prediction (e.g., by majority vote)
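
Here is a minimal from-scratch sketch of that idea (illustrative, not the sklearn API), assuming a toy binary dataset from make_classification and SVCs as the learners; the aggregation step is a simple majority vote:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=0)
    rng = np.random.default_rng(0)

    # Create multiple predictors, each trained on a bootstrap resample
    predictors = []
    for _ in range(10):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        predictors.append(SVC().fit(X[idx], y[idx]))

    # Gather predictions from the learners and majority-vote per sample
    votes = np.array([p.predict(X) for p in predictors])  # shape (10, n_samples)
    majority = (votes.mean(axis=0) > 0.5).astype(int)     # ties resolve to class 0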

Boosting can be seen as the following loop (again, sketched in code after the list):

  • train a predictor
  • find where the predictor makes mistakes
  • put more emphasis on these mistakes
  • repeat until satisfactory
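
A minimal sketch of that loop, assuming an SVC base learner (whose fit method accepts sample_weight). Note the weight-update rule here is deliberately simplified; real AdaBoost uses a specific exponential re-weighting scheme:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=0)
    w = np.full(len(X), 1 / len(X))  # start with uniform sample weights

    learners = []
    for _ in range(5):
        clf = SVC().fit(X, y, sample_weight=w)  # train a predictor
        mistakes = clf.predict(X) != y          # find where it makes mistakes
        w[mistakes] *= 2.0                      # put more emphasis on those mistakes
        w /= w.sum()                            # keep the weights normalized
        learners.append(clf)                    # repeat until satisfactory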

Regarding the specific scikit-learn implementations, here are the base learners that you can use:

  • AdaBoostClassifier()

The documentation says: "Support for sample weighting is required, as well as proper classes_ and n_classes_ attributes."

This means that you can use any model whose fit method accepts sample weights (e.g., SVC, decision trees, logistic regression). Note that sklearn's KNeighborsClassifier does not accept sample weights in fit, so it does not qualify.
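
A minimal sketch with an SVC base learner. Two version-dependent details to be aware of: in scikit-learn >= 1.2 the constructor argument is estimator (older versions use base_estimator), and the discrete "SAMME" algorithm is used because the default real-valued variant needs predict_proba, which SVC only provides with probability=True:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=0)

    # SVC works as a base learner because its fit() accepts sample_weight
    ada = AdaBoostClassifier(estimator=SVC(), algorithm="SAMME", n_estimators=10)
    ada.fit(X, y)
    print(ada.score(X, y))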

  • BaggingClassifier()

This is a plain bagging strategy, so any estimator that implements fit and predict can be used here, SVMs included.
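
For instance, wrapping an SVC (again, the argument is estimator in scikit-learn >= 1.2, base_estimator before that):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=0)

    # Each copy of the estimator is trained on a bootstrap resample
    bag = BaggingClassifier(estimator=SVC(), n_estimators=10, random_state=0)
    bag.fit(X, y)
    print(bag.score(X, y))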

  • GradientBoostingClassifier()

Here it is the loss function, not the base learners, that must be differentiable: each new learner is fit to the negative gradient of the loss. In practice, scikit-learn's implementation only supports regression trees as base learners, and the same holds for XGBoost's tree boosters, so you cannot plug an SVM in here.
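
As for XGBClassifier specifically: its booster parameter only accepts "gbtree" and "dart" (both tree-based) or "gblinear" (linear models); there is no SVM option. Depending on the XGBoost version, an unrecognized setting may be silently ignored rather than raise an error, which would explain why your tree and "SVM" runs produced identical metrics: presumably the same tree model was trained twice. A quick sanity check is to compare two boosters that actually exist and confirm the results differ:

    from sklearn.datasets import make_classification
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=200, random_state=0)

    # Tree booster vs. linear booster: the only families XGBoost supports
    tree_model = XGBClassifier(booster="gbtree").fit(X, y)
    linear_model = XGBClassifier(booster="gblinear").fit(X, y)
    print(tree_model.score(X, y), linear_model.score(X, y))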

Licensed under: CC-BY-SA with attribution