Question

Is the best way to create the most accurate classifier to train several classification algorithms (ANN, SVM, KNN, etc.), tune each one's parameters to find its optimum, and then pick the classifier with the lowest test error?

Or is it better to use an ensemble method and take the "majority" decision of several different kinds of trained classifiers?
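For concreteness, a minimal sketch of the two strategies being compared, using scikit-learn as a stand-in for the classifiers named above (the data, models, and parameter grids are illustrative only):

```python
# Illustrative only: synthetic data and small parameter grids stand in for
# the ANN/SVM/KNN setup described in the question.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import VotingClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Strategy 1: tune each candidate separately, then compare on held-out data.
candidates = {
    "svm": GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5),
    "knn": GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 11]}, cv=5),
    "ann": GridSearchCV(MLPClassifier(max_iter=2000),
                        {"hidden_layer_sizes": [(32,), (64,)]}, cv=5),
}
for name, search in candidates.items():
    search.fit(X_train, y_train)
    print(name, search.score(X_test, y_test))

# Strategy 2: majority ("hard") vote over the same kinds of classifiers.
vote = VotingClassifier(
    estimators=[("svm", SVC()), ("knn", KNeighborsClassifier()),
                ("ann", MLPClassifier(max_iter=2000))],
    voting="hard",
)
vote.fit(X_train, y_train)
print("voting", vote.score(X_test, y_test))
```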

Solution

It's usually not that clear-cut; there is rarely one universally best approach.

Having said that, there are some ensemble approaches that are supposed to always do at least as well as their underlying component algorithms, notably Erin LeDell's stacked ensemble for binary classification in H2O. However, even in those cases you still need to optimize the first-stage algorithms for the ensemble to be universally better.
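To illustrate the two-stage idea, here is a minimal stacking sketch using scikit-learn's StackingClassifier. This is an analogue of the approach, not the H2O implementation referenced above, and the estimators and parameters are assumptions:

```python
# A scikit-learn analogue of stacking: tuned first-stage models feed a
# second-stage ("meta") learner trained on their cross-validated predictions.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    # The second stage combines the base models' out-of-fold predictions;
    # as noted above, it needs tuning too.
    final_estimator=LogisticRegression(),
    cv=5,
)
print(cross_val_score(stack, X, y, cv=5).mean())
```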

So if you're willing to spend a lot of extra time, say two weeks on an ensemble instead of the one week a single-stage model might take, then it's possible (especially for binary classification) to build an ensemble that will definitely beat your single-stage classifier.

However, this is rarely the case, and the way you framed the question implies a choice between

  1. building one really good single-stage model, selected from many candidate models (being careful to avoid overfitting while making those selections), and

  2. throwing an ensemble at the problem without completing #1 above for each component of the ensemble (or completing #1 but not also optimizing the ensemble's second stage).

If that's the decision, then, while there's no one universally right answer, I'd say that in the vast majority of cases it's better to stick with #1; a minimal sketch of that workflow follows.
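The sketch below assumes scikit-learn and placeholder data, and uses nested cross-validation so that the model selection itself doesn't overfit:

```python
# Minimal sketch of #1: the inner GridSearchCV tunes each candidate, while the
# outer cross_val_score estimates how well the whole tuning procedure
# generalizes, which is the number you'd compare across candidates.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, random_state=0)

searches = {
    "svm": GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}, cv=5),
    "knn": GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 11]}, cv=5),
}
for name, search in searches.items():
    # Outer CV scores the tuning procedure, not a single fitted model.
    outer = cross_val_score(search, X, y, cv=5)
    print(name, outer.mean())
```

Because the outer folds never influence the parameter choice, the reported score is an honest estimate of each tuned candidate's test error.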

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange