Question

I used the following classifiers, along with their accuracies:

  1. Random forest - 85%
  2. SVM - 78%
  3. AdaBoost - 82%
  4. Logistic regression - 80%

When I used majority voting over these classifiers for the final classification, I got lower accuracy than when I used the random forest alone.

How is this possible? All the classifiers give more or less the same accuracy when used individually, so how does the random forest alone outperform their combined result?


Solution

The approach you are considering is similar to a multi-class SVM built with a one-versus-the-rest scheme.

Here is how I would describe the problem. The support vector machine, for example, is fundamentally a two-class classifier.

In practice, however, we often have to tackle problems involving $K > 2$ classes. Various methods have therefore been proposed for combining multiple two-class SVMs in order to build a multi-class classifier.

One commonly used approach (Vapnik, 1998) is to construct $K$ separate SVMs, in which the $k$th model $y_k(\mathbf{x})$ is trained using the data from class $\mathcal{C}_k$ as the positive examples and the data from the remaining $K - 1$ classes as the negative examples. This is known as the one-versus-the-rest approach, where:

$$y(\mathbf{x}) = \max_k y_k(\mathbf{x})$$

Unfortunately, this heuristic approach suffers from the problem that the different classifiers were trained on different tasks, and there is no guarantee that the real-valued quantities $y_k(\mathbf{x})$ for different classifiers will have appropriate scales.

Another problem with the one-versus-the-rest approach is that the training sets are imbalanced. For instance, if we have ten classes each with equal numbers of training data points, then the individual classifiers are trained on data sets comprising 90% negative examples and only 10% positive examples, and the symmetry of the original problem is lost.
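
To make this concrete, here is a minimal sketch of the one-versus-the-rest construction in Python with scikit-learn. The synthetic dataset and hyperparameters are illustrative assumptions, not part of the original question; the point is to implement $y(\mathbf{x}) = \max_k y_k(\mathbf{x})$ over $K$ independently trained binary SVMs:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic K = 3 class problem (placeholder data).
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

# Train K separate binary SVMs: class k as positive, the rest as negative.
models = [LinearSVC(max_iter=5000, random_state=0).fit(X, (y == k).astype(int))
          for k in np.unique(y)]

# y(x) = max_k y_k(x): pick the class whose model scores highest.
# The raw decision values are on no common scale across the K models,
# which is exactly the calibration problem described above.
scores = np.column_stack([m.decision_function(X) for m in models])
y_pred = np.argmax(scores, axis=1)
print("training accuracy:", (y_pred == y).mean())
```

Note that the imbalance is visible even here: with three balanced classes, each binary problem is already two-thirds negative examples; with ten classes it would be 90% negative, as described above.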

These problems, therefore, are why you got worse accuracy.

PS: Accuracy, in most cases, is not a good measure for evaluating a classifier model; on a data set with 90% negative examples, for instance, always predicting the majority class already scores 90%.

References:

  • Vapnik, V. (1998). Statistical Learning Theory. Wiley-Interscience, New York.
  • Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.

OTHER TIPS

Simply combining some classifiers by voting can naturally give bad results. Consider a toy example: a data set with $100$ instances and $3$ classifiers. Suppose that on the first $70$ instances all three classifiers are correct. Then on the next $10$ the first classifier is correct and the others are wrong; on the next $10$ the second is correct and the others are wrong; and on the last $10$ the third is correct and the others are wrong. Each of the three models therefore has an accuracy of $0.80$, yet majority voting over them yields only $0.70$, because on each of the last $30$ instances the two wrong classifiers outvote the single correct one.
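
The arithmetic is easy to verify numerically. Here is the toy example in Python, assuming for simplicity that whenever two classifiers are wrong they agree on the wrong label:

```python
import numpy as np

# 3 classifiers, 100 instances; correct[i, j] = classifier i is right on instance j.
correct = np.zeros((3, 100), dtype=bool)
correct[:, :70] = True       # first 70: all three classifiers are correct
correct[0, 70:80] = True     # next 10: only classifier 1 is correct
correct[1, 80:90] = True     # next 10: only classifier 2 is correct
correct[2, 90:100] = True    # last 10: only classifier 3 is correct

print("individual accuracies:", correct.mean(axis=1))         # [0.8 0.8 0.8]
# The majority vote is right only where at least 2 of the 3 are right.
print("voting accuracy:", (correct.sum(axis=0) >= 2).mean())  # 0.7
```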

There are also some other potential difficulties, such as unscaled scores from the base learners (this often happens with an SVM if its output is not rescaled by a logistic function or similar). Another issue is the model family provided by the different learners. Bagging (bootstrap aggregating, which is what a random forest uses, for example) works because it assumes the models come from the same family (the same tree model, or, statistically speaking, the majority class over a region) and only the samples are drawn randomly. Bootstrapping requires independent and identically distributed sub-samples, so it does not apply in your case: your models (which can be considered functions over samples) are not identically distributed.
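
One common remedy for the unscaled-scores problem is to calibrate the SVM before combining, and to use soft rather than hard voting. A sketch with scikit-learn's Platt-style (sigmoid) calibration; the estimators, hyperparameters, and data names are placeholders:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Wrap the SVM so its raw scores become probabilities (Platt / sigmoid scaling).
calibrated_svm = CalibratedClassifierCV(LinearSVC(max_iter=5000),
                                        method="sigmoid", cv=5)

vote = VotingClassifier(
    estimators=[("rf", RandomForestClassifier()),
                ("svm", calibrated_svm),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="soft",  # soft voting averages predict_proba, so scales must match
)
# vote.fit(X_train, y_train); vote.score(X_test, y_test)  # placeholder data
```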

There are, however, solutions to this problem, one of them being stacked generalization (or simply stacking). The simplest form is stacking with a logistic regression: train some base classifiers (which you already have) and feed their predictions as input variables to a logistic regression on top. That top-level logistic regression can find the proper balance between the base predictions, and you usually gain some accuracy as a benefit, among others. A minimal sketch follows.
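
Here is that idea with scikit-learn's StackingClassifier. The base models mirror the four from the question, but all hyperparameters are illustrative placeholders:

```python
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier()),
                ("svm", SVC(probability=True)),  # probabilities for the meta-learner
                ("ada", AdaBoostClassifier()),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(),  # learns how to weight the base models
    cv=5,  # out-of-fold predictions keep the meta-learner from overfitting
)
# stack.fit(X_train, y_train); stack.score(X_test, y_test)  # placeholder data
```

Using cross-validated (out-of-fold) predictions for the meta-learner is the standard design choice here: fitting the logistic regression on the base models' training-set predictions would leak the training labels and overstate the benefit.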

For further references, see the Wikipedia page on ensemble learning and search for 'stacked generalization'. For papers, start with the seminal one, David Wolpert's 'Stacked Generalization' (1992), which started the whole discussion on this topic.

Licensed under: CC-BY-SA with attribution