Combining Weak Learners into a Strong Classifier

https://stackoverflow.com/questions/12150810

28-06-2021
|

Question

How do I combine few weak learners into a strong classifier? I know the formula, but the problem is that in every paper about AdaBoost that I've read there are only formulas without any example. I mean - I got weak learners and their weights, so I can do what the formula tells me to do (multiply learner by its weight and add another one multiplied by its weight and another one etc.) but how exactly do I do that? My weak learners are decision stumps. They got attribute and treshold, so what do I multiply?

Solution

If I understand your question correctly, you have a great explanation on how boosting ensambles the weak classifiers into a strong classifier with a lot of images in these lecture notes:

www.csc.kth.se/utbildning/kth/kurser/DD2427/bik12/DownloadMaterial/Lectures/Lecture8.pdf

Basically you are by taking the weighted combination of the separating hyperplanes creating a more complex decision surface (great plots showing this in the lecture notes)

Hope this helps.

EDIT

To do it practically:

in page 42 you see the formulae for alpha_t = 1/2*ln((1-e_t)/e_t) which easily can be calculated in a for loop, or if you are using some numeric library (I'm using numpy which is really great) directly by vector operations. The alpha_t is calculated inside of the adaboost so I assume you already have these.

You have the mathematical formulae at page 38, the big sigma stands for sum over all. h_t is the weak classifier function and it returns either -1 (no) or 1 (yes). alpha_t is basically how good the weak classifier is and thus how much it has to say in the final decision of the strong classifier (not very democratic).

I don't really use forloops never, but I'll be easier to understand and more language independent (this is pythonish pseudocode):

strongclassifier(x):
    response=0
    for t in T: #over all weakclassifiers indices
        response += alpha[t]*h[t](x)
    return sign(response)

This is mathematically called the dot product between the weights and the weak-responses (basically: strong(x) = alpha*weak(x)).

http://en.wikipedia.org/wiki/Dot_product

EDIT2

This is what is happening inside strongclassifier(x): Separating hyperplane is basically decided in the function weak(x), so all x's which has weak(x)=1 is on one side of the hyperplane while weak(x)=-1 is on the other side of the hyperplane. If you think about it has lines on a plane you have a plane separating the plane into two parts (always), one side is (-) and the other one is (+). If you now have 3 infinite lines in the shape of a triangle with their negative side faced outwards, you will get 3 (+)'s inside the triangle and 1 or 2 (-)'s outside which results (in the strong classifier) into a triangle region which is positive and the rest negative. It's an over simplification but the point is still there and it works totally analogous in higher dimensions.

OTHER TIPS

In vanilla Ada Boost, you don't multiply learners by any weight. Instead, you increase the weight of the misclassified data. Imagine that you have an array, such as [1..1000], and you want to use neural networks to estimate which numbers are primes. (Stupid example, but suffices for demonstration).

Imagine that you have class NeuralNet. You instantiate the first one, n1 = NeuralNet.new. Then you have the training set, that is, another array of primes from 1 to 1000. (You need to make up some feature set for a number, such as its digits.). Then you train n1 to recognize primes on your training set. Let's imagine that n1 is weak, so after the training period ends, it won't be able to correctly classify all numbers 1..1000 into primes and non-primes. Let's imagine that n1 incorrectly says that 27 is prime and 113 is non-prime, and makes some other mistakes. What do you do? You instantiate another NeuralNet, n2, and increase the weight of 27, 113 and other mistaken numbers, let's say, from 1 to 1.5, and decrease the weight of the correctly classified numbers from 1 to 0.667. Then you train n2. After training, you'll find that n2 corrected most of the mistakes of n1, including no. 27, but no. 113 is still misclassified. So you instantiate n3, increase the weight of 113 to 2, decrease the weight of 27 and other now correctly classified numbers to 1, and decrease the weight of old correctly classified numbers to 0.5. And so on...

Am I being concrete enough?

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow