how to use weight when training a weak learner for adaboost

Question 1

I am not very clear about how to use the weights. Should I resample the training data?

It depends on what classifier you are using.

If your classifier can take instance weight (weighted training examples) into account, then you don't need to resample the data. An example classifier could be naive bayes classifier that accumulates weighted counts or a weighted k-nearest-neighbor classifier.

Otherwise, you want to resample the data using the instance weight, i.e., those instance with more weights could be sampled multiple times; while those instance with little weight might not even appear in the training data. Most of the other classifiers fall in this category.

In Practice

Actually in practice, boosting performs better if you only rely on a pool of very naive classifiers, e.g., decision stump, linear discriminant. In this case, the algorithm you listed has a easy-to-implement form (see here for details): enter image description here Where alpha is chosen by (epsilon is defined similarly as yours).

enter image description here

An Example

Define a two-class problem in the plane (for example, a circle of points inside a square) and build a strong classier out of a pool of randomly generated linear discriminants of the type sign(ax1 + bx2 + c).

The two class labels are represented with red crosses and blue dots. We here are using a bunch of linear discriminants (yellow lines) to construct the pool of naive/weak classifiers. We generate 1000 data points for each class in the graph (inside the circle or not) and 20% of data is reserved for testing.

enter image description here

This is the classification result (in the test dataset) I got, in which I used 50 linear discriminants. The training error is 1.45% and the testing error is 2.3%

enter image description here

Question 2

The weights are the values applied to each example (sample) in step 2. These weights are then updated at step 3.3 (wi).

So initially all weights are equal (step 2) and they are increased for wrongly classified data and decreased for correctly classified data. So in step 3.1 you have to take take these value in account to determine a new classifier, giving more importance to higher weight values. If you did not change the weight you would produce exactly the same classifier each time you execute step 3.1.

These weights are only used for training purpose, they're not part of the final model.