Question

I am trying to implement the AdaBoost algorithm, and have two questions.

1) At each iteration, the training data has to be re-sampled in accordance with a probability distribution. Should the size of the re-sampled data set be the same as that of the original data set?

2) If I re-sample the training data set according to a probability distribution, it is quite possible that I get multiple copies of a single data point. Should I keep all of these redundant copies while training the weak classifier at each iteration?


Solution

1) You don't need to actually re-sample the dataset; it is enough to weight the data points when training the classifier, i.e., the objective function of the weak classifier should be weighted.
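For instance, with scikit-learn (an assumption here; the question names no library), most estimators accept a `sample_weight` argument to `fit`, so the current AdaBoost distribution can be passed in directly. A minimal sketch with hypothetical data `X`, `y` and weights `w`:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical setup: X is (n_samples, n_features), y in {-1, +1},
# and w holds the current AdaBoost distribution (non-negative, sums to 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
w = np.full(len(X), 1.0 / len(X))

# Train the weak classifier on the weighted objective directly:
# no re-sampling needed, the weights enter the training criterion.
stump = DecisionTreeClassifier(max_depth=1)
stump.fit(X, y, sample_weight=w)

# Weighted training error, as used to compute the classifier's alpha.
pred = stump.predict(X)
eps = np.sum(w[pred != y])
```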

If the dataset is large enough, you can probably also use sampling, and the size of the set you sample does not matter per se, as long as it is large enough for the sample to approximate the weight distribution.

2) If you do use sampling and get redundant copies, you definitely should keep them; otherwise the objective function seen by the weak classifier will not match the intended weight distribution. A sketch of this re-sampling variant follows.
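Here is a minimal sketch of the re-sampling variant, again assuming NumPy/scikit-learn and the same hypothetical `X`, `y`, `w`. Drawing with replacement naturally produces duplicates, and they are all kept:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data and current AdaBoost distribution, as before.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
w = np.full(len(X), 1.0 / len(X))

# Draw n indices with replacement according to w. Duplicates are expected
# and must be kept: a point drawn k times contributes k times to the weak
# learner's objective, which is how the re-sample approximates weighting.
idx = rng.choice(len(X), size=len(X), replace=True, p=w)

stump = DecisionTreeClassifier(max_depth=1)
stump.fit(X[idx], y[idx])  # train on every copy, redundant ones included

# Evaluate the weighted error on the original data, not the re-sample,
# when computing alpha and updating the weights.
pred = stump.predict(X)
eps = np.sum(w[pred != y])
```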

Licensed under: CC-BY-SA with attribution