Question

I'm implementing the Viola-Jones algorithm for face detection, and I'm having trouble with the first part of its AdaBoost learning stage.

The original paper states

The weak classifier selection algorithm proceeds as follows. For each feature, the examples are sorted based on feature value.

I'm currently working with a relatively small training set of 2000 positive images and 1000 negative images. The paper describes having data sets as large as 10,000.

The main purpose of AdaBoost here is to reduce the 160,000+ possible features in a 24x24 window to a small set: the algorithm evaluates all of them and selects the best ones.

The paper describes that, for each feature, its value is computed on every image and the results are then sorted by value. The way I read this, I need a container per feature that stores that feature's value for every sample.

My problem is that my program runs out of memory after evaluating only 10,000 of the features (about 6% of them). Together, the containers will hold 160,000 * 3,000 ≈ 480 million entries. How am I supposed to implement this algorithm without running out of memory? I've increased the heap size, which got me from 3% to 6%, but I don't think increasing it much further will help.
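For scale, here is my rough back-of-envelope in Java terms (the per-entry object sizes below are only my estimates for a 64-bit JVM, not measurements):

public class MemoryEstimate {
    public static void main(String[] args) {
        long features = 160_000L, samples = 3_000L;
        long entries  = features * samples;          // 480,000,000 -- about half a billion
        long asInts   = entries * 4;                 // ~1.9 GB if each value were a primitive int
        long perEntry = 24 /* FeatureValue object */ + 24 /* LinkedList node */; // rough estimates
        long asLists  = entries * perEntry;          // ~23 GB the way I'm storing them now
        System.out.printf("as ints: %.1f GB, as object lists: %.1f GB%n",
                asInts / 1e9, asLists / 1e9);
    }
}

If those estimates are anywhere near right, running into the heap limit after only a few percent of the features would make sense.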

The paper implies that these sorted values are needed throughout the algorithm, so I can't discard them after each feature.

Here's my code so far

public static List<WeakClassifier> train(List<Image> positiveSamples, List<Image> negativeSamples, List<Feature> allFeatures, int T) {
    List<WeakClassifier> solution = new LinkedList<WeakClassifier>();

    // Initialize Weights for each sample, whether positive or negative
    float[] positiveWeights = new float[positiveSamples.size()];
    float[] negativeWeights = new float[negativeSamples.size()];

    float initialPositiveWeight = 0.5f / positiveWeights.length;
    float initialNegativeWeight = 0.5f / negativeWeights.length;

    for (int i = 0; i < positiveWeights.length; ++i) {
        positiveWeights[i] = initialPositiveWeight;
    }
    for (int i = 0; i < negativeWeights.length; ++i) {
        negativeWeights[i] = initialNegativeWeight;
    }

    // Each feature's value for each image
    List<List<FeatureValue>> featureValues = new LinkedList<List<FeatureValue>>();

    // For each feature, compute its value on every image and sort those values
    int currentFeature = 0;
    for (Feature feature : allFeatures) {
        List<FeatureValue> thisFeaturesValues = new LinkedList<FeatureValue>();

        int index = 0;
        for (Image positive : positiveSamples) {
            int value = positive.applyFeature(feature);
            thisFeaturesValues.add(new FeatureValue(index, value, true));
            ++index;
        }
        index = 0;
        for (Image negative : negativeSamples) {
            int value = negative.applyFeature(feature);
            thisFeaturesValues.add(new FeatureValue(index, value, false));
            ++index;
        }

        Collections.sort(thisFeaturesValues);

        // Add this feature's sorted values to the list
        featureValues.add(thisFeaturesValues);
        ++currentFeature;
    }

    ... rest of code

Solution

This should be the pseudocode for the selection of one of the weak classifiers:

normalize the per-example weights  // one float per example

for feature j from 1 to 45,396:
  // Training a weak classifier based on feature j.
  - Extract the feature's response from each training image (1 float per example)
  // This threshold selection and error computation is where sorting the examples
  // by feature response comes in.
  - Choose a threshold to best separate the positive from negative examples
  - Record the threshold and weighted error for this weak classifier

choose the best feature j and threshold (lowest error)

update the per-example weights

Nowhere do you need to store hundreds of millions of feature values. Just extract the feature responses on the fly on each iteration. You're using integral images, so extraction is fast. The integral images are the real memory cost, and that isn't much: one integer for every pixel in every image, basically the same amount of storage as your images themselves required.
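To make that concrete, here is a minimal sketch of one selection round in Java, reusing the Image, Feature, and applyFeature from your code. Response and Stump are hypothetical helper types introduced just for this sketch, and the threshold search is the usual scan over the sorted responses, taking error = min(S+ + (T- - S-), S- + (T+ - S+)):

import java.util.*;

public class StumpSelection {

    /** One sample's response to the current feature (your FeatureValue plays this role). */
    static final class Response implements Comparable<Response> {
        final int index;        // index into the positive or negative sample list
        final int value;        // feature response on that sample
        final boolean positive; // true if the sample is a face
        Response(int index, int value, boolean positive) {
            this.index = index; this.value = value; this.positive = positive;
        }
        @Override public int compareTo(Response o) { return Integer.compare(value, o.value); }
    }

    /** Hypothetical holder for the winning stump of this round. */
    static final class Stump {
        Feature feature; int threshold; int polarity; double error;
    }

    static Stump selectBestStump(List<Image> positives, List<Image> negatives,
                                 float[] posWeights, float[] negWeights,
                                 List<Feature> allFeatures) {
        // T+ and T-: total weight of the positive and negative samples.
        double totalPos = 0, totalNeg = 0;
        for (float w : posWeights) totalPos += w;
        for (float w : negWeights) totalNeg += w;

        Stump best = new Stump();
        best.error = Double.MAX_VALUE;

        for (Feature feature : allFeatures) {
            // Responses for THIS feature only; the list becomes garbage as soon as
            // the next iteration starts, so memory stays at ~3000 entries at a time.
            List<Response> responses = new ArrayList<>(positives.size() + negatives.size());
            for (int i = 0; i < positives.size(); ++i)
                responses.add(new Response(i, positives.get(i).applyFeature(feature), true));
            for (int i = 0; i < negatives.size(); ++i)
                responses.add(new Response(i, negatives.get(i).applyFeature(feature), false));
            Collections.sort(responses);

            // S+ / S-: weight of positives / negatives below the candidate threshold.
            double sumPos = 0, sumNeg = 0;
            for (Response r : responses) {
                // "Face above the threshold": errors are positives below plus negatives above.
                double errAbove = sumPos + (totalNeg - sumNeg);
                // "Face below the threshold": errors are negatives below plus positives above.
                double errBelow = sumNeg + (totalPos - sumPos);
                double err = Math.min(errAbove, errBelow);
                if (err < best.error) {
                    best.error = err;
                    best.feature = feature;
                    best.threshold = r.value;
                    // -1: predict face when value >= threshold, +1: when value < threshold.
                    best.polarity = (errAbove < errBelow) ? -1 : +1;
                }
                if (r.positive) sumPos += posWeights[r.index];
                else            sumNeg += negWeights[r.index];
            }
        }
        return best;
    }
}

Nothing here outlives a single feature's iteration except the single best stump found so far.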

Even if you did compute all the feature responses for all images up front and cache them so you don't have to redo that every iteration, that would still only take:

  • 45,396 * 3,000 * 4 bytes =~ 520 MB, or
  • 160,000 * 3,000 * 4 bytes =~ 1.78 GB if you're convinced there are 160,000 possible features, or
  • 160,000 * 10,000 * 4 bytes =~ 5.96 GB if you also use 10,000 training images.

Basically, you shouldn't be running out of memory even if you do store all the feature values.
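If you do decide to precompute and cache every response, store them in primitive arrays rather than lists of FeatureValue objects, so the cost really is the figures above and nothing more. A minimal sketch, again reusing the Image/Feature types from your code (the row-per-feature layout is just one possible choice):

// Optional cache: one int per (feature, sample) pair, held in primitive arrays
// so the footprint is features * samples * 4 bytes with no per-entry overhead.
static int[][] cacheResponses(List<Feature> allFeatures,
                              List<Image> positives, List<Image> negatives) {
    int numSamples = positives.size() + negatives.size();
    int[][] responses = new int[allFeatures.size()][numSamples];
    for (int f = 0; f < allFeatures.size(); ++f) {
        Feature feature = allFeatures.get(f);
        int s = 0;
        for (Image img : positives) responses[f][s++] = img.applyFeature(feature);
        for (Image img : negatives) responses[f][s++] = img.applyFeature(feature);
    }
    return responses; // 160,000 x 3,000 ints is roughly 1.9 GB; run with e.g. -Xmx3g
}

With a flat primitive layout like this, the totals listed above are the actual footprint, unlike a List of per-entry objects.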

Licensed under: CC-BY-SA with attribution