Question

The OpenCV Haar cascade classifier seems to use 24x24 images of faces as its positive training data. I have two questions regarding this:

  • What are the consideration that go into selecting the training image size, besides the fact that larger training images require more processing?
  • For non-square images, some people have chosen to keep one dimension at 24px, and expand the other dimension as necessary (to, say 100-200px). Is this the correct strategy?
  • How does one go about deciding the size of the training images (this is a variant of question 1)
Was it helpful?

Solution

  • I honestly believe that there are far better parameters to be tweaked than the image size. Even so, it's a question of fine-to-coarse detection - at finer levels, you gain detail and at coarser levels, you gain structure. Also, there is a trade off: with 24x24 detection regions, there are about ~160,000 possible rectangular (haar-like) features, so increasing or decreasing also affects this number for both training/testing (this is why boosting is used to select a small subset of discriminative features).

  • As you said, this is because his target was different (i.e. a pen). I think it is sensible to introduce a priori aspect ratio information to the cascade training, otherwise you would be getting detections that have square bounding boxes for a pen detector and probably suffer in performance because the training stage is picking up a larger background region around the pen.

  • See my first answer. I think this is largely empirical. There are techniques for either feature scaling or building image pyramids (e.g. see this work) that also mitigate the usefulness of highly controlling the choice of training target image sizes too.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top