Question

In the hopes of developing an application that can detect specific hand positions (or hand symbols) in real time, my team and I came across Haar cascade classification a few months ago and thought it would be the ideal tool for the job. However, we are having difficulty training our own classifiers with OpenCV: they fail to capture the object of interest a large percentage of the time (see the second question below).

I have two questions on the subject:

  1. We have searched a number of resources (I have a million and one tabs open on this at the moment), but there seems to be no surefire way to train your own classifiers. What pointers are invaluable for creating accurate classifiers while keeping them flexible (accounting for gender differences, weight, rings on hands, etc.)?
    • We have tried using a large number of positives (1000) and negatives (3000).
    • We have used various lighting conditions, hands from different individuals, and slightly different angles of the hand.
    • We have varied the number of stages in the cascade.
  2. I understand that Haar classifier detection uses Haar-like features with thresholds that are set by the training process. Since training creates those thresholds, my assumption was that running the classifier over the positive images used in training would always detect the object. I tried this and found that only 5.8% of my positive images produced a detection. Am I wrong to assume that, in theory, ~100% of the positive images should be detected by the classifier they trained? Or has our training process gone wrong? (A minimal script for measuring this detection rate is sketched below.)
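
For what it's worth, here is a minimal sketch of that check in Python with OpenCV. The cascade path and image folder are hypothetical placeholders, and counting any detection at all as a hit is a loose criterion; a stricter check would compare the returned boxes against the annotated object locations.

    import cv2
    import glob

    # Hypothetical paths: point these at your trained cascade and the
    # folder of positive images that went into training.
    cascade = cv2.CascadeClassifier("cascade.xml")
    paths = glob.glob("positives/*.png")

    hits = 0
    for path in paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            continue
        # Any detection at all counts as a hit here.
        if len(cascade.detectMultiScale(img, scaleFactor=1.1, minNeighbors=3)) > 0:
            hits += 1

    print(f"{hits}/{len(paths)} positives detected "
          f"({100.0 * hits / len(paths):.1f}%)")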

Some sources that we found very helpful were:

and of course the OpenCV cascade training page.

I appreciate any help on the matter. Many thanks!

Solution

First, a question: how much in-plane rotation is there in the hand gestures you are trying to detect? A cascade detector is not rotation invariant, so if your hand gestures can be tilted to the left or to the right by more than about 10 degrees, you will not be able to detect them. The only workaround is to rotate the image and run the detection again (a sketch of this follows).
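
A minimal sketch of that rotate-and-redetect loop in Python with OpenCV; the cascade filename and the angle sweep are assumptions for illustration, not from the original post:

    import cv2

    cascade = cv2.CascadeClassifier("hand_cascade.xml")  # hypothetical path

    def detect_with_rotations(gray, angles=(-20, -10, 0, 10, 20)):
        """Run the detector on several rotated copies of the frame."""
        h, w = gray.shape
        center = (w / 2, h / 2)
        results = []
        for angle in angles:
            M = cv2.getRotationMatrix2D(center, angle, 1.0)
            rotated = cv2.warpAffine(gray, M, (w, h))
            M_inv = cv2.invertAffineTransform(M)
            for (x, y, bw, bh) in cascade.detectMultiScale(rotated, 1.1, 3):
                # Boxes live in the rotated frame; map each box center
                # back to the original frame with the inverse transform.
                cx, cy = x + bw / 2, y + bh / 2
                ox = M_inv[0, 0] * cx + M_inv[0, 1] * cy + M_inv[0, 2]
                oy = M_inv[1, 0] * cx + M_inv[1, 1] * cy + M_inv[1, 2]
                results.append((angle, int(ox), int(oy), bw, bh))
        return results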

Now some pointers:

  • 1000 positive samples is not a large number. For detecting hand gestures, you probably need at least 10 times that, if not more.
  • Check what the minimum and maximum window size of your detector is, and make sure it matches the size of the hands in your images (see the sketch after this list).
  • Try a bigger window size.
  • Try other features. OpenCV lets you use LBP and HOG, in addition to Haar. I would guess that HOG may be good for this problem, and it takes much less time to train.
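
To make the window-size pointers concrete, here is a short Python sketch. The file names and size bounds are placeholders, and getOriginalWindowSize may not exist on very old OpenCV builds (the -w/-h training values are also visible at the top of the cascade XML):

    import cv2

    cascade = cv2.CascadeClassifier("hand_cascade.xml")  # hypothetical path

    # The training window (-w/-h) is the smallest object the detector can
    # ever report; recent OpenCV builds expose it directly.
    print("training window:", cascade.getOriginalWindowSize())

    img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

    # Constrain the scan so the search window brackets the hand sizes
    # that actually occur in your footage (numbers are placeholders).
    boxes = cascade.detectMultiScale(
        img,
        scaleFactor=1.1,
        minNeighbors=3,
        minSize=(48, 48),
        maxSize=(300, 300),
    )
    print(len(boxes), "detections")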

Edit: opencv_traincascade, which replaces haartraining, supports HOG features. Alternatively, there is a trainCascadeObjectDetector function in the Computer Vision System Toolbox for MATLAB that does the same thing and gives you a nicer interface. LBP is slightly less accurate than Haar on some benchmarks, but it is much faster to train and takes much less memory.
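
To keep the examples in one language, here is a sketch of that feature-type switch driven from Python via subprocess. The opencv_traincascade flags are the standard ones, but every path and count below is a placeholder for your own setup:

    import subprocess

    # Illustrative invocation only: all paths and counts are placeholders.
    cmd = [
        "opencv_traincascade",
        "-data", "classifier_lbp",   # output dir for stages and cascade.xml
        "-vec", "samples.vec",       # positives packed by opencv_createsamples
        "-bg", "negatives.txt",      # file listing the negative images
        "-numPos", "900",            # keep safely below the total in the .vec
        "-numNeg", "3000",
        "-numStages", "15",
        "-featureType", "LBP",       # or HOG, or HAAR
        "-w", "24", "-h", "24",      # must match the sample size in the .vec
    ]
    subprocess.run(cmd, check=True)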

If you have a lot of variation in orientation, you definitely need more data. You also need to understand the range of possible rotations: can your signs be upside down? Can they be rotated by 90 degrees? If your range is 30 degrees, you could try 3 rotations of the image, or train 3 different detectors for each sign.

Also, if you use Haar features, you may benefit from enabling the 45-degree (tilted) features. I think they are off by default.
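
If memory serves, this is controlled at training time by the -mode flag; relative to the opencv_traincascade sketch above, the change would be:

    # Swapping the "-featureType", "LBP" arguments above for these trains
    # with the extended Haar set, which includes the 45-degree tilted
    # features ("BASIC", without them, is the default mode).
    haar_args = ["-featureType", "HAAR", "-mode", "ALL"]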

Licensed under: CC-BY-SA with attribution