I am currently facing what is, in my opinion, a rather common problem that should be quite easy to solve, but so far all my approaches have failed, so I am turning to you for help.

I think the problem is best explained with some illustrations. I have some patterns like these two:

[Image: Pattern 1] [Image: Pattern 3]

I also have an image like this (or probably better, because the photo this one originated from was quite poorly lit):

[Image: Picture]

(Note how the template was scaled to roughly fit the size of the image.)

The ultimate goal is a tool that determines whether the user is showing a thumbs-up or thumbs-down gesture, as well as the angles in between. So I want to match the patterns against the image and see which one resembles the picture the most (or, to be more precise, which angle the hand is showing). I know the direction in which the thumb points in each pattern, so if I find the pattern that looks identical, I also have the angle.

I am working with OpenCV (with Python bindings) and have already tried cvMatchTemplate and MatchShapes, but so far it's not really working reliably.

I can only guess why MatchTemplate failed, but I think a smaller pattern with a smaller white area fits fully into the white area of the picture, thus producing the best matching score, although it's obvious that they don't really look the same.

Are there some methods hidden in OpenCV that I haven't found yet, or is there a known algorithm for this kind of problem that I should reimplement?

Happy New Year.


Solution

A few simple techniques could work:

  1. After binarization and segmentation, find the Feret's diameter of the blob (a.k.a. the farthest distance between any two points, or the major axis); see the first sketch after this list.
  2. Find the convex hull of the point set, flood fill it, and treat it as a connected region. From this filled hull, subtract the original image (which includes the thumb). The difference will be the area between the thumb and fist, and the position of that area relative to the center of mass should give you an indication of the rotation; see the second sketch after this list.
  3. Use a watershed algorithm on the distance transform (the distance of each point to the blob edge). This can help identify the connected thin region (the thumb).
  4. Fit the largest circle (or largest inscribed polygon) within the blob. Dilate this circle or polygon until some fraction of its edge overlaps the background. Subtract this dilated figure from the original image; only the thumb will remain.
  5. If the size of the hand is consistent (or relatively consistent), then you could also perform N morphological erode operations until the thumb disappears, then N dilate operations to grow the fist back to approximately its original size. Subtract this fist-only blob from the original blob to get the thumb blob. Then use the thumb blob's direction (its Feret's diameter) and/or its center of mass relative to the fist blob's center of mass to determine the direction; the third sketch after this list shows this.
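
For technique 1, here is a minimal sketch of computing Feret's diameter with the OpenCV Python bindings, assuming `mask` is your already-binarized hand silhouette (uint8, 0/255); the function name and structure are mine, not a built-in:

```python
import cv2
import numpy as np

def feret_diameter(mask):
    # Largest external contour = the hand blob.
    # (OpenCV 3.x returns (image, contours, hierarchy) here; adjust if needed.)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    blob = max(contours, key=cv2.contourArea)
    # The farthest pair of points must lie on the convex hull, so a
    # brute-force search over the (few) hull vertices is cheap enough.
    hull = cv2.convexHull(blob).reshape(-1, 2).astype(np.float64)
    best_len, best_pair = 0.0, None
    for i in range(len(hull)):
        for j in range(i + 1, len(hull)):
            d = np.linalg.norm(hull[i] - hull[j])
            if d > best_len:
                best_len, best_pair = d, (hull[i], hull[j])
    return best_len, best_pair  # diameter length and its two endpoints
```

The angle of the line joining the two endpoints (via `np.arctan2`) then approximates the thumb axis whenever the thumb is extended.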
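
For technique 2, a sketch of the hull-difference idea under the same assumptions (binary `mask`, illustrative names):

```python
import cv2
import numpy as np

def hull_gap_angle(mask):
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    blob = max(contours, key=cv2.contourArea)
    # Fill the convex hull, then remove the hand itself; what remains is
    # (mostly) the pocket between thumb and fist.
    hull_img = np.zeros_like(mask)
    cv2.drawContours(hull_img, [cv2.convexHull(blob)], -1, 255, cv2.FILLED)
    gap = cv2.subtract(hull_img, mask)
    # The vector from the hand's centroid to the gap's centroid
    # indicates the rotation.
    mh, mg = cv2.moments(mask, True), cv2.moments(gap, True)
    ch = np.array([mh['m10'] / mh['m00'], mh['m01'] / mh['m00']])
    cg = np.array([mg['m10'] / mg['m00'], mg['m01'] / mg['m00']])
    d = cg - ch
    return np.degrees(np.arctan2(d[1], d[0]))
```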
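
And for technique 5, the erode/dilate variant; N depends on the hand size and needs tuning (the 10 below is only a placeholder):

```python
import cv2

def split_thumb(mask, n=10):
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    # Erode until the thumb is gone, then grow the fist back to
    # roughly its original size.
    fist = cv2.dilate(cv2.erode(mask, kernel, iterations=n),
                      kernel, iterations=n)
    thumb = cv2.subtract(mask, fist)  # what the opening removed
    return thumb, fist
```

The direction then follows from the thumb blob's Feret diameter, or from its centroid relative to the fist's centroid, as in the sketches above.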

Techniques to find critical points (regions of strong direction change) are trickier. At the simplest, you might use corner detectors and then check the distances between corners to identify the place where the inner edge of the thumb meets the fist.
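
As a starting point, here is a hedged sketch using Shi-Tomasi corners (`cv2.goodFeaturesToTrack`); all parameter values are guesses you would have to tune:

```python
import cv2
import numpy as np

def corner_candidates(mask, max_corners=8):
    pts = cv2.goodFeaturesToTrack(mask, maxCorners=max_corners,
                                  qualityLevel=0.05, minDistance=15)
    if pts is None:
        return None, None
    pts = pts.reshape(-1, 2)
    # Pairwise distances between corners; a close pair of strong corners
    # is a candidate for the notch where the thumb meets the fist.
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return pts, dists
```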

For more complex methods, look into papers about shape decomposition by authors such as Kimia, Siddiqi, and Xiaofeng Mi.

Other tips

MatchTemplate seems like a good fit for the problem you describe. In what way is it failing for you? If you are actually masking the thumbs-up/thumbs-down/thumbs-in-between signs as nicely as you show in your sample image, then you have already done the most difficult part.

MatchTemplate does not include rotation and scaling in its search space, so you should generate more templates from your reference image at all the rotations you'd like to detect, and you should scale your templates to match the general size of the found thumbs-up/thumbs-down signs.

[edit] The result array for MatchTemplate contains a value that specifies how well the template fits the image at that location. If you use CV_TM_SQDIFF, the lowest value in the result array is the location of the best fit; if you use CV_TM_CCORR or CV_TM_CCOEFF, it is the highest value. If your scaled and rotated template images all have the same number of white pixels, then you can compare the best-fit values across all the different template images, and the template image with the best fit overall is the one you want to select.
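
To make that concrete, here is a minimal sketch of the rotated-template search, assuming `scene` and `template` are single-channel uint8 images and that the template has already been scaled to roughly the right size; the 10-degree step is an arbitrary choice:

```python
import cv2
import numpy as np

def best_rotated_match(scene, template, step=10):
    h, w = template.shape
    center = (w / 2.0, h / 2.0)
    best = (np.inf, None, None)  # (score, angle, location)
    for angle in range(0, 360, step):
        M = cv2.getRotationMatrix2D(center, angle, 1.0)
        rotated = cv2.warpAffine(template, M, (w, h))
        result = cv2.matchTemplate(scene, rotated, cv2.TM_SQDIFF)
        min_val, _, min_loc, _ = cv2.minMaxLoc(result)
        if min_val < best[0]:  # TM_SQDIFF: lower is better
            best = (min_val, angle, min_loc)
    return best
```

Note that rotating within the same canvas clips the template's corners, which can violate the "same number of white pixels" condition above; padding the template with background before rotating avoids this.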

There are tons of rotation- and scale-invariant detection methods that could conceivably help you, but normalizing your problem to work with MatchTemplate is by far the easiest.

For the more advanced stuff, check out SIFT, Haar-feature-based classifiers, or one of the other methods available in OpenCV.

I think you can get excellent results if you just compute the two points whose shortest path through the white region is the longest. The direction in which the thumb is pointing is simply the direction of the line that joins those two points.

You can do this easily by sampling points on the white area and using Floyd-Warshall.
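
A rough sketch of that idea, assuming SciPy is available: subsample the white pixels on a coarse grid, connect only neighbouring samples so paths must stay inside the blob, and take the pair with the largest finite shortest-path distance. Floyd-Warshall is O(n^3), so keep the stride coarse. All names are illustrative:

```python
import numpy as np
from scipy.sparse.csgraph import floyd_warshall
from scipy.spatial.distance import cdist

def farthest_geodesic_pair(mask, stride=8):
    # Subsample the white area on a coarse grid.
    ys, xs = np.nonzero(mask[::stride, ::stride])
    pts = np.column_stack((xs, ys)).astype(float) * stride
    # Connect neighbouring samples only; np.inf marks "no edge".
    d = cdist(pts, pts)
    graph = np.where(d <= stride * 1.5, d, np.inf)
    sp = floyd_warshall(graph)        # all-pairs shortest paths
    sp[np.isinf(sp)] = -1             # drop disconnected pairs
    i, j = np.unravel_index(np.argmax(sp), sp.shape)
    return pts[i], pts[j]             # endpoints of the longest in-blob path
```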
