Question

I am trying to make an automation tool and am experimenting with a type of recording that takes screenshots and records user inputs. The idea is for the user to take a snapshot and highlight a square on it around the "submit" button. During playback, the program would take a screenshot of the open window and find the coordinates of the button by searching for the snapshot. So I need an algorithm to search an image for an exact (or very close) copy of the button image. The algorithms I've found so far compare image likeness but cannot locate a subimage, and object recognition algorithms seem a bit over the top considering the "object" I'm trying to find will be a near-perfect match. Any ideas?

Solution 2

What you need is an efficient feature extraction method. This will depend on what you're looking for, but let's assume you're looking for the Send button in this image:

Screenshot of a web form

One of the characteristic features of this button is that it includes a pair of parallel line segments at the top and bottom. The same applies to the two text input fields, but for the button, the vertical distance between the two segments is exactly 17 pixels.

This is what you get if you calculate the maximum pixel values of the source image together with itself shifted vertically by 17 pixels:

Result of 17-pixel vertical shift and maximum value calculation
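
As a rough illustration of the shift-and-maximum step, here is a minimal NumPy sketch. It assumes a grayscale image stored as a 2D array with dark lines on a light background; the function name `shifted_max` and the synthetic test image are made up for this example and are not part of the original answer.

```python
import numpy as np

def shifted_max(gray: np.ndarray, offset: int) -> np.ndarray:
    """Pixel-wise maximum of a grayscale image and itself shifted up by `offset` rows.

    A pixel stays dark in the result only if it is dark in the image
    AND the pixel `offset` rows below it is dark too.
    """
    if offset <= 0 or offset >= gray.shape[0]:
        raise ValueError("offset must be between 1 and the image height - 1")
    return np.maximum(gray[:-offset, :], gray[offset:, :])

# Synthetic test image (an assumption): white background, dark lines.
img = np.full((40, 30), 255, dtype=np.uint8)
img[10, 5:25] = 0   # top edge of a "button"
img[27, 5:25] = 0   # bottom edge, exactly 17 rows lower
img[5, 5:25] = 0    # a lone line with no partner 17 rows away

result = shifted_max(img, 17)
dark_rows = np.unique(np.nonzero(result == 0)[0])
print(dark_rows)  # rows where a pair of lines 17 pixels apart was found
```

Only the pair of lines that is exactly 17 rows apart survives as a dark line; the lone line is wiped out by the white pixels it is compared against.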

The Send button now appears as a solid horizontal line. You can detect this quite easily by thresholding the image and looking for an unbroken sequence of black pixels. Just for reference, here's what I obtained after applying a 10px horizontal motion blur and thresholding at a grey level of 128:

Result after 10px horizontal motion blur and thresholding at grey level 128
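
The blur, threshold and run search could be sketched along these lines. This is only an approximation under stated assumptions: a plain box blur along each row stands in for the motion blur, and the helper names (`horizontal_blur`, `find_dark_runs`) and the minimum run length of 15 pixels are choices made for the example.

```python
import numpy as np

def horizontal_blur(gray: np.ndarray, width: int = 10) -> np.ndarray:
    """Box blur along each row, a simple stand-in for a horizontal motion blur."""
    left = width // 2
    # Repeat edge pixels so the image borders are not darkened by the blur.
    padded = np.pad(gray.astype(float), ((0, 0), (left, width - 1 - left)), mode="edge")
    kernel = np.ones(width) / width
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, padded
    )

def find_dark_runs(dark: np.ndarray, min_len: int):
    """Return (row, start, end) for each horizontal run of True values
    at least `min_len` long; `end` is exclusive."""
    runs = []
    for y, row in enumerate(dark):
        padded = np.concatenate(([False], row, [False]))
        # Indices where the row switches between dark and light.
        changes = np.flatnonzero(padded[1:] != padded[:-1])
        for start, end in zip(changes[0::2], changes[1::2]):
            if end - start >= min_len:
                runs.append((y, int(start), int(end)))
    return runs

# Synthetic input (an assumption): the output of the shift-and-max step,
# with one solid dark line and one isolated dark speck.
shifted = np.full((23, 30), 255, dtype=np.uint8)
shifted[10, 5:25] = 0
shifted[3, 12] = 0

dark = horizontal_blur(shifted, width=10) < 128   # threshold at grey level 128
runs = find_dark_runs(dark, min_len=15)
print(runs)
```

The blur removes isolated dark specks before thresholding, so only long unbroken lines are reported as candidate positions.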

This process will identify candidate positions quite quickly. You can then subject these locations to stronger techniques like 2D convolution and OCR without too much loss of performance.
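
One possible way to do the stronger check, sketched here with normalized cross-correlation evaluated only at the candidate positions (a swapped-in choice for this example; the answer itself only mentions 2D convolution and OCR in general). The function names, the 0.95 threshold and the random test data are all assumptions.

```python
import numpy as np

def ncc(patch: np.ndarray, template: np.ndarray) -> float:
    """Normalized cross-correlation of two equally sized grayscale arrays (1.0 = identical up to brightness/contrast)."""
    a = patch.astype(float) - patch.mean()
    b = template.astype(float) - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0:
        return 0.0  # a flat patch or template carries no pattern to correlate
    return float((a * b).sum() / denom)

def verify_candidates(screen, template, candidates, min_score=0.95):
    """Keep only the candidate top-left corners (y, x) where the template really matches."""
    h, w = template.shape
    hits = []
    for y, x in candidates:
        patch = screen[y:y + h, x:x + w]
        if patch.shape != template.shape:
            continue  # candidate too close to the image border
        score = ncc(patch, template)
        if score >= min_score:
            hits.append((y, x, score))
    return hits

# Synthetic data (an assumption): a noisy "screenshot" and a template cut from it.
rng = np.random.default_rng(0)
screen = rng.integers(0, 256, size=(40, 80))
template = screen[12:20, 30:50].copy()

hits = verify_candidates(screen, template, candidates=[(12, 30), (5, 5)])
print(hits)
```

Because the expensive comparison runs only at a handful of candidate positions, its cost stays small compared with sliding the template over the whole screenshot.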

OTHER TIPS

The following steps can help you with that:

  1. Find a distinct feature in the button image.

    For example, you can use the edge color neighboring the button face color or its derivative, the shape, or the average color of a square sub-image (8x8 pixels ...).

  2. Search the snapshot for this feature.

    I would use average color to start: divide the image into N x N pixel areas and compute their average colors. If you find a square whose average color is similar to your button's average color, then you have a probable location.

  3. After this, you can brute-force search the nearby area to check whether it contains your button.

    In this stage, do not compare colors directly (they can be distorted by anti-aliasing, filters ...). A better way is to compare derivatives within +/- some tolerance. You can compute a coefficient of probable button presence:

    p(x,y)=count(matching pixels) / (button pixels)
    

    If it is close enough to 1.0, then you have found your button.

PS. In stage 3 you can use grayscale images to simplify things.
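
Stages 2 and 3 might look roughly like the sketch below. It works on grayscale arrays and makes several assumptions that are not in the answer: the feature is the average of the button's central N x N region, blocks are smaller than the button, derivatives are plain differences between neighboring pixels, p is computed over those differences rather than over raw pixels, and all tolerances (`mean_tol`, `tol`, `min_p`) are guesses.

```python
import numpy as np

def block_means(gray, n):
    """Average grey level of each n x n block (ragged edges are cropped)."""
    h, w = gray.shape
    h, w = h - h % n, w - w % n
    return gray[:h, :w].astype(float).reshape(h // n, n, w // n, n).mean(axis=(1, 3))

def match_ratio(patch, button, tol):
    """p = matching neighbor differences / all neighbor differences."""
    p, b = patch.astype(float), button.astype(float)
    ok_x = np.abs(np.diff(p, axis=1) - np.diff(b, axis=1)) <= tol
    ok_y = np.abs(np.diff(p, axis=0) - np.diff(b, axis=0)) <= tol
    return (ok_x.sum() + ok_y.sum()) / (ok_x.size + ok_y.size)

def find_button(screen, button, n=4, mean_tol=30.0, tol=8.0, min_p=0.95):
    bh, bw = button.shape
    sh, sw = screen.shape
    # Stage 2: blocks whose average is close to the button's face color.
    cy, cx = (bh - n) // 2, (bw - n) // 2
    face = button[cy:cy + n, cx:cx + n].astype(float).mean()
    candidates = np.argwhere(np.abs(block_means(screen, n) - face) <= mean_tol)
    # Stage 3: brute-force every placement that overlaps a candidate block.
    best, best_p, seen = None, 0.0, set()
    for by, bx in candidates:
        y0, x0 = by * n, bx * n
        for y in range(max(0, y0 - bh + 1), min(sh - bh, y0 + n - 1) + 1):
            for x in range(max(0, x0 - bw + 1), min(sw - bw, x0 + n - 1) + 1):
                if (y, x) in seen:
                    continue
                seen.add((y, x))
                p = match_ratio(screen[y:y + bh, x:x + bw], button, tol)
                if p > best_p:
                    best, best_p = (y, x), p
    return (best, best_p) if best_p >= min_p else (None, best_p)

# Synthetic data (an assumption): a light screen with one bordered button.
button = np.full((8, 16), 40, dtype=np.uint8)   # dark border
button[1:-1, 1:-1] = 120                        # button face
button[2:4, 2:5] = 60                           # a bit of "text"
screen = np.full((64, 64), 200, dtype=np.uint8)
screen[20:28, 24:40] = button

location, p = find_button(screen, button)
print(location, p)
```

Note that the coarse stage is only a filter: a block that straddles the button edge has a mixed average, which is why the tolerance on the block average is kept generous and the final decision is left to p.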

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow