Question

I'd like to know if there is a fast algorithm that can detect parts that moved between two sequential screenshots. The algorithm should take two images and output a set of (rectangular) regions in one image and a vector that describes where the matching region is located in the other image.

I'd like to use that for a lossless video compression algorithm that is tailored to screen capturing. I think this makes the use case a little bit different from the usual applications of motion detection:

  • The images are screenshots, so it's unlikely that there are any compression artifacts or image noise
  • If a part of the image moves, it moves by a whole number of pixels; moved parts usually differ in less than 2% of their pixels
  • Regions that move are often big and have a rectangular shape

Since the video compression pipeline also has other steps and should happen in real-time, the motion detection should be fast.

Are there any algorithms or techniques that would help here?


Solution 4

What I did

I implemented a straightforward technique that compensates for most movements and is easy to implement.

Each frame is divided into tiles of a constant size, say, 8×8 pixels. The codec maintains a ring buffer holding a configurable number of tiles, for instance 220. For each tile the codec encounters in the input stream, it checks whether that tile is already in the ring buffer. If it is, it stores just the tile's index; if not, it stores the tile itself in the ring buffer.

Whenever a part of the image moves by a multiple of the block size relative to any earlier frame, it is very likely that its tiles are already in the cache. This way, one can compensate for most motion. Since looking up a tile in the ring buffer is very fast, the scheme easily runs in real time.
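A minimal Python sketch of this tile cache might look as follows. The class name, the eviction policy, and the capacity are my assumptions for illustration, not the author's actual code:

```python
# Sketch of the tile-cache idea: tiles are looked up in an LRU-style
# ring buffer keyed by their raw byte content. All names are illustrative.
from collections import OrderedDict

class TileCache:
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.cache = OrderedDict()  # tile bytes -> slot index
        self.next_slot = 0

    def encode_tile(self, tile_bytes):
        """Return ('ref', slot) if the tile is cached, else ('raw', tile)."""
        slot = self.cache.get(tile_bytes)
        if slot is not None:
            return ('ref', slot)            # tile seen before: emit its index
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # evict the oldest entry
        self.cache[tile_bytes] = self.next_slot
        self.next_slot += 1
        return ('raw', tile_bytes)          # new tile: store it verbatim
```

On the decoder side the same ring buffer would be maintained in lockstep, so a `('ref', slot)` token can be resolved back to pixel data.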

OTHER TIPS

I've got a few thoughts, and a possibly-workable solution you can consider.

First, consider tracking the individual pixel deltas and transmitting/storing just those. A typical interactive session involves very small parts of the UI changing; moving or resizing windows tends (anecdotally) to be less common over long sessions. This efficiently captures simple things like entered text, cursor movements, and small UI updates without a lot of extra work.
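The pixel-delta idea can be sketched with NumPy in a few lines (function names are illustrative):

```python
# Store only the pixels that differ between consecutive frames:
# their coordinates and their new values.
import numpy as np

def pixel_deltas(prev, curr):
    """Return (ys, xs, values) for every pixel that changed."""
    ys, xs = np.nonzero(prev != curr)
    return ys, xs, curr[ys, xs]

def apply_deltas(prev, ys, xs, values):
    """Reconstruct the next frame from the previous one plus the deltas."""
    out = prev.copy()
    out[ys, xs] = values
    return out
```

Storing just `(ys, xs, values)` per frame is already a large win when only a cursor or a few characters change.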

You could also consider trying to hook the OS at a lower level, to get e.g. a display list of pixels, or even (optimally) a list of 'damage' rectangles. Mac OS X's Quartz compositor can give you this information, for example. This can help you quickly narrow down what to update, and in the ideal case, may give you an efficient representation of the screen in and of itself.

If you can query the OS's (window manager's) information about windows, you can store separate streams of data (pixel deltas) for every visible window, and then apply a simple display-list approach to 'render' them during playback. Then, it is trivial to identify moving windows since you can simply diff the display lists.

If you can query the OS's information about the cursor position, you can use the cursor movement to quickly estimate movement deltas, since cursor moves usually correlate well with object movement on screen (e.g. moving windows, icons, dragging objects, etc.). This allows you to avoid processing the image to determine movement deltas.

On to a possible solution (or a last resort in case you still can't identify the movement delta with the above): we can actually deal with the (very common) case of a single moving rectangle reasonably easily. Make a mask of all the pixels that changed between the frames, then identify the largest connected component in the mask. If it approximates a rectangle, you can assume it represents a moved region. Either the window moved purely orthogonally (i.e. entirely in the x- or y-direction), in which case the total delta looks like a slightly bigger rectangle, or it moved diagonally, in which case the total delta has an eight-sided shape. Either way, you can estimate the motion vector, and verify the estimate by diffing the two regions.

Note that this treatment deliberately ignores details you would have to consider in practice, e.g. pixels moving independently near the window, or regions that don't appear to change (such as large blocks of solid colour inside the window). A practical implementation would have to deal with all of the above.
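The mask-and-verify step might look like this in NumPy. This is a sketch under the single-moving-rectangle assumption (it uses the bounding box of all changed pixels rather than a true connected-component analysis, and the helper names are mine):

```python
import numpy as np

def change_bbox(prev, curr):
    """Bounding box (y0, y1, x0, x1) of all changed pixels, or None."""
    ys, xs = np.nonzero(prev != curr)
    if ys.size == 0:
        return None
    return ys.min(), ys.max() + 1, xs.min(), xs.max() + 1

def verify_motion(prev, curr, rect, dy, dx):
    """Check whether rect = (y0, y1, x0, x1) in `curr` equals the same
    region shifted back by the candidate motion vector (dy, dx) in `prev`."""
    y0, y1, x0, x1 = rect
    return np.array_equal(curr[y0:y1, x0:x1],
                          prev[y0 - dy:y1 - dy, x0 - dx:x1 - dx])
```

Here `change_bbox` gives you the slightly-bigger delta shape discussed above, and `verify_motion` is the final diff that confirms a guessed motion vector.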

Finally, I'd look into existing literature on real-time motion estimation. A lot of work has been done in optimizing motion estimation and compensation for e.g. video encoding, so you may be able to use that work as well if you find the methods above inadequate.

A common way of tracking motion across frames is:

1. Decide on points you want to track in image 1.
2. Correlate points in the input image with points in the output image.
3. Determine the transform that got them there.

For step 1, there are a lot of "trackers" out there, some fairly standard that you'll find in OpenCV, which look for "interesting" points (intersections of edges, local maxima, etc.). The Kanade–Lucas–Tomasi (KLT) tracker is one of these. For your use case, however, you might prefer to just create a regular grid of points.

For step 2, a common technique is a reduced-resolution pyramid (sometimes described as a quadtree): take your image, create a new image at half the width and height, and repeat. Now you have a very low-resolution image at the top that you can search orders of magnitude faster, and a match there gives you a bounding box to search in the next higher-resolution image. In your case, useful optimizations are to first check whether the output has changed at all, and, once you find a match for point (x, y), to also look next to it for points (x+1, y), (x, y+1), and so on.
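The coarse-to-fine pyramid search can be sketched as follows, using a toy sum-of-absolute-differences matcher. All function names, the averaging downscale, and the refinement radius are illustrative choices, not a tuned implementation:

```python
import numpy as np

def downscale(img):
    """Halve width and height by averaging 2x2 pixel blocks."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    i = img[:h, :w].astype(float)
    return (i[0::2, 0::2] + i[1::2, 0::2] + i[0::2, 1::2] + i[1::2, 1::2]) / 4

def best_match(img, patch, center=None, radius=None):
    """Exhaustive SAD search for `patch` in `img`, optionally restricted
    to a small window around `center` (used at the finer levels)."""
    ph, pw = patch.shape
    ys = range(img.shape[0] - ph + 1)
    xs = range(img.shape[1] - pw + 1)
    if center is not None:
        cy, cx = center
        ys = range(max(0, cy - radius), min(img.shape[0] - ph, cy + radius) + 1)
        xs = range(max(0, cx - radius), min(img.shape[1] - pw, cx + radius) + 1)
    best, best_yx = None, (0, 0)
    for y in ys:
        for x in xs:
            sad = np.abs(img[y:y+ph, x:x+pw].astype(float) - patch).sum()
            if best is None or sad < best:
                best, best_yx = sad, (y, x)
    return best_yx

def pyramid_search(img, patch, levels=2):
    """Full search at the coarsest level, then refine at each finer level."""
    imgs, pats = [img], [patch]
    for _ in range(levels):
        imgs.append(downscale(imgs[-1]))
        pats.append(downscale(pats[-1]))
    y, x = best_match(imgs[-1], pats[-1])       # cheap coarse full search
    for lvl in range(levels - 1, -1, -1):       # refine around 2x the match
        y, x = best_match(imgs[lvl], pats[lvl], center=(2 * y, 2 * x), radius=2)
    return y, x
```

The coarse search touches only a tiny fraction of the full-resolution positions; each refinement then examines a small window around the upscaled coordinates, which is where the order-of-magnitude speedup comes from.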

For step 3, that's up to you... if you're talking about windows sliding around the screen, there will be large patches that move together and are otherwise identical inside and outside their edges. Any animation, though, could throw things off, and the mouse cursor itself is a small object that moves around constantly and will reduce the algorithm's effectiveness.

OpenCV covers image manipulation in depth and has a great number of tutorials on the subject.

http://docs.opencv.org/doc/tutorials/tutorials.html

The PDF version has more tutorials than the website; search for "opencv tutorials pdf" and take the top link.

The main website: http://opencv.willowgarage.com/wiki/

Essentially, there are mathematical operations you can run on images that work this out for you, such as convolutions.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow