How to track motion between images in a stream

Question

If you just want a solution that works, ZoneMinder and Motion are two programs that run under Linux and use the Video4Linux interface.

If you need to roll your own for some reason, there are a lot of techniques and strategies you can use. You are largely on the right track with what you've outlined, but you're missing a few important details.

Since the camera is stationary, keep a record of the last N frames and average them to form your "background" image.

http://opencv.willowgarage.com/documentation/cpp/imgproc_motion_analysis_and_object_tracking.html
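
As a rough sketch of the rolling background, assuming grayscale frames arrive as equally sized NumPy arrays: the class name `RollingBackground` and the default `n=30` are made up for illustration and are not part of OpenCV. OpenCV's `accumulateWeighted` gives you an exponentially weighted average instead of a plain mean, which avoids storing N frames.

```python
from collections import deque

import numpy as np


class RollingBackground:
    """Keeps the last n frames and exposes their per-pixel mean."""

    def __init__(self, n=30):  # n=30 is an arbitrary illustrative default
        # deque(maxlen=n) drops the oldest frame once n frames are stored.
        self.frames = deque(maxlen=n)

    def update(self, frame):
        # Store as float32 so the mean is not truncated to integers.
        self.frames.append(np.asarray(frame, dtype=np.float32))

    def background(self):
        if not self.frames:
            raise ValueError("no frames seen yet")
        return np.mean(np.stack(list(self.frames)), axis=0)
```

Call `update()` once per captured frame and `background()` whenever you need the current estimate. A larger N adapts more slowly to lighting changes but is less likely to absorb a slow-moving object into the background.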
Subtract the background from the current image. What you're left with we'll call the foreground.

http://opencv.willowgarage.com/documentation/cpp/core_operations_on_arrays.html#cv-absdiff
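
One thing to watch for if you do the subtraction yourself: 8-bit images wrap around on underflow, which is why an absolute difference (what `absdiff` computes) is the usual choice. A minimal NumPy sketch of the same idea, where the helper name `foreground` is just an assumption for this example:

```python
import numpy as np


def foreground(frame, background):
    """Per-pixel absolute difference between frame and background, as uint8."""
    # Widen to a signed type first: uint8 arithmetic wraps around,
    # so 10 - 20 would come out as 246 instead of -10.
    f = np.asarray(frame).astype(np.int16)
    b = np.round(np.asarray(background)).astype(np.int16)
    return np.abs(f - b).astype(np.uint8)
```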
Optionally perform dilation or erosion (or both) to remove noise or join nearly connected regions.

http://opencv.willowgarage.com/documentation/image_filtering.html#dilate
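
To make it concrete what these two operations do, here is a small NumPy sketch with a fixed 3x3 square neighbourhood. The function names are illustrative and the fixed window is an assumption; OpenCV's `dilate` and `erode` take an arbitrary structuring element and an iteration count.

```python
import numpy as np


def _neighbourhoods(img):
    """Stack the 9 shifted copies of img that cover each pixel's 3x3 window."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])


def dilate(img):
    # Each pixel becomes the max of its 3x3 window: bright regions grow
    # and small gaps between them close up.
    return _neighbourhoods(img).max(axis=0)


def erode(img):
    # Each pixel becomes the min of its 3x3 window: isolated bright
    # specks (noise) disappear and regions shrink.
    return _neighbourhoods(img).min(axis=0)
```

Eroding then dilating removes specks while roughly preserving the size of larger regions; dilating then eroding joins regions that are almost touching.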
Threshold the foreground image to determine what's important and what's not.

http://docs.opencv.org/doc/tutorials/imgproc/threshold/threshold.html
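
A binary threshold is simple enough to sketch in a couple of lines. The cutoff of 25 below is an arbitrary starting point, not a recommended value; you will have to tune it against your own footage.

```python
import numpy as np


def threshold(img, thresh=25, maxval=255):
    """Binary threshold: pixels strictly above thresh become maxval, others 0."""
    return np.where(np.asarray(img) > thresh, maxval, 0).astype(np.uint8)
```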
Optionally use the findContours function to get a description of what has "moved".

http://docs.opencv.org/doc/tutorials/imgproc/shapedescriptors/find_contours/find_contours.html

Once you have the contours you can also find the bounding rectangles if that's more what you're going for.

http://opencv.willowgarage.com/documentation/python/structural_analysis_and_shape_descriptors.html#boundingrect
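
If you only need boxes and want to see what is going on underneath, the sketch below is a simplified stand-in for the findContours plus bounding-rectangle step: it flood-fills each connected blob in the thresholded mask and reports its box as `(x, y, w, h)`. The function name and the choice of 8-connectivity are assumptions for this example, and it does not produce contour outlines the way findContours does.

```python
from collections import deque

import numpy as np


def bounding_rects(mask):
    """Return (x, y, w, h) for each 8-connected region of non-zero pixels."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    rects = []
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        # Breadth-first flood fill from this unvisited foreground pixel,
        # tracking the extreme coordinates reached.
        seen[sy, sx] = True
        queue = deque([(sy, sx)])
        x0 = x1 = sx
        y0 = y1 = sy
        while queue:
            y, x = queue.popleft()
            x0, x1 = min(x0, x), max(x1, x)
            y0, y1 = min(y0, y), max(y1, y)
            for ny in range(max(y - 1, 0), min(y + 2, h)):
                for nx in range(max(x - 1, 0), min(x + 2, w)):
                    if mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
        rects.append((int(x0), int(y0), int(x1 - x0 + 1), int(y1 - y0 + 1)))
    return rects
```

In practice you would probably also discard boxes below some minimum area so that leftover noise does not count as motion.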

This will not be perfect, and when debugging or optimizing you have to show the output after every step to figure out what's working and what isn't. Spend some time building the infrastructure to make that easier. Once you have source data and most of a working pipeline, tuning it to get the results you want is quite doable.
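
One possible shape for that infrastructure, sketched under the assumption that every stage is a function taking one image and returning one image: run the stages in order, keep every intermediate result, and print a one-line summary of each. The names `run_pipeline` and `summarize` are hypothetical.

```python
import numpy as np


def run_pipeline(frame, stages):
    """Apply (name, function) stages in order, keeping every intermediate."""
    outputs = {"input": frame}
    current = frame
    for name, fn in stages:
        current = fn(current)
        outputs[name] = current
    return outputs


def summarize(outputs):
    """One line per stage: name, shape, dtype, value range, non-zero count."""
    return [
        f"{name}: shape={img.shape} dtype={img.dtype} "
        f"min={img.min()} max={img.max()} nonzero={np.count_nonzero(img)}"
        for name, img in outputs.items()
    ]
```

With the intermediates in a dictionary it is easy to display each one in its own window or write it to disk (for example with OpenCV's `imshow` or `imwrite`), so you can see exactly which stage loses the object or lets noise through.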