Question

I've been reading this paper:

http://photon07.pd.infn.it:5210/users/dazzi/Thesis_doctorate/Info/Chapter_6/Stereoscopy_(Mrovlje).pdf

to figure out how to use two parallel cameras to find the depth of an object. It seems like we somehow need the field of view of the cameras at the exact plane of the object (which is the depth the cameras are trying to measure in the first place) to get the depth.

Am I interpreting this wrong? Or does anyone know how one uses a pair of cameras to measure the distance of an object from the camera pair?

Kelvin

Solution

The camera sensors either have to lie in the same plane or their images have to be rectified so that 'virtually' they lie in the same plane. This is the only requirement, and it simplifies the search for matches between the left and right images: whatever you have in the left image will be located in the same row of the right image, so you don't need to check other rows. You can skip this requirement, but then your search becomes more extensive (a 2D search instead of a 1D one). Once you are done finding correspondences, you can figure out the depth from them.
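For illustration, here is a minimal sketch of that row-wise matching using OpenCV's block matcher on an already-rectified pair; the file names and matcher parameters are placeholder assumptions, not anything from the question:

```python
import cv2

# Load an already-rectified stereo pair (placeholder file names).
left = cv2.imread("left_rectified.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_rectified.png", cv2.IMREAD_GRAYSCALE)

# The block matcher searches, for each pixel of the left image,
# along the same row of the right image, up to numDisparities pixels away.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)

# StereoBM returns a fixed-point disparity map scaled by 16.
disparity = stereo.compute(left, right).astype("float32") / 16.0
```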

With rectified cameras, the depth is determined from the shift: for example, if the left image has a feature at row 4, column 11, and the right image has this feature at row 4 (the same row, since the cameras are rectified), column 1, then we say the disparity is 11-1=10 pixels. The disparity D is inversely proportional to the depth Z:

Z = fB/D, where f is the focal length and B is the baseline, i.e. the distance between the cameras. In the end you will have depth estimates everywhere you found correspondences. So-called dense stereo aims to cover more than 90% of the image area, whereas sparse stereo recovers only a few depth measurements.
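As a worked example with the disparity of 10 pixels from above, and an assumed focal length of 700 pixels and a 6 cm baseline (made-up numbers, just to show the units):

```python
f = 700.0  # focal length in pixels (assumed value)
B = 0.06   # baseline: distance between the cameras, in metres (assumed value)
D = 10.0   # disparity in pixels, from the example above

Z = f * B / D  # depth of the feature, in metres
print(Z)       # 4.2 m with these assumed numbers
```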

Note that it is hard to find correspondences if there is little texture on the surface of the object, in other words if it is uniformly colored. Some cameras, such as the Kinect, project their own pattern onto the scene to work around this absence of features.

Licensed under: CC-BY-SA with attribution