Question

I am encountering some problems using OpenCV on Android without NDK.
Currently I am doing a project at my university, and my supervisor tells me that I should avoid camera calibration when reconstructing 3D objects from 2D images.

So far I have two 2D images and have computed the feature points, matches, good matches, the fundamental matrix and the homography matrix. In addition, I have calculated the disparity map using StereoBM. The next step should be getting a 3D point cloud from all those values.
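Roughly, the pipeline looks like the sketch below (a minimal sketch: ORB features, brute-force matching and the StereoBM parameters are just placeholders, and "left.png"/"right.png" stand in for my actual images; on Android the library is loaded via OpenCVLoader rather than System.loadLibrary):

import java.util.ArrayList;
import java.util.List;

import org.opencv.calib3d.Calib3d;
import org.opencv.calib3d.StereoBM;
import org.opencv.core.*;
import org.opencv.features2d.DescriptorMatcher;
import org.opencv.features2d.ORB;
import org.opencv.imgcodecs.Imgcodecs;

public class StereoPipelineSketch {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        // Placeholder file names for the two views
        Mat left  = Imgcodecs.imread("left.png",  Imgcodecs.IMREAD_GRAYSCALE);
        Mat right = Imgcodecs.imread("right.png", Imgcodecs.IMREAD_GRAYSCALE);

        // 1. Feature points and matches (ORB is just one possible choice)
        ORB orb = ORB.create();
        MatOfKeyPoint kp1 = new MatOfKeyPoint(), kp2 = new MatOfKeyPoint();
        Mat desc1 = new Mat(), desc2 = new Mat();
        orb.detectAndCompute(left,  new Mat(), kp1, desc1);
        orb.detectAndCompute(right, new Mat(), kp2, desc2);

        DescriptorMatcher matcher =
                DescriptorMatcher.create(DescriptorMatcher.BRUTEFORCE_HAMMING);
        MatOfDMatch matches = new MatOfDMatch();
        matcher.match(desc1, desc2, matches);

        // 2. Fundamental matrix from the matched points
        List<Point> pts1 = new ArrayList<>(), pts2 = new ArrayList<>();
        List<KeyPoint> k1 = kp1.toList(), k2 = kp2.toList();
        for (DMatch m : matches.toList()) {
            pts1.add(k1.get(m.queryIdx).pt);
            pts2.add(k2.get(m.trainIdx).pt);
        }
        MatOfPoint2f p1 = new MatOfPoint2f(pts1.toArray(new Point[0]));
        MatOfPoint2f p2 = new MatOfPoint2f(pts2.toArray(new Point[0]));
        Mat F = Calib3d.findFundamentalMat(p1, p2, Calib3d.FM_RANSAC, 3.0, 0.99);

        // 3. Disparity map with block matching (parameters are placeholders)
        StereoBM bm = StereoBM.create(64, 21);  // numDisparities, blockSize
        Mat disparity = new Mat();
        bm.compute(left, right, disparity);     // CV_16S, fixed-point (x16)
    }
}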

I checked the internet and found

Calib3d.reprojectImageTo3D(disparity, _3dImage, Q, false);

Using this method, I should be able to recreate the 3D point cloud. My current problem is that I do not have the matrix Q. I think I would get this from the method

stereoRectify(...);

But as I should avoid camera calibration for this specific case, I cannot use this method. The alternative

stereoRectifyUncalibrated(...);

does not provide Q...

Can someone please help me and show me how I can get Q, or how to obtain the point cloud in an easier way? Thanks


Solution

To answer your question, the Q matrix required by reprojectImageTo3D represents the mapping from a pixel position and associated disparity (i.e. of the form [u; v; disp; 1]) to the corresponding 3D point [X; Y; Z; 1]. Unfortunately, you cannot derive this relation without knowing the cameras' intrinsics (matrix K) and extrinsics (rotation & translation between the two camera poses).
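To make that dependence concrete, here is a minimal sketch (my own helper, in the OpenCV Java API) of the standard form of Q that stereoRectify produces for rectified, horizontally displaced cameras, built from the focal length f, principal point (cx, cy) and baseline Tx; those four numbers are exactly what calibration would give you, and the disparity argument is the StereoBM output from your pipeline:

import org.opencv.calib3d.Calib3d;
import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.core.Scalar;

public class ReprojectSketch {

    // disparity: the CV_16S output of StereoBM.compute() from your pipeline
    public static Mat reproject(Mat disparity, double f, double cx, double cy, double Tx) {
        // Standard Q for rectified cameras with identical principal points:
        // [ 1  0    0   -cx ]
        // [ 0  1    0   -cy ]
        // [ 0  0    0     f ]
        // [ 0  0 -1/Tx    0 ]
        Mat Q = new Mat(4, 4, CvType.CV_64F, Scalar.all(0));
        Q.put(0, 0, 1.0);  Q.put(0, 3, -cx);
        Q.put(1, 1, 1.0);  Q.put(1, 3, -cy);
        Q.put(2, 3, f);
        Q.put(3, 2, -1.0 / Tx);

        // StereoBM returns fixed-point disparities scaled by 16; convert first.
        Mat dispFloat = new Mat();
        disparity.convertTo(dispFloat, CvType.CV_32F, 1.0 / 16.0);

        Mat points3d = new Mat();
        Calib3d.reprojectImageTo3D(dispFloat, points3d, Q, false);
        return points3d;  // CV_32FC3: one (X, Y, Z) triple per pixel
    }
}

You could call this as, say, reproject(disparity, 800, 320, 240, 0.1) with f, cx, cy in pixels and Tx in meters, but those values are precisely what you cannot simply invent: with wrong intrinsics the resulting cloud is systematically distorted.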

Camera calibration is the usual way to estimate those. Your supervisor said that it is not an option, however there are several different techniques (e.g. using a chessboard, or via auto-calibration) with different requirements and possibilities. Hence, investigating exactly why calibration is off the table may help you find a method that is still appropriate for your application.
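For reference, the chessboard route is only a handful of calls in the OpenCV Java API. The sketch below is mine, not your code: it assumes a board with 9x6 inner corners and 25 mm squares, and the image file names passed in are placeholders:

import java.util.ArrayList;
import java.util.List;

import org.opencv.calib3d.Calib3d;
import org.opencv.core.*;
import org.opencv.imgcodecs.Imgcodecs;

public class CalibrationSketch {

    // imagePaths: file names of several views of the chessboard (placeholders)
    public static Mat calibrate(List<String> imagePaths) {
        Size patternSize = new Size(9, 6);   // inner corners (assumed board)
        float squareSize = 0.025f;           // 25 mm squares (assumed)

        // Known 3D layout of the corners, identical for every view (Z = 0 plane)
        MatOfPoint3f objectCorners = new MatOfPoint3f();
        List<Point3> obj = new ArrayList<>();
        for (int r = 0; r < 6; r++)
            for (int c = 0; c < 9; c++)
                obj.add(new Point3(c * squareSize, r * squareSize, 0));
        objectCorners.fromList(obj);

        List<Mat> objectPoints = new ArrayList<>();
        List<Mat> imagePoints = new ArrayList<>();
        Size imageSize = new Size();

        for (String path : imagePaths) {
            Mat img = Imgcodecs.imread(path, Imgcodecs.IMREAD_GRAYSCALE);
            MatOfPoint2f corners = new MatOfPoint2f();
            if (Calib3d.findChessboardCorners(img, patternSize, corners)) {
                objectPoints.add(objectCorners);
                imagePoints.add(corners);
                imageSize = img.size();
            }
        }

        Mat cameraMatrix = new Mat();   // K: the intrinsics you are missing
        Mat distCoeffs = new Mat();
        List<Mat> rvecs = new ArrayList<>(), tvecs = new ArrayList<>();
        Calib3d.calibrateCamera(objectPoints, imagePoints, imageSize,
                cameraMatrix, distCoeffs, rvecs, tvecs);
        return cameraMatrix;
    }
}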

If you really have no way of estimating the intrinsics, a possible approach is bundle adjustment, using more than just two images. However, without the intrinsics the 3D reconstruction will likely not be very useful, which leads to my second point.

There are several types of 3D reconstruction, the main ones being projective, metric and Euclidean; the corresponding transformation classes are sketched after the list. (For more details, see §10.2, p. 264 in "Multiple View Geometry in Computer Vision" by Hartley & Zisserman, 2nd edition.)

  • The Euclidean reconstruction is what most people mean by "3D reconstruction", though not necessarily what they need: a model of the scene which relates to the true model only by a 3D rotation and a 3D translation (i.e. a change of the 3D coordinate system). Hence, angles that are orthogonal in the scene are orthogonal in such a model, and a distance of 1 meter in the scene corresponds to 1 meter in the model. In order to obtain such a Euclidean 3D reconstruction, you need to know the intrinsics of at least some cameras AND the true distance between two given points in the scene.

  • The metric or similarity reconstruction is most of the time good enough and refers to a 3D model of the scene which relates to the true model by a similarity transform, in other words by a 3D rotation and a 3D translation (i.e. a change of the 3D coordinate system) plus an overall scaling. In order to obtain such a metric reconstruction, you need to know the intrinsics of at least some cameras.

  • The projective reconstruction is what you will obtain if you have no knowledge about the scene or the cameras' intrinsics. Such a 3D model is not up to scale with respect to the observed scene, and angles which are orthogonal in the scene will probably not be orthogonal in the model.
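As a rough summary of the three cases (my own notation: X is the true homogeneous 3D point, X' the reconstructed one):

X' \sim \begin{bmatrix} R & t \\ \mathbf{0}^\top & 1 \end{bmatrix} X    % Euclidean: rotation R, translation t
X' \sim \begin{bmatrix} sR & t \\ \mathbf{0}^\top & 1 \end{bmatrix} X   % metric/similarity: additionally a global scale s
X' \sim H X, \quad H \in \mathbb{R}^{4 \times 4} \text{ invertible}     % projective: arbitrary homography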

Hence, knowing the intrinsic parameters of (some of) the cameras is crucial if you want an accurate reconstruction.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow