Question

I am doing some multi-view geometry reconstruction with structure from motion. So far I have the following:

- Two images as initial input
- Camera parameters and distortion coefficients
- A working rectification pipeline for the initial input images
- Creation of a disparity map
- Creation of a point cloud from the disparity map, by iterating over the disparity map and taking the disparity value as z (x and y are the pixel coordinates of the pixel in the disparity map); a rough sketch of this step follows this list. (What is not working is reprojectImageTo3D, as my Q matrix seems to be very wrong, but everything else works perfectly.)

This gives me a good point cloud of the scene.
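For reference, the point-cloud step described above (z taken directly from the disparity value) can be sketched roughly like this, assuming OpenCV-style disparity maps and NumPy; the function name and threshold are illustrative, not part of the original pipeline:

```python
import numpy as np

def disparity_to_points(disparity, min_disp=1.0):
    """Build one point per valid pixel, exactly as described in the question:
    x and y are the pixel coordinates in the disparity map, z is the disparity
    value at that pixel. (reprojectImageTo3D with a correct Q matrix would give
    metric 3D points instead.)"""
    points = []
    h, w = disparity.shape
    for y in range(h):
        for x in range(w):
            d = float(disparity[y, x])
            if d > min_disp:               # skip invalid / unmatched pixels
                points.append((x, y, d))
    return np.array(points, dtype=np.float32)
```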

Now I need to add n more images to the pipeline. I've googled a lot and found that the method solvePnP should help me.

But now I am very confused...

solvePnP takes a list of 3D points and the corresponding 2D image points and reconstructs the R and T vectors for the third, fourth camera, and so on. I've read that the two vectors need to be aligned, meaning that the first 3D point in the first vector corresponds to the first 2D point in the second vector.

So far so good. But where do I take those correspondences from? Can I use the method reprojectPoints to get those two vectors? Or is my whole idea of using the disparity map for depth reconstruction wrong? (Alternative: triangulatePoints using the good matches found before.)

Can someone help me get this straight? How can I use solvePnP to add n more cameras (and therefore more 3D points) to my point cloud and improve the result of the reconstruction?


Solution

solvePnP is a function which, given the 3D model of an object (say, a chessboard) and a view of this object in the real world, gives you an approximate position and orientation of the camera relative to the object.

The 3D model and the view of the object are sets of corresponding 3D and 2D points. The function works best when you know the object model (the positions of the object's key points) and the positions of those key points in the camera image.
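As a minimal sketch of that correspondence requirement (OpenCV's Python API; all numeric values below are synthetic placeholders, not taken from the question): object_points[i] and image_points[i] must describe the same physical point. In the asker's setting, object_points would be points from the existing point cloud and image_points their detected positions in the new image.

```python
import cv2
import numpy as np

# Eight synthetic 3D points (a unit cube) standing in for known model points.
object_points = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0],
                          [0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1]],
                         dtype=np.float32)

K = np.array([[800, 0, 320],
              [0, 800, 240],
              [0,   0,   1]], dtype=np.float64)   # camera matrix (placeholder values)
dist = np.zeros(5)                                # distortion coefficients (placeholder)

# Simulate a camera pose and project the 3D points to get aligned 2D observations;
# in a real pipeline these 2D points come from feature matches in the new image.
rvec_true = np.array([0.1, -0.2, 0.05])
tvec_true = np.array([0.3, 0.1, 5.0])
image_points, _ = cv2.projectPoints(object_points, rvec_true, tvec_true, K, dist)

# solvePnP recovers the pose of the new camera from the aligned 3D-2D pairs.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
R, _ = cv2.Rodrigues(rvec)     # rotation vector -> 3x3 rotation of the new camera
print(ok, rvec.ravel(), tvec.ravel())
```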

I'm not an expert in 3D image reconstruction, but it seems that by using the information from each subsequent image of the observed scene and its key points, and by finding those key points in the new views, you should be able to iteratively improve your model and also improve the approximation of each camera's position.
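One common way to find those key points again in a new view (just one option, not necessarily what the original pipeline uses) is descriptor matching, e.g. ORB with a brute-force matcher; the file names below are placeholders:

```python
import cv2

# Placeholder file names: an image already used in the reconstruction and the new view.
existing_image = cv2.imread("view_old.png", cv2.IMREAD_GRAYSCALE)
new_image      = cv2.imread("view_new.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp_old, des_old = orb.detectAndCompute(existing_image, None)
kp_new, des_new = orb.detectAndCompute(new_image, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_old, des_new), key=lambda m: m.distance)

# For every match whose key point in the old view already has a known 3D position,
# pair that 3D point with the matched pixel in the new view -- these aligned pairs
# are exactly the solvePnP input discussed above.
pts_old = [kp_old[m.queryIdx].pt for m in matches]   # 2D in the existing view
pts_new = [kp_new[m.trainIdx].pt for m in matches]   # 2D in the new view
```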

Since you have a disparity map, which shows the distance of the scene's key points as viewed from two different viewpoints, it could indeed be better to use triangulation, provided you know the exact viewpoints, or at least good approximations of them (which you would then refine with subsequent new views).
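For the triangulation alternative, a rough sketch with cv2.triangulatePoints (the projection matrices and matched pixel coordinates below are synthetic placeholders; in practice the pixels come from the "good matches" mentioned in the question):

```python
import cv2
import numpy as np

K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)  # placeholder

# Projection matrices P = K [R | t]; the first camera is taken as the origin and the
# second gets a small synthetic rotation and baseline, purely for illustration.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
R2, _ = cv2.Rodrigues(np.array([0.0, 0.1, 0.0]))
P2 = K @ np.hstack([R2, np.array([[-0.5], [0.0], [0.0]])])

# Matched pixel coordinates in view 1 and view 2, shape (2, N); placeholders here.
pts1 = np.array([[300.0, 310.0, 420.0], [200.0, 250.0, 180.0]])
pts2 = np.array([[295.0, 305.0, 410.0], [201.0, 252.0, 182.0]])

points_4d = cv2.triangulatePoints(P1, P2, pts1, pts2)    # homogeneous 4xN output
points_3d = (points_4d[:3] / points_4d[3]).T             # Nx3 Euclidean points
print(points_3d)
```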

Licensed under: CC-BY-SA with attribution