Question

I am doing camera calibration with Tsai's algorithm. I have the intrinsic and extrinsic matrices, but how can I reconstruct the 3D coordinates from that information?


1) I can use Gaussian elimination to find X, Y, Z, W, and then the points will be X/W, Y/W, Z/W, since the system is homogeneous.

2) I can use the OpenCV documentation approach:

                   s * [u, v, 1]' = K * [R | t] * [X, Y, Z, 1]'       // pinhole projection model from the OpenCV docs

Since I know u, v, R, and t, I can compute X, Y, Z.

However, the two methods give different results, and neither is correct.

What am I doing wrong?


Solution

If you have the extrinsic parameters then you have everything you need: from the extrinsics (also called the camera pose) you can build a homography. The pose is a 3x4 matrix; the homography is a 3x3 matrix H, defined as

                   H = K*[r1, r2, t],       //eqn 8.1, Hartley and Zisserman

where K is the camera intrinsic matrix, r1 and r2 are the first two columns of the rotation matrix R, and t is the translation vector.

Then normalize by dividing everything by t3, the third component of the translation vector.

What happens to column r3; don't we use it? No, it is redundant: since R is a rotation matrix, r3 is just the cross product of its first two columns.
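
If it helps to see the construction in code, here is a minimal NumPy sketch of building H this way; the values of K, R and t are placeholders standing in for your own calibration output (if your calibration returned a rotation vector rather than a matrix, cv2.Rodrigues can convert it to a matrix).

    import numpy as np

    # Placeholder calibration results -- substitute your own K, R, t.
    K = np.array([[800.0,   0.0, 320.0],
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])   # intrinsic matrix
    R = np.eye(3)                            # rotation matrix (3x3)
    t = np.array([0.0, 0.0, 5.0])            # translation vector

    # H = K * [r1, r2, t]   (eqn 8.1, Hartley and Zisserman)
    H = K @ np.column_stack((R[:, 0], R[:, 1], t))

    # Normalize by t3; since the last row of K is [0, 0, 1], H[2, 2] equals t3.
    H = H / H[2, 2]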

Now that you have the homography, project the points. Your 2D points are (x, y); append a 1 so they become homogeneous 3-vectors. Project them as follows:

        p          = [x; y; 1];                       % homogeneous 2D point (column vector)
        projection = H * p;                           % project
        projnorm   = projection / projection(3);      % normalize by the third component
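
The same three steps in NumPy, continuing with the H built above (the point coordinates are invented for illustration):

    import numpy as np

    x, y = 150.0, 200.0                        # example 2D point (placeholder values)
    p = np.array([x, y, 1.0])                  # homogeneous coordinates

    projection = H @ p                         # project with the homography
    projnorm   = projection / projection[2]    # normalize by the third component

Since H is an invertible 3x3 matrix, you can also map in the opposite direction with np.linalg.inv(H) if you need to go from image coordinates back to plane coordinates.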

Hope this helps.

OTHER TIPS

As nicely stated in the comments above, projecting 2D image coordinates into 3D "camera space" inherently requires making up the z coordinate, since that information is completely lost in the image. One solution is to assign a dummy value (z = 1) to each of the 2D image-space points before projection, as in Jav_Rock's answer above:

p          = [x; y; 1];                       % homogeneous 2D point (column vector)
projection = H * p;                           % project
projnorm   = projection / projection(3);      % normalize by the third component
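
If you want to see the "made-up depth" idea spelled out, one common way is to back-project the pixel through the inverse intrinsics and scale by an assumed depth. This is only a sketch of that idea (K, the pixel, and the depth value are all placeholders), not the exact snippet quoted above:

    import numpy as np

    K = np.array([[800.0,   0.0, 320.0],      # placeholder intrinsic matrix
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])

    u, v = 150.0, 200.0                        # image point (placeholder values)
    assumed_depth = 1.0                        # the "dummy" z; a learned model could predict this instead

    # Back-project: the ray through the pixel, scaled by the assumed depth,
    # gives a 3D point in camera coordinates.
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    point_cam = assumed_depth * ray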

One interesting alternative to this dummy solution is to train a model to predict the depth of each point before reprojecting it into 3D camera space. I tried this method and had a high degree of success using a PyTorch CNN trained on 3D bounding boxes from the KITTI dataset. I would be happy to provide code, but it would be a bit lengthy to post here.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow