transforming projection matrices computed from trifocal tensor to estimate 3D points

Question

The trifocal tensor won't help you, because like the fundamental matrix, it only enables projective reconstruction of the scene and camera poses. If X0_j and P0_i are the true 3D points and camera matrices, this means that the reconstructed points Xp_j = inv(H).X0_j and camera matrices Pp_i = P0_i.H are only defined up to a common 4x4 matrix H, which is unknown.

In order to obtain a metric reconstruction, you need to know the calibration matrices of your cameras. Whether you know these matrices (e.g. if you use virtual cameras for image rendering) or you estimated them using camera calibration (see OpenCV calibration tutorials), you can find a method to obtain a metric reconstruction in §7.4.5 of "Geometry, constraints and computation of the trifocal tensor", by C.Ressl (PDF).

Note that even when using this method, you cannot obtain an up-to-scale 3D reconstruction, unless you have some additional knowledge (such as knowledge of the actual distance between two fixed 3D points).

Sketch of the algorithm:

Inputs: the three camera matrices P1, P2, P3 (projective world coordinates, with the coordinate system chosen so that P1=[I|0]), the associated calibration matrices K1, K2, K3 and one point correspondence x1, x2, x3.

Outputs: the three camera matrices P1_E, P2_E, P3_E (metric reconstruction).

Set P1_E=K1.[I|0]
Compute the fundamental matrices F21, F31. Denoting P2=[A|a] and P3=[B|b], you have F21=[a]x.A and F31=[b]x.B (see table 9.1 in [HZ00]), where for a 3x1 vector e [e]x = [0,-e_3,e_2;e_3,0,-e_1;-e_2,e_1,0]
Compute the essential matrices E21 = K2'.F21.K1 and E31 = K3'.F31.K1
For i = 2,3, do the following

i. Compute the SVD Ei1=U.S.V'. If det(U)<0 set U=-U. If det(V)<0 set V=-V.

ii. Define W=[0,-1,0;1,0,0;0,0,1], Ri=U.W.V' and ti = third column of U

iii. Define M=[Ri'.ti]x, X1=M.inv(K1).x1 and Xi=M.Ri'.inv(Ki).xi

iv. If X1_3.Xi_3<0, set Ri=U.W'.V' and recompute M and X1

v. If X1_3<0 set ti = -ti

vi. Define Pi_E=Ki.[Ri|ti]
Do the following to retrieve the correct scale for t3 (consistantly to the fact that ||t2||=1):

i. Define p2=R2'.inv(K2).x2 and p3=R3'.inv(K3).x3

ii. Define M=[p2]x

iii. Compute the scale s=(p3'.M.R2'.t2)/(p3'.M.R3'.t3)

iv. Set t3=t3*s
End of the algorithm: the camera matrices P1_E, P2_E, P3_E are valid up to an isotropic scaling of the scene and a change of 3D coordinate system (hence it is a metric reconstruction).

[HZ00] "Multiple view geometry in computer vision" , by R.Hartley and A.Zisserman, 2000.