To remap 2D display coordinates (display_x, display_y) to 3D object coordinates (x, y, z), you need to know:
- the depth display_z of the pixel at (display_x, display_y)
- the transformation T that maps clip space coordinates (clip_x, clip_y, clip_z), after division by clip_w, to display coordinates
- the transformation M that maps object coordinates to clip space coordinates (usually a combination of a camera transform and a perspective projection)
The display coordinates are computed as follows:
M.transform(x, y, z, 1) --> (clip_x, clip_y, clip_z, clip_w)
T.transform(clip_x / clip_w, clip_y / clip_w, clip_z / clip_w) --> (display_x, display_y, display_z)
Here M.transform is multiplication by an invertible 4x4 matrix, and T.transform can be any invertible transformation.
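The forward computation above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not a definitive implementation: the OpenGL-style perspective matrix used for M, the viewport mapping used for T (pixels in x and y, depth in [0, 1]), and the names perspective, viewport_transform and project are all hypothetical choices that do not come from the text above.

```python
import numpy as np

def perspective(fov_y_deg, aspect, near, far):
    """Assumed choice for M: an OpenGL-style perspective matrix."""
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    return np.array([
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ])

def viewport_transform(ndc, width, height):
    """Assumed choice for T: map [-1, 1]^3 to pixel coordinates and depth in [0, 1]."""
    nx, ny, nz = ndc
    return np.array([(nx + 1.0) * 0.5 * width,
                     (ny + 1.0) * 0.5 * height,
                     (nz + 1.0) * 0.5])

def project(M, point, width, height):
    clip = M @ np.append(point, 1.0)   # (clip_x, clip_y, clip_z, clip_w)
    ndc = clip[:3] / clip[3]           # divide by clip_w
    return viewport_transform(ndc, width, height)

M = perspective(60.0, 4.0 / 3.0, 0.1, 100.0)
display = project(M, np.array([0.0, 0.0, -5.0]), 800, 600)
print(display)  # (display_x, display_y, display_z)
```

A point on the view axis lands in the center of the assumed 800x600 viewport, and its depth falls strictly between 0 and 1 because it lies between the near and far planes.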
You can recover (x, y, z) from (display_x, display_y, display_z) as follows:
T.inverse_transform(display_x, display_y, display_z) --> (a, b, c)
M.inverse_transform(a, b, c, 1) --> (X, Y, Z, W)
(X/W, Y/W, Z/W) --> (x, y, z)
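The three recovery steps can be sketched in Python with NumPy as a round-trip check. The matrix M below (a perspective-like matrix with near plane 1 and far plane 11), the viewport size, and the helper names T, inverse_T and unproject are assumptions made for this illustration only; any invertible M and T would do.

```python
import numpy as np

WIDTH, HEIGHT = 800, 600  # assumed viewport size

def T(ndc):
    """Assumed T: [-1, 1]^3 to pixel coordinates and depth in [0, 1]."""
    return np.array([(ndc[0] + 1.0) * 0.5 * WIDTH,
                     (ndc[1] + 1.0) * 0.5 * HEIGHT,
                     (ndc[2] + 1.0) * 0.5])

def inverse_T(display):
    """Inverse of the assumed T above."""
    return np.array([display[0] / WIDTH * 2.0 - 1.0,
                     display[1] / HEIGHT * 2.0 - 1.0,
                     display[2] * 2.0 - 1.0])

def unproject(M, inverse_T, display):
    a, b, c = inverse_T(display)                               # undo T
    X, Y, Z, W = np.linalg.inv(M) @ np.array([a, b, c, 1.0])   # undo M
    return np.array([X, Y, Z]) / W                             # (X/W, Y/W, Z/W)

# Assumed M: perspective-like matrix (near = 1, far = 11), invertible.
M = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -1.2, -2.2],
    [0.0, 0.0, -1.0, 0.0],
])

point = np.array([1.0, -2.0, -5.0])
clip = M @ np.append(point, 1.0)        # forward: object -> clip
display = T(clip[:3] / clip[3])         # forward: clip -> display
recovered = unproject(M, inverse_T, display)
print(np.allclose(recovered, point))
```

In practice display_z would be read from the depth buffer at (display_x, display_y) rather than computed from a known point as it is here.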
The following gives intuition for why the above computation yields the right result:
T.inverse_transform(display_x, display_y, display_z) --> (clip_x / clip_w, clip_y / clip_w, clip_z / clip_w)
(clip_x / clip_w, clip_y / clip_w, clip_z / clip_w, clip_w / clip_w) == (clip_x, clip_y, clip_z, clip_w) / clip_w
M.inverse_transform((clip_x, clip_y, clip_z, clip_w) / clip_w) == M.inverse_transform(clip_x, clip_y, clip_z, clip_w) / clip_w
M.inverse_transform(clip_x, clip_y, clip_z, clip_w) / clip_w --> (x, y, z, 1) / clip_w
(x, y, z, 1) / clip_w == (x / clip_w, y / clip_w, z / clip_w, 1 / clip_w)
(x / clip_w, y / clip_w, z / clip_w, 1 / clip_w) == (X, Y, Z, W)
(X / W, Y / W, Z / W) == (x, y, z), because the common factor 1 / clip_w cancels
The above used the following matrix (M), vector (v), scalar (a == 1 / clip_w) property:
M * (a * v) == a * (M * v)
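A quick numeric check of this property, and of why the final division by W removes the unknown factor 1 / clip_w, can be sketched as follows. The matrix and the values of p and a are arbitrary assumptions chosen only for the check.

```python
import numpy as np

# Arbitrary invertible 4x4 matrix (determinant 24), assumed for illustration.
M = np.array([
    [2.0, 0.0, 0.0, 1.0],
    [0.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 4.0],
    [0.0, 0.0, -1.0, 0.0],
])
p = np.array([1.0, 2.0, -3.0, 1.0])  # (x, y, z, 1)
v = M @ p                            # plays the role of (clip_x, clip_y, clip_z, clip_w)
a = 0.25                             # plays the role of 1 / clip_w

# Scalars commute with matrix multiplication.
commutes = np.allclose(M @ (a * v), a * (M @ v))

# Inverting a scaled clip vector gives a scaled (x, y, z, 1);
# dividing by the last component removes the scale again.
scaled = np.linalg.inv(M) @ (a * v)
recovered = scaled[:3] / scaled[3]
print(commutes, recovered)
```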