How to map 2D display coordinates to 3D OpenGL space

Question

To map 2D display coordinates (display_x, display_y) back to 3D object coordinates (x, y, z), you need to know:

the depth display_z of the pixel at (display_x, display_y)
the transformation T that maps clip space coordinates, after division by clip_w, to display coordinates (in OpenGL this is the viewport transform)
the transformation M that maps object coordinates to clip space coordinates (usually the combined camera and perspective matrices)

The display coordinates are computed as follows:

M.transform(x, y, z, 1) --> (clip_x, clip_y, clip_z, clip_w)

T.transform(clip_x / clip_w, clip_y / clip_w, clip_z / clip_w) --> (display_x, display_y, display_z)

M.transform is multiplication by an invertible 4x4 matrix, and T.transform can be any invertible transformation.

You can recover (x, y, z) from (display_x, display_y, display_z) as follows:

T.inverse_transform(display_x, display_y, display_z) --> (a, b, c)

M.inverse_transform(a, b, c, 1) --> (X, Y, Z, W)

(X/W, Y/W, Z/W) --> (x, y, z)
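
The recovery steps above can be sketched in plain Python. This is only a minimal sketch under stated assumptions, not a definitive implementation: M is taken to be an OpenGL-style perspective matrix built by a hypothetical helper `perspective`, T is assumed to be the usual viewport mapping from [-1, 1] to a window of `width` x `height` pixels with a depth range of [0, 1], and `invert4` is a small Gauss-Jordan inverse included so the example needs no libraries. All function names are illustrative.

```python
import math

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def invert4(m):
    """Invert a 4x4 matrix by Gauss-Jordan elimination with partial pivoting."""
    n = 4
    # Augment m with the identity matrix: [m | I].
    aug = [list(m[r]) + [1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]

def perspective(fov_y_deg, aspect, near, far):
    """Assumed M: an OpenGL-style perspective projection (camera at the origin, looking down -z)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def to_display(ndc, width, height):
    """Assumed T: map [-1, 1]^3 to pixel coordinates and a depth in [0, 1]."""
    x, y, z = ndc
    return ((x + 1.0) * 0.5 * width, (y + 1.0) * 0.5 * height, (z + 1.0) * 0.5)

def from_display(display, width, height):
    """Inverse of to_display (T.inverse_transform)."""
    dx, dy, dz = display
    return (2.0 * dx / width - 1.0, 2.0 * dy / height - 1.0, 2.0 * dz - 1.0)

def project(M, point, width, height):
    """Forward path: object coordinates -> clip space -> display coordinates."""
    cx, cy, cz, cw = mat_vec(M, [point[0], point[1], point[2], 1.0])
    return to_display((cx / cw, cy / cw, cz / cw), width, height)

def unproject(M, display, width, height):
    """Reverse path: apply T's inverse, then M's inverse, then divide by W."""
    a, b, c = from_display(display, width, height)
    X, Y, Z, W = mat_vec(invert4(M), [a, b, c, 1.0])
    return (X / W, Y / W, Z / W)

# Round trip: project a point, then recover it from its display coordinates.
M = perspective(60.0, 800 / 600, 0.1, 100.0)
point = (1.0, -2.0, -5.0)
display = project(M, point, 800, 600)
recovered = unproject(M, display, 800, 600)
```

In a real application display_z would typically be read from the depth buffer (for example with `glReadPixels` and `GL_DEPTH_COMPONENT`). Note that the sketch assumes display_y is measured from the bottom of the window, as OpenGL does, so mouse coordinates with a top-left origin would need to be flipped first.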

The following steps show why this computation recovers the original point:

T.inverse_transform(display_x, display_y, display_z) --> (clip_x / clip_w, clip_y / clip_w, clip_z / clip_w)

(clip_x / clip_w, clip_y / clip_w, clip_z / clip_w, clip_w / clip_w) == (clip_x, clip_y, clip_z, clip_w) / clip_w

M.inverse_transform((clip_x, clip_y, clip_z, clip_w) / clip_w) == M.inverse_transform(clip_x, clip_y, clip_z, clip_w) / clip_w

M.inverse_transform(clip_x, clip_y, clip_z, clip_w) / clip_w --> (x, y, z, 1) / clip_w

(x, y, z, 1) / clip_w == (x / clip_w, y / clip_w, z / clip_w, 1 / clip_w)

(x / clip_w, y / clip_w, z / clip_w, 1 / clip_w) == (X, Y, Z, W)

The above relies on the linearity of matrix multiplication. For a matrix M, a vector v, and a scalar a (here a == 1 / clip_w):

M * (a * v) == a * (M * v)
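
The scaling property can be checked numerically with a short sketch. The matrix and vector below are arbitrary illustrative values, not taken from any particular OpenGL setup; they are chosen so that the w component of M * v is 5, which makes a = 0.2 play the role of 1 / clip_w.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]

# Arbitrary example values (assumptions for illustration only).
M = [
    [2.0, 0.0, 0.0, 1.0],
    [0.0, 3.0, 0.0, -2.0],
    [0.0, 0.0, 1.0, 4.0],
    [0.0, 0.0, -1.0, 0.0],
]
v = [1.0, 2.0, -5.0, 1.0]
a = 0.2  # stands in for 1 / clip_w

left = mat_vec(M, [a * x for x in v])    # M * (a * v)
right = [a * x for x in mat_vec(M, v)]   # a * (M * v)

# Both sides agree up to floating-point rounding.
same = all(abs(l - r) < 1e-12 for l, r in zip(left, right))
```

Because scaling a homogeneous vector only scales the result, the unknown factor 1 / clip_w cancels out in the final division by W.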
