You did not say how you are creating the perspective projection matrix, which is key information. I am not a D3D expert; I know OpenGL.
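So I can show you how to do this with the matrix that OpenGL's gluPerspective produces. With f = cot(fovy/2), it has the form:
{{f/aspect,0,0,0},{0,f,0,0},{0,0,(zFar+zNear)/(zNear-zFar),2*zFar*zNear/(zNear-zFar)},{0,0,-1,0}}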
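As a rough sketch, the four non-zero terms of the gluPerspective matrix (other than the constant -1) can be computed like this. The function name `perspective_terms` is made up for illustration, and the returned names a, b, c, d match the renaming used in the rest of this answer. gluPerspective takes the field of view in degrees, so the sketch does too.

```python
import math

def perspective_terms(fovy_deg, aspect, z_near, z_far):
    """Return the non-zero terms (a, b, c, d) of the gluPerspective matrix.

    Hypothetical helper: a and b are the x and y scale terms,
    c and d are the two depth terms.
    """
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)  # f = cot(fovy / 2)
    a = f / aspect
    b = f
    c = (z_far + z_near) / (z_near - z_far)
    d = 2.0 * z_far * z_near / (z_near - z_far)
    return a, b, c, d
```

A D3D projection built some other way will have different terms, so treat this only as the OpenGL case.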
This matrix is for OpenGL column vectors. Your D3D code uses a row vector, so I'll assume the transpose. For convenience, rename the non-zero terms a, b, c, d (in the order they appear in the transpose) and invert symbolically with Wolfram Alpha (a nice tool for this sort of thing):
{{a,0,0,0},{0,b,0,0},{0,0,c,-1},{0,0,d,0}}^-1
We get:
{{1/a,0,0,0},{0,1/b,0,0},{0,0,0,1/d},{0,0,-1,c/d}}
So if the point being multiplied is [x y z 1]
as in your code, then the result you want is
[x/a y/b -1 (z+c)/d]
This is a 4d homogeneous point. To get back to 3d, divide x, y, and z by w:
[(1/a)ux (1/b)uy -u] where u = d / (z+c)
The (1/a) and (1/b) terms can of course be computed in advance. After that, you need one addition, one division, and four multiplications to finish the job.
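Here is a minimal sketch of that final formula, with a round-trip helper to check it. It assumes the transposed gluPerspective layout above (row vector times {{a,0,0,0},{0,b,0,0},{0,0,c,-1},{0,0,d,0}}); the names `project` and `make_unproject` are made up for illustration, and a D3D matrix may need sign changes.

```python
def project(p, a, b, c, d):
    """Forward transform of a view-space point, including the divide by w."""
    X, Y, Z = p
    w = -Z  # w of [X Y Z 1] times the transposed projection matrix
    return (a * X / w, b * Y / w, (c * Z + d) / w)

def make_unproject(a, b, c, d):
    """Precompute 1/a and 1/b, then return the per-point inverse transform."""
    inv_a = 1.0 / a
    inv_b = 1.0 / b

    def unproject(x, y, z):
        u = d / (z + c)  # one addition, one division
        # four multiplications, plus a sign flip for the last component
        return (inv_a * u * x, inv_b * u * y, -u)

    return unproject
```

Calling `make_unproject(a, b, c, d)` once and reusing the returned function keeps the two reciprocals out of the per-point work; feeding it the output of `project` should give back the original view-space point up to rounding.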
This explanation could be off modulo some negative signs or other minor details due to differences between D3D and OpenGL coordinate systems that I'm unaware of.
Having said all this, I agree with the commenters who think this is not going to produce a meaningful speedup.