OpenGL ES 2.0: Why does this perspective projection matrix not give the right result?

https://stackoverflow.com/questions/7785354

10-02-2021

Question

About 2 days ago I decided to write code to explicitly calculate the Model-View-Projection ("MVP") matrix to understand how it worked. Since then I've had nothing but trouble, seemingly because of the projection matrix I'm using.

Working with an iPhone display, I create a screen centered square described by these 4 corner vertices:

        const CGFloat cx = screenWidth/2.0f;
        const CGFloat cy = screenHeight/2.0f;
        const CGFloat z = -1.0f;
        const CGFloat dim = 50.0f;

        vxData[0] = cx-dim;
        vxData[1] = cy-dim;
        vxData[2] = z;
        vxData[3] = cx-dim;
        vxData[4] = cy+dim;
        vxData[5] = z;
        vxData[6] = cx+dim;
        vxData[7] = cy+dim;
        vxData[8] = z;
        vxData[9] = cx+dim;
        vxData[10] = cy-dim;
        vxData[11] = z;

Since I am using OGLES 2.0 I pass the MVP as a uniform to my vertex shader, then simply apply the transformation to the current vertex position:

uniform mat4 mvp;
attribute vec3 vpos;
void main()
{
  gl_Position = mvp * vec4(vpos, 1.0);
}

For now I have simplified my MVP to just be the P matrix. There are two projection matrices listed in the code shown below. The first is the standard perspective projection matrix, and the second is an explicit-value projection matrix I found online.

CGRect screenBounds = [[UIScreen mainScreen] bounds];
const CGFloat screenWidth = screenBounds.size.width;
const CGFloat screenHeight = screenBounds.size.height;

const GLfloat n = 0.01f;
const GLfloat f = 100.0f;
const GLfloat fov = 60.0f * 2.0f * M_PI / 360.0f;
const GLfloat a = screenWidth/screenHeight;
const GLfloat d = 1.0f / tanf(fov/2.0f);


// Standard perspective projection.
GLKMatrix4 projectionMx = GLKMatrix4Make(d/a, 0.0f, 0.0f, 0.0f,
                                         0.0f, d, 0.0f, 0.0f,
                                         0.0f, 0.0f, (n+f)/(n-f), -1.0f,
                                         0.0f, 0.0f, (2*n*f)/(n-f), 0.0f);
// The one I found online.
GLKMatrix4 projectionMx = GLKMatrix4Make(2.0f/screenWidth,0.0f,0.0f,0.0f,
                                         0.0f,2.0f/-screenHeight,0.0f,0.0f,
                                         0.0f,0.0f,1.0f,0.0f,
                                         -1.0f,1.0f,0.0f,1.0f);

When using the explicit value matrix, the square renders exactly as desired in the centre of the screen with correct dimension. When using the perspective projection matrix, nothing is displayed on-screen. I've done printouts of the position values generated for screen centre (screenWidth/2, screenHeight/2, 0) by the perspective projection matrix and they're enormous. The explicit value matrix correctly produces zero.

I think the explicit value matrix is an orthographic projection matrix - is that right? My frustration is that I can't work out why my perspective projection matrix fails to work.

I'd be tremendously grateful if someone could help me with this problem. Many thanks.

UPDATE For Christian Rau:

 #define Zn 0.0f
 #define Zf 100.0f
 #define PRIMITIVE_Z 1.0f

 //...

 CGRect screenBounds = [[UIScreen mainScreen] bounds];
 const CGFloat screenWidth = screenBounds.size.width;
 const CGFloat screenHeight = screenBounds.size.height;

 //...

 glUseProgram(program);

 //...

 glViewport(0.0f, 0.0f, screenBounds.size.width, screenBounds.size.height);

 //...

 const CGFloat cx = screenWidth/2.0f;
 const CGFloat cy = screenHeight/2.0f;
 const CGFloat z = PRIMITIVE_Z;
 const CGFloat dim = 50.0f;

 vxData[0] = cx-dim;
 vxData[1] = cy-dim;
 vxData[2] = z;
 vxData[3] = cx-dim;
 vxData[4] = cy+dim;
 vxData[5] = z;
 vxData[6] = cx+dim;
 vxData[7] = cy+dim;
 vxData[8] = z;
 vxData[9] = cx+dim;
 vxData[10] = cy-dim;
 vxData[11] = z;

 //...

 const GLfloat n = Zn;
 const GLfloat f = Zf;
 const GLfloat fov = 60.0f * 2.0f * M_PI / 360.0f;
 const GLfloat a = screenWidth/screenHeight;
 const GLfloat d = 1.0f / tanf(fov/2.0f);
 GLKMatrix4 projectionMx = GLKMatrix4Make(d/a, 0.0f, 0.0f, 0.0f,
                                          0.0f, d, 0.0f, 0.0f,
                                          0.0f, 0.0f, (n+f)/(n-f), -1.0f,
                                          0.0f, 0.0f, (2*n*f)/(n-f), 0.0f);

 //...

 // ** Here is the matrix you recommended, Christian:
 GLKMatrix4 ts = GLKMatrix4Make(2.0f/screenWidth, 0.0f, 0.0f, -1.0f,
                                0.0f, 2.0f/screenHeight, 0.0f, -1.0f,
                                0.0f, 0.0f, 1.0f, 0.0f,
                                0.0f, 0.0f, 0.0f, 1.0f);

 GLKMatrix4 mvp = GLKMatrix4Multiply(projectionMx, ts);

UPDATE 2

The new MVP code:

GLKMatrix4 ts = GLKMatrix4Make(2.0f/screenWidth, 0.0f, 0.0f, -1.0f,
                               0.0f, 2.0f/-screenHeight, 0.0f, 1.0f,
                               0.0f, 0.0f, 1.0f, 0.0f,
                               0.0f, 0.0f, 0.0f, 1.0f);

// Using Apple perspective, view matrix generators
// (I can solve bugs in my own implementation later..!)
GLKMatrix4 _p = GLKMatrix4MakePerspective(60.0f * 2.0f * M_PI / 360.0f,
                                          screenWidth / screenHeight,
                                          Zn, Zf);
GLKMatrix4 _mv = GLKMatrix4MakeLookAt(0.0f, 0.0f, 1.0f,
                                      0.0f, 0.0f, -1.0f,
                                      0.0f, 1.0f, 0.0f);
GLKMatrix4 _mvp = GLKMatrix4Multiply(_p, _mv);
GLKMatrix4 mvp = GLKMatrix4Multiply(_mvp, ts);

Still nothing visible at the screen centre, and the transformed x,y coordinates of the screen centre are not zero.

UPDATE 3

Using the transpose of ts instead in the above code works! But the square no longer appears square; it appears to now have aspect ratio screenHeight/screenWidth i.e. it has a longer dimension parallel to the (short) screen width, and a shorter dimension parallel to the (long) screen height.

I'd very much like to know (a) why the transpose is required and whether it is a valid fix, (b) how to correctly rectify the non-square dimension, and (c) how this additional matrix transpose(ts) that we use fits into the transformation chain of Viewport * Projection * View * Model * Point .

For (c): I understand what the matrix does, i.e. the explanation by Christian Rau as to how we transform to range [-1, 1]. But is it correct to include this additional work as a separate transformation matrix, or should some part of our MVP chain be doing this work instead?

Sincere thanks go to Christian Rau for his valuable contribution thus far.

UPDATE 4

My question about "how ts fits in" is silly isn't it - the whole point is the matrix is only needed because I'm choosing to use screen coordinates for my vertices; if I were to use coordinates in world space from the start then this work wouldn't be needed!

Thanks Christian for all your help, it's been invaluable :) Problem solved.

Solution

The reason is that your first projection matrix doesn't account for the scaling and translation part of the transformation, whereas the second matrix does.

Since your modelview matrix is the identity, the first projection matrix assumes the model's coordinates lie somewhere in [-1,1], whereas the second matrix already contains the scaling and translation part (look at the screenWidth/screenHeight values in there) and therefore assumes the coordinates lie in [0,screenWidth] x [0,screenHeight].
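To make the mismatch concrete, here is a minimal C sketch (not GLKit code) that applies the question's perspective matrix to a vertex given in pixels. The `Vec4` type and the helpers `mat4_mul_vec4`, `make_perspective` and `ndc_x` are hypothetical names introduced for illustration, and `d` is passed in precomputed as 1/tan(fov/2).

```c
#include <assert.h>

/* Hypothetical helper type for illustration; not part of GLKit. */
typedef struct { float x, y, z, w; } Vec4;

/* Multiply a column-major 4x4 matrix (OpenGL layout) by a column vector. */
static Vec4 mat4_mul_vec4(const float m[16], Vec4 v)
{
    Vec4 r;
    r.x = m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12] * v.w;
    r.y = m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13] * v.w;
    r.z = m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14] * v.w;
    r.w = m[3] * v.x + m[7] * v.y + m[11] * v.z + m[15] * v.w;
    return r;
}

/* Fill out[] with the perspective matrix from the question.
   d = 1/tan(fov/2), a = width/height, n and f are the near and far distances. */
static void make_perspective(float out[16], float d, float a, float n, float f)
{
    assert(a > 0.0f && n > 0.0f && f > n);
    for (int i = 0; i < 16; ++i) out[i] = 0.0f;
    out[0]  = d / a;
    out[5]  = d;
    out[10] = (n + f) / (n - f);
    out[11] = -1.0f;
    out[14] = (2.0f * n * f) / (n - f);
}

/* Project a point and return its normalized device x (clip.x / clip.w). */
static float ndc_x(const float p[16], float x, float y, float z)
{
    Vec4 v = { x, y, z, 1.0f };
    Vec4 c = mat4_mul_vec4(p, v);
    return c.x / c.w;
}
```

Assuming a 320x480 screen and a 60 degree fov (d is about 1.732), `ndc_x(p, 160, 240, -1)` comes out in the hundreds, far outside [-1,1], whereas a vertex such as (0.1, 0, -1) stays inside the visible range.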

So you have to right-multiply your projection matrix by a matrix that first scales [0,screenWidth] down to [0,2] and [0,screenHeight] down to [0,2] and then translates [0,2] into [-1,1] (using w for screenWidth and h for screenHeight):

[ 2/w   0     0   -1 ]
[ 0     2/h   0   -1 ]
[ 0     0     1    0 ]
[ 0     0     0    1 ]

which will result in the matrix

[ 2*d/(a*w)   0       0             -d/a        ]
[ 0           2*d/h   0             -d          ]
[ 0           0       (n+f)/(n-f)   2*n*f/(n-f) ]
[ 0           0       -1            0           ]

So you see that your second matrix corresponds to a fov of 90 degrees, an aspect ratio of 1:1 and a near-far range of [-1,1]. It also inverts the y-axis, so that the origin is in the upper-left, which results in the second row being negated:

[ 0   -2*d/h   0   d ]
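The scale-then-translate step described above can be sketched in plain C as follows. `Vec2` and `screen_to_ndc` are hypothetical names used only for illustration; the `flip_y` flag mimics the inverted y-axis of the second matrix in the question.

```c
#include <assert.h>

/* Hypothetical helper type for illustration. */
typedef struct { float x, y; } Vec2;

/* Map a point in [0,w] x [0,h] to [-1,1] x [-1,1]:
   scale by 2/w and 2/h, then translate by -1.
   If flip_y is nonzero the y-axis is inverted, so the origin
   sits in the upper-left corner as in the second matrix. */
static Vec2 screen_to_ndc(float x, float y, float w, float h, int flip_y)
{
    assert(w > 0.0f && h > 0.0f);
    Vec2 r;
    r.x = 2.0f * x / w - 1.0f;
    r.y = 2.0f * y / h - 1.0f;
    if (flip_y) r.y = -r.y;
    return r;
}
```

With this mapping the screen centre (w/2, h/2) lands on (0, 0), which is the zero the question reports for the explicit-value matrix.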

As a final comment, I suggest you do not configure the projection matrix to account for all this. Instead, your projection matrix should look like the first one, and you should let the modelview matrix manage any translation or scaling of your world. It is no accident that the transformation pipeline was separated into modelview and projection matrices, and you should keep this separation when using shaders too. You can of course still multiply both matrices together on the CPU and upload a single MVP matrix to the shader.

In general you don't really use a screen-based coordinate system when working with a 3-dimensional world. You would only want to do this if you are drawing 2D graphics (like GUI elements or HUDs), and in that case you would use a simpler orthographic projection matrix anyway, which is nothing more than the above-mentioned scale-translate matrix without all the perspective complexity.
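As a rough sketch of combining the two matrices on the CPU, the following C code multiplies column-major 4x4 matrices. `mat4_mul` and `make_mvp` are hypothetical helpers written for illustration; in the question's code GLKMatrix4Multiply plays this role.

```c
#include <assert.h>

/* out = a * b for column-major 4x4 matrices
   (the element at row r, column c is stored at index c*4 + r). */
static void mat4_mul(float out[16], const float a[16], const float b[16])
{
    assert(out != a && out != b);
    for (int c = 0; c < 4; ++c) {
        for (int r = 0; r < 4; ++r) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + r] * b[c * 4 + k];
            out[c * 4 + r] = sum;
        }
    }
}

/* mvp = projection * modelview, so the modelview matrix
   is the one applied to the vertex first. */
static void make_mvp(float mvp[16], const float projection[16],
                     const float modelview[16])
{
    mat4_mul(mvp, projection, modelview);
}
```

The resulting array could then be uploaded with something like `glUniformMatrix4fv(location, 1, GL_FALSE, mvp)`, where `location` stands for whatever uniform location your program reports for `mvp`.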

EDIT: To your 3rd update:

(a) The transpose is required because, I guess, your GLKMatrix4Make function accepts its parameters in column-major order and you entered the matrix row-wise.
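The effect described in (a) can be reproduced without GLKit. The sketch below, which uses hypothetical helper names, stores the ts matrix exactly as it reads on paper (row by row) and then interprets the storage as column-major, which is the layout the answer assumes for GLKMatrix4Make.

```c
#include <assert.h>

/* Transpose a 4x4 matrix stored as 16 floats. */
static void mat4_transpose(float out[16], const float in[16])
{
    assert(out != in);
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[c * 4 + r] = in[r * 4 + c];
}

/* Column-major matrix times the column vector (x, y, z, 1). */
static void mat4_apply(const float m[16], float x, float y, float z,
                       float out[4])
{
    for (int r = 0; r < 4; ++r)
        out[r] = m[r] * x + m[4 + r] * y + m[8 + r] * z + m[12 + r];
}

/* The ts matrix typed row by row, as it is written in the answer. */
static void make_ts_as_written(float m[16], float w, float h)
{
    assert(w > 0.0f && h > 0.0f);
    const float rows[16] = {
        2.0f / w, 0.0f,     0.0f, -1.0f,
        0.0f,     2.0f / h, 0.0f, -1.0f,
        0.0f,     0.0f,     1.0f,  0.0f,
        0.0f,     0.0f,     0.0f,  1.0f
    };
    for (int i = 0; i < 16; ++i) m[i] = rows[i];
}
```

Read as column-major, the two -1 entries end up in the bottom row and corrupt w instead of translating x and y; after `mat4_transpose` they sit in the last column and the screen centre maps to zero. GLKit also appears to provide GLKMatrix4MakeAndTranspose for row-wise input, which would be worth checking in the documentation.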

(b) I made a little mistake. You should change the screenWidth in the ts matrix into screenHeight (or maybe the other way around, not sure). We actually need a uniform scale, because the aspect ratio is already taken care of by the projection matrix.
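A small arithmetic sketch of (b), under the assumption that the square sits at view-space z = -1 (so the perspective divide is by 1) and that the viewport covers the whole w x h screen. `Extent` and `projected_extent` are hypothetical names; translations are left out because they do not change size.

```c
#include <assert.h>

/* Hypothetical helper type for illustration. */
typedef struct { float width_px, height_px; } Extent;

/* On-screen size of a square of side `side` (screen units) after:
   pre-transform scale (sx, sy) -> perspective scale (d/a, d)
   -> viewport scale (w/2, h/2). Assumes clip-space w == 1. */
static Extent projected_extent(float side, float sx, float sy,
                               float d, float w, float h)
{
    assert(w > 0.0f && h > 0.0f);
    const float a = w / h;
    Extent e;
    e.width_px  = side * sx * (d / a) * (w / 2.0f);
    e.height_px = side * sy * d       * (h / 2.0f);
    return e;
}
```

With the non-uniform scale (2/w, 2/h) on a 320x480 screen the width comes out 1.5 times the height, matching the stretched square from the third update; with the uniform scale (2/h, 2/h) both extents are equal. Under the same assumptions, the x translation would then need to be -w/h rather than -1 to keep the screen centre at zero.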

(c) It is not easy to classify this matrix into the usual MVP pipeline. This is because it is not really common. Let's look at the two common cases of rendering:

3D: When you have a 3-dimensional world it is not really common to define its coordinates in screen-based units, because there is not yet a mapping from 3D scene to 2D screen, and using a coordinate system where units equal pixels just doesn't make sense. In this setup you would most likely classify it as part of the modelview matrix for transforming the world into another unit system. But in this case you would need real 3D transformations and not just such a half-baked 2D solution.
2D: When rendering a 2D scene (like a GUI or a HUD or just some text), you sometimes really want a screen-based coordinate system. But in this case you would most likely use an orthographic projection (without any perspective). Such an orthographic matrix is actually nothing more than this ts matrix (with some additional scale-translate for z, based on the near-far range). So in this case the matrix belongs to, or actually is, the projection matrix. Just look at how the good old glOrtho function constructs its matrix and you'll see it's nothing more than ts.
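For reference, here is a minimal C sketch of the matrix that the fixed-function glOrtho(left, right, bottom, top, near, far) is documented to build, stored column-major. `make_ortho` is a hypothetical helper name; GLKit's GLKMatrix4MakeOrtho should produce an equivalent matrix, though that is worth verifying.

```c
#include <assert.h>

/* Fill m[] (column-major, OpenGL layout) with an orthographic projection
   mapping [l,r] x [b,t] to [-1,1] x [-1,1], with depth range n..f. */
static void make_ortho(float m[16], float l, float r, float b, float t,
                       float n, float f)
{
    assert(r != l && t != b && f != n);
    for (int i = 0; i < 16; ++i) m[i] = 0.0f;
    m[0]  = 2.0f / (r - l);        /* x scale */
    m[5]  = 2.0f / (t - b);        /* y scale */
    m[10] = -2.0f / (f - n);       /* z scale */
    m[12] = -(r + l) / (r - l);    /* x translation */
    m[13] = -(t + b) / (t - b);    /* y translation */
    m[14] = -(f + n) / (f - n);    /* z translation */
    m[15] = 1.0f;
}
```

Calling `make_ortho(m, 0, w, 0, h, ...)` gives the x and y scale and translation of ts, while `make_ortho(m, 0, w, h, 0, 1, -1)` reproduces the second matrix from the question, including the flipped y-axis.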

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow