Question

I am currently writing a small game engine using OpenGL. The mesh data is uploaded to vbos using GL_STATIC_DRAW. Since I read that glBindBuffer is rather slow, I tried to minimize its use by collecting the information needed for rendering and then rendering each vbo multiple times with only one glBindBuffer call per vbo (sort of like batch rendering?). Here is the code I use for the actual rendering:

    int lastID = -1;
    for(list<R_job>::iterator it = jobs.begin(); it != jobs.end(); ++it) {
        // Rebind texture, buffer and attribute pointers only on the first job
        // or when the vbo changes (the list is sorted by vbo id).
        if(lastID == -1 || lastID != *it->vboID) {
            // Stride is 4*(3+2+3) bytes: interleaved position, texcoord, normal floats.
            glBindTexture(GL_TEXTURE_2D, it->texID);
            glBindBuffer(GL_ARRAY_BUFFER, *it->vboID);
            glVertexPointer(3, GL_FLOAT, 4*(3+2+3), 0);
            glTexCoordPointer(2, GL_FLOAT, 4*(3+2+3), (void*)(4*3));
            glNormalPointer(GL_FLOAT, 4*(3+2+3), (void*)(4*(3+2)));
            lastID = *it->vboID;
        }

        glPushMatrix();
        glMultMatrixf(value_ptr(*it->mat)); //the model matrix
        glDrawArrays(GL_TRIANGLES, 0, it->size); //render
        glPopMatrix();
    }

The list is sorted by the id of the vbos. The data is interleaved. My question is about speed. This code can render about 800 vbos (the same vbo each time, only glDrawArrays is called multiple times) at 30 fps on my 2010 MacBook. On my PC (Phenom II X4 955 / HD 5700) the frame rate drops below 30 fps at only 400 calls. Could someone explain this to me? I had hoped for a speedup on my PC. I am using GLFW and GLEW on both machines, with Xcode on the Mac and VS2012 on the PC.

EDIT: The mesh I am rendering has about 600 verts.


Solution

I'm sure you were running in Release mode, but you might also consider running in Release mode without the debugger attached. I think you will find that doing so solves your performance issues with list::sort. In my experience the VS debugger can have a significant performance impact when it is attached, far more so than gdb.

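2000 entities is a reasonable place to start seeing some FPS drop. Your loop issues four calls per entity (glPushMatrix, glMultMatrixf, glDrawArrays, glPopMatrix), so at that point you are making roughly 8,000 API calls per frame plus the binds, which is close to 10,000. To comfortably get higher than that, you will need to start doing something like instancing, so that you are drawing multiple entities with one call.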
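As a rough illustration of the CPU-side preparation for instancing, the sketch below groups jobs that share a vbo into one batch and packs their model matrices into a single array. The `Job` and `InstanceBatch` structs and the `buildBatches` and `instanceCount` helpers are hypothetical names, not taken from the question's code, and the sketch assumes (as the question's loop does) that each vbo always uses the same texture. It makes no OpenGL calls.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for the question's R_job (field names are assumptions).
struct Job {
    unsigned vboID;    // buffer object holding the mesh
    unsigned texID;    // texture used with that mesh
    int vertexCount;   // number of vertices to draw
    float model[16];   // column-major model matrix
};

// One batch: a single mesh drawn once per packed matrix.
struct InstanceBatch {
    unsigned vboID;
    unsigned texID;
    int vertexCount;
    std::vector<float> matrices; // 16 floats per instance, tightly packed
};

// Group jobs by vbo id, keeping batches in order of first appearance.
std::vector<InstanceBatch> buildBatches(const std::vector<Job>& jobs) {
    std::map<unsigned, std::size_t> indexOf; // vboID -> index into batches
    std::vector<InstanceBatch> batches;
    for (const Job& job : jobs) {
        auto found = indexOf.find(job.vboID);
        if (found == indexOf.end()) {
            found = indexOf.emplace(job.vboID, batches.size()).first;
            batches.push_back({job.vboID, job.texID, job.vertexCount, {}});
        }
        InstanceBatch& batch = batches[found->second];
        batch.matrices.insert(batch.matrices.end(), job.model, job.model + 16);
    }
    return batches;
}

// Number of instances stored in a batch.
std::size_t instanceCount(const InstanceBatch& batch) {
    return batch.matrices.size() / 16;
}
```

Each batch could then be drawn with a single glDrawArraysInstanced call after uploading `matrices` to a per-instance attribute buffer. Note that this path needs a shader-based pipeline rather than the fixed-function matrix stack used in the question.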

Finally, I would like to say that glBindBuffer is not an expensive operation, and not really something that you should be batching to avoid. If you are going to batch, batch to avoid changing shaders and shader state (uniforms). But don't batch just to avoid changing buffer objects.
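One way to batch by shader state first is to sort draw jobs on a composite key, with the shader as the most significant field. The sketch below is only an illustration: the `DrawKey` struct, its `shaderID` field, and the helper functions are hypothetical, since the question's code uses the fixed-function pipeline and has no shader id. It uses a std::vector with std::sort rather than a std::list.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical per-draw sort key (shaderID is an assumption, not in the question).
struct DrawKey {
    unsigned shaderID;
    unsigned texID;
    unsigned vboID;
};

// Order by shader first, then texture, then vbo.
bool stateOrder(const DrawKey& a, const DrawKey& b) {
    return std::tie(a.shaderID, a.texID, a.vboID) <
           std::tie(b.shaderID, b.texID, b.vboID);
}

// Sort the keys so that draws sharing a shader end up adjacent.
void sortByState(std::vector<DrawKey>& keys) {
    std::sort(keys.begin(), keys.end(), stateOrder);
}

// Count how many times the shader would have to be (re)bound
// when the keys are drawn in their current order.
int countShaderSwitches(const std::vector<DrawKey>& keys) {
    int switches = 0;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i == 0 || keys[i].shaderID != keys[i - 1].shaderID) {
            ++switches;
        }
    }
    return switches;
}
```

Comparing `countShaderSwitches` before and after `sortByState` gives a quick estimate of how many shader changes the ordering saves.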

OTHER TIPS

So I guess I found the answer. Enabling VSync in AMD Control Center increased the fps greatly (probably because of GLFW's swapBuffers), though it was still below the values on my MacBook. After some debugging I found that list::sort is somehow VERY slow on Windows and was pulling my fps down...

So right now (without list::sort), the fps drops below 60 at around 2000 entities: 600 * 2000 = 1,200,000 vertices per frame.

Should that be my graphics card's limit?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow