Question

I render to a floating-point texture in an FBO and need the average value of all pixels of that texture on the CPU. So I thought using mipmapping to compute the average into the 1x1 mipmap level would be convenient, because it saves CPU computation time and I only need to transfer 1 pixel to the CPU instead of, say, 1024x1024 pixels.

So I use this line:

glGetTexImage(GL_TEXTURE_2D, variableHighestMipMapLevel, GL_RGBA, GL_FLOAT, fPixel);
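
For context, the surrounding code looks roughly like this (a sketch, not my exact code; texWidth and averageTexture are placeholders for my FBO texture and its size, and I assume a square power-of-two texture, so the highest level is log2 of the base size):

#include <cmath>

// index of the 1x1 level, e.g. 10 for a 1024x1024 base texture
int variableHighestMipMapLevel = (int)std::log2((double)texWidth);

glBindTexture(GL_TEXTURE_2D, averageTexture);   // the FBO color texture
glGenerateMipmap(GL_TEXTURE_2D);                // average everything down to 1x1

float *fPixel = new float[4];                   // one RGBA pixel
glGetTexImage(GL_TEXTURE_2D, variableHighestMipMapLevel,
              GL_RGBA, GL_FLOAT, fPixel);       // read back the 1x1 level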

But despite the fact that I specifically request only the highest mipmap level, which is always 1x1 pixel in size, the time it takes for that line of code to complete depends on the size of the level-0 mipmap of the texture, which makes no sense to me. In my tests, for example, this line takes around 12 times longer for a 1024x1024 base texture than for a 32x32 base texture.

The result in fPixel is correct and contains only the wanted pixel, but the timing strongly suggests that the whole texture set is transferred, which defeats the purpose, because the transfer to the CPU is clearly my bottleneck.

I use Windows 7 and OpenGL, and tested this on an ATI Radeon HD 4800 and a GeForce 8800 GTS.

Does anybody know anything about this problem, or have a smart way to transfer only the one pixel of the highest mipmap level to the CPU?


Solution

glGenerateMipmap(GL_TEXTURE_2D);   // queued for the GPU; not finished (or even started) when this returns
float *fPixel = new float[4];
Timer.resume();
glGetTexImage(GL_TEXTURE_2D, highestMipMapLevel, GL_RGBA, GL_FLOAT, fPixel);   // stalls until the mipmaps are actually ready
Timer.stop();

Let this be a lesson to you: always provide complete information.

The reason it takes 12x longer is because you're measuring the time it takes to generate the mipmaps, not the time it takes to transfer the mipmap to the CPU. glGenerateMipmap, like most rendering commands, will not actually have finished by the time it returns. Indeed, odds are good that it won't have even started. This is good, because it allows OpenGL to run independently of the CPU. You issue a rendering command, and it completes sometime later.

However, the moment you start reading from that texture, OpenGL has to stall the CPU and wait until everything that will touch that texture has finished. Therefore, your timing is measuring the time it takes to perform all operations on the texture as well as the time to transfer the data back.

If you want a more accurate measurement, issue a glFinish before you start your timer.
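
For example, a minimal sketch of that timing, using the same names as your snippet:

glGenerateMipmap(GL_TEXTURE_2D);
glFinish();   // block until mipmap generation (and all prior GPU work) is done

Timer.resume();
glGetTexImage(GL_TEXTURE_2D, highestMipMapLevel, GL_RGBA, GL_FLOAT, fPixel);
Timer.stop(); // now this interval is (mostly) just the 1-pixel transfer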

More importantly, if you want to perform an asynchronous read of pixel data, you will need to do the read into a buffer object (a pixel buffer object bound to GL_PIXEL_PACK_BUFFER). This allows OpenGL to avoid the CPU stall, but it is only helpful if you have other work you could be doing in the meantime.
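
A rough sketch of such a read through a pixel buffer object (assuming GL 2.1+ or ARB_pixel_buffer_object; error checking omitted):

GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, 4 * sizeof(float), NULL, GL_STREAM_READ);

// With a PIXEL_PACK buffer bound, the last parameter is an offset into the
// buffer instead of a client pointer, so this call can return immediately.
glGetTexImage(GL_TEXTURE_2D, highestMipMapLevel, GL_RGBA, GL_FLOAT, (void*)0);

// ... do other useful work here ...

// Mapping is where you would stall if the transfer hasn't finished yet.
float *fPixel = (float*)glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
// use fPixel[0..3]
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);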

For example, if you're doing this to figure out the overall lighting for a scene for HDR tone mapping, you should be doing this for the previous frame's scene data, not the current one. Nobody will notice. So you render a scene, generate mipmaps, read into a buffer object, then render the next frame's scene, generate mipmaps, read into a different buffer object, then start reading from the previous scene's buffer.

That way, by the time you map the previous frame's results, they will actually be there and no CPU stall will happen.
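
In code, that double-buffered scheme might look something like this (a sketch; pbos[] are two buffer objects created once, as above, and the very first frame needs special handling that is omitted here):

GLuint pbos[2];   // two PBOs that persist across frames
int writeIdx = 0;

// each frame, after rendering and glGenerateMipmap:
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbos[writeIdx]);
glGetTexImage(GL_TEXTURE_2D, highestMipMapLevel, GL_RGBA, GL_FLOAT, (void*)0);

// read back the *previous* frame's average; that transfer finished long ago
int readIdx = 1 - writeIdx;
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbos[readIdx]);
float *avg = (float*)glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
// ... use avg for tone mapping ...
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

writeIdx = readIdx;   // swap roles for the next frame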

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow