Question

First of all:
Windows XP SP3, 2 GB RAM, Intel Core 2 Duo 2.33 GHz, nVidia 9600GT with 1 GB RAM. OpenGL 3.3, fully updated.

Short description of what I am doing:
Ideally I need to put ONE single pixel in a GL texture (A) using glTexSubImage2D every frame.
Then, modify the texture inside a shader-FBO-quadfacingcamera setup and replace the original image with the resulting FBO.

Of course, I don't want an FBO feedback loop, so instead I render the modified version into a temporary texture and copy it back separately with glCopyTexSubImage2D.

The sequence is now:

1) Put one pixel in a GL texture (A) using glTexSubImage2D every frame (with width=height=1).
2) This modified version A is to be used/modified inside a shader-FBO-quad setup to be rendered into a different texture (B).
3) The resulting texture B is to be overwritten over A using glCopyTexSubImage2D.
4) Repeat...

By repeating this loop I want to achieve a slow fading effect by multiplying the color values in the shader by 0.99 every frame.

2 things are badly wrong:
1) with a fading factor of 0.99 applied every frame, the fading stops at RGB 48,48,48, leaving a trail of greyish pixels that never fully fade out.
2) the program runs at 100 FPS. Very bad, because if I comment out glCopyTexSubImage2D the program runs at 1000 FPS!!

I also get 1000 FPS by commenting out just glTexSubImage2D and leaving glCopyTexSubImage2D in place. This shows that neither glTexSubImage2D nor glCopyTexSubImage2D is the bottleneck by itself (I tried replacing glCopyTexSubImage2D with a secondary FBO to do the copying, with the same results).

Observation: the bottleneck only shows up when both of those commands are active!

Hard mode: no PBOs pls.

Link with source and exe:
http://www.mediafire.com/?ymu4v042a1aaha3
(CodeBlocks and SDL used)
FPS counts are written into stdout.txt

I am asking for a workaround for the two problems described above.
Expected results: full fade-out effect to plain black at 800-1000 FPS.


Solution 2

Problem 2: splatting arbitrary pixels into a texture as fast as possible.
Since Vertex Arrays or VBOs are probably the fastest way to dynamically upload data from main memory to the GPU, the solution to problem 2 becomes trivial:
1) create Vertex Array and Color Array
(or interleave coordinates and colors, performance/bandwidth may vary);
2) Z component =0. We want points to lie on the floor;
3) camera pointing downwards with orthographic projection
(being sure to match exactly the screen size with coordinate ranges);
4) render to texture with FBO using GL_POINTS w/ glPointSize=1 and GL_POINT_SMOOTH disabled.
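
As a rough sketch of steps 1 and 2, the pixels to splat can be packed into one interleaved array on the CPU side. The `PointVertex` layout and the `push_point` helper are hypothetical names made up for this example, not taken from the linked source. The +0.5 offset assumes the orthographic projection maps [0, width] x [0, height] exactly onto the render target, so that each 1-pixel point sits on a pixel centre.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved layout: position (x, y, z) then colour (r, g, b). */
typedef struct {
    float x, y, z;
    float r, g, b;
} PointVertex;

/* Append one pixel-sized point and return the new element count.
   Z is fixed at 0 so the point lies on the "floor" plane; the +0.5
   moves it to the centre of pixel (px, py). */
size_t push_point(PointVertex *buf, size_t count, size_t capacity,
                  int px, int py, float r, float g, float b)
{
    assert(count < capacity);
    PointVertex v = { px + 0.5f, py + 0.5f, 0.0f, r, g, b };
    buf[count] = v;
    return count + 1;
}
```

With the array filled, the fixed-function path would enable GL_VERTEX_ARRAY and GL_COLOR_ARRAY with glEnableClientState, point glVertexPointer and glColorPointer at the buffer with a stride of sizeof(PointVertex), and draw with glDrawArrays(GL_POINTS, 0, count) while the FBO is bound.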

Pretty standard. Now the program runs at 750 fps. Close enough. My dreams were all like "Hey mom look! I'm running glTexSubImage2D at 1000 fps!" and then meh.
Though glCopyTexSubImage2D is very fast. Would recommend.

Not sure if this is the best way to GPU-accelerate fadings but given the results one must note a strong concentration of Force with this one. Anyway, the problem with the fading stopping half-way is fixed by subtracting a small constant minimum decrement every frame, so even when the exponential curve stalls the fade still finishes at black no matter what.

OTHER TIPS

To problem 1:

You are experiencing some precision (and quantization) issues here. I assume you are using an 8-bit UNORM framebuffer format, so anything you write to it is rounded to the nearest of 256 discrete levels. Think about it: 48*0.99 = 47.52, which ends up as 48 again, so it will never get any darker than that. Using a real floating-point format would be a solution, but it is likely to greatly decrease overall performance...

The fade-out operation you chose is simply not the best choice. It might be better to add a linear term that guarantees the value decreases by at least 1/255 every frame.
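
To see the stall and the effect of the linear term without a GPU, here is a minimal CPU-side sketch of one 8-bit channel. It assumes plain round-to-nearest when the value is stored; actual hardware rounding may differ slightly, so the exact level where the fade gets stuck need not match the 48 reported in the question. The function names and the `min_step` parameter are made up for this example.

```c
#include <assert.h>

/* One fade step on an 8-bit channel value (0..255): scale by `factor`,
   subtract an optional linear term `min_step` (in units of 1/255),
   clamp at 0, then round to the nearest integer level. */
int fade_step(int v, float factor, int min_step)
{
    assert(v >= 0 && v <= 255);
    float scaled = (float)v * factor - (float)min_step;
    if (scaled < 0.0f)
        scaled = 0.0f;
    return (int)(scaled + 0.5f);   /* round to nearest for non-negative input */
}

/* Repeat fade_step until the value stops changing and return that level.
   Assumes factor <= 1 and min_step >= 0, so the value never increases. */
int fade_until_stable(int start, float factor, int min_step)
{
    int v = start;
    for (;;) {
        int next = fade_step(v, factor, min_step);
        if (next == v)
            return v;
        v = next;
    }
}
```

Calling fade_until_stable(255, 0.99f, 0) settles on a level well above zero, because once 1% of the value is less than half a quantization step the rounding restores the old value. With min_step set to 1 the same loop runs all the way down to 0.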

To problem 2: It is hard to say what the actual bottleneck here is. As you are not using PBOs, you are limited to synchronous texture updates.

However, why do you need to do that copy operation at all? The standard approach to this kind of thing is texture/FBO/color buffer "ping-pong", where you just swap the "role" of the two textures after each iteration. So you get the sequence:

  1. update A
  2. render into B (reading from A)
  3. update B
  4. render into A (reading from B)
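
The role swap can be sketched on the CPU with two plain arrays standing in for textures A and B. This is only an analogy under that assumption: the `PingPong` struct and `pingpong_frame` are hypothetical names, and the comments mark where the real GL calls would go.

```c
#include <assert.h>
#include <stddef.h>

#define W 4
#define H 4

/* Two buffers standing in for textures A and B. */
typedef struct {
    float tex[2][W * H];
    int   src;              /* index of the buffer that is read this frame */
} PingPong;

/* One frame: write the new pixel into the source, "render" the faded
   result into the destination, then swap roles instead of copying back. */
void pingpong_frame(PingPong *pp, size_t pixel, float value, float factor)
{
    assert(pixel < W * H);
    float *src = pp->tex[pp->src];
    float *dst = pp->tex[1 - pp->src];

    src[pixel] = value;                  /* glTexSubImage2D on the source    */
    for (size_t i = 0; i < W * H; ++i)   /* shader pass with the FBO on dst  */
        dst[i] = src[i] * factor;
    pp->src = 1 - pp->src;               /* swap roles, no copy needed       */
}
```

In the GL version the swap amounts to changing which texture is bound for sampling and which one is attached to (or owned by) the FBO being rendered into, so no glCopyTexSubImage2D is involved.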
Licensed under: CC-BY-SA with attribution