Question

My 9600GT hates me.

Fragment shader:

#version 130

uint aa[33] = uint[33](
    0u,0u,0u,0u,0u,0u,0u,0u,0u,0u,
    0u,0u,0u,0u,0u,0u,0u,0u,0u,0u,
    0u,0u,0u,0u,0u,0u,0u,0u,0u,0u,
    0u,0u,0u
);

void main() {
    int i=0;
    int a=26;

    for (i=0; i<a; i++) aa[i]=aa[i+1];

    gl_FragColor=vec4(1.0,0.0,0.0,1.0);

}

If a=25 program runs at 3000 fps.
If a=26 program runs at 20 fps.
If the size of aa is <=32, the issue doesn't appear.
Viewport size is 1000x1000.
The threshold value of a varies with the array accesses inside the loop (aa[i]=aa[i+1]+aa[i-1] gives a different threshold).
I know gl_FragColor is deprecated. But that's not the issue.

My guess is that the GLSL compiler doesn't automatically unroll the loop if a>25 and size(aa)>32. Why? The reason it depends on the size of the array is unknown to mankind.

A quite similar behavior is explained here:
http://www.gamedev.net/topic/519511-glsl-for-loops/

Unrolling the loop manually does solve the issue (3000 fps), even if the size of aa is >32:

    aa[0]=aa[1];
    aa[1]=aa[2];
    aa[2]=aa[3];
    aa[3]=aa[4];
    aa[4]=aa[5];
    aa[5]=aa[6];
    aa[6]=aa[7];
    aa[7]=aa[8];
    aa[8]=aa[9];
    aa[9]=aa[10];
    aa[10]=aa[11];
    aa[11]=aa[12];
    aa[12]=aa[13];
    aa[13]=aa[14];
    aa[14]=aa[15];
    aa[15]=aa[16];
    aa[16]=aa[17];
    aa[17]=aa[18];
    aa[18]=aa[19];
    aa[19]=aa[20];
    aa[20]=aa[21];
    aa[21]=aa[22];
    aa[22]=aa[23];
    aa[23]=aa[24];
    aa[24]=aa[25];
    aa[25]=aa[26];
    aa[26]=aa[27];
    aa[27]=aa[28];
    aa[28]=aa[29];
    aa[29]=aa[30];
    aa[30]=aa[31];
    aa[31]=aa[32];

Solution

This answer just summarizes the comments so the question no longer shows up as unanswered.

"#pragma optionNV (unroll all)"

fixes the immediate issue on NVIDIA drivers.
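
As a minimal sketch of where the pragma goes, here is the shader from the question with the directive placed right after the #version line. The placement is an assumption based on the usual convention that #version comes first. optionNV is NVIDIA-specific; the GLSL specification says implementations ignore pragmas they do not recognize, so other drivers should still compile this, just without the forced unrolling.

```glsl
#version 130
// NVIDIA-specific hint: ask the compiler to unroll all loops.
// Drivers that do not recognize this pragma are expected to ignore it.
#pragma optionNV (unroll all)

uint aa[33] = uint[33](
    0u,0u,0u,0u,0u,0u,0u,0u,0u,0u,
    0u,0u,0u,0u,0u,0u,0u,0u,0u,0u,
    0u,0u,0u,0u,0u,0u,0u,0u,0u,0u,
    0u,0u,0u
);

void main() {
    int a = 26;

    // Same loop as in the question; only the pragma above is new.
    for (int i = 0; i < a; i++) aa[i] = aa[i + 1];

    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
}
```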

In general, though, GLSL compilers are very implementation dependent. The drop at exactly 32 is easily explained by hitting a compiler heuristic such as "don't unroll loops longer than 32". The huge speed difference might come from an unrolled loop using constant indices, while a dynamic loop requires addressable array memory. Another possible reason is that, once the loop is unrolled, dead code elimination and constant folding kick in and reduce the entire loop to nothing.

The most portable fix is really manual unrolling or, even better, manual constant folding. It is always questionable to compute constants in a fragment shader when they can be computed outside it. Some drivers might catch this in some cases, but it is better not to rely on that.
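
To illustrate manual constant folding on the shader from the question: the array contents are known at compile time and the shifted array never feeds into gl_FragColor, so the loop can be dropped entirely. The sketch below assumes the all-zero data from the question; with real data, the pre-shifted values would have to be computed offline or on the CPU and written into the constant array (or passed in some other way, which is not shown here).

```glsl
#version 130

// Pre-shifted contents of aa, written out ahead of time instead of being
// recomputed for every fragment. All zeros here only because the array in
// the question is all zeros; this is placeholder data, not real values.
const uint aa[33] = uint[33](
    0u,0u,0u,0u,0u,0u,0u,0u,0u,0u,
    0u,0u,0u,0u,0u,0u,0u,0u,0u,0u,
    0u,0u,0u,0u,0u,0u,0u,0u,0u,0u,
    0u,0u,0u
);

void main() {
    // No loop left: nothing has to be computed at run time, so the result
    // no longer depends on the driver's unrolling heuristics.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
}
```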

Licensed under: CC-BY-SA with attribution