Question

I have a pixel shader, written in HLSL, that declares the following constant buffer:

cbuffer RenderParametersData : register(b2) 
{
    float4 LineColor[16];
};

In one of the shader functions, I look up the output color based on the index "color" (which is not really a color, just a convenient place to put the index into the array of LineColors):

output.Color = LineColor[input.Color.b * 255];

This dramatically increases the number of instruction slots in the resulting assembly code. With everything else unchanged, a constant array lookup (output.Color = LineColor[0];) compiles to 10 arithmetic operations, while the dynamic lookup above compiles to 37. Almost all of the additional operations look like this:

cmp r2, -r1.x, c0, r0.w
cmp r2, -r1.y, c1, r2
cmp r2, -r1.z, c2, r2
cmp r1, -r1.w, c3, r2

Here c runs up to c15, matching the number of elements in LineColor. Resizing LineColor to 8 elements produced the same pattern of cmp instructions, but with c running only up to c7, again matching the number of elements in the array. Going back to the constant lookup dropped the number of operations back to 10.

So it seems that dynamic constant buffer array lookup carries a significant additional cost: one comparison instruction per element in the array, plus some overhead. I am genuinely surprised at how expensive this array lookup is, and given that my array size will soon increase by an order of magnitude, this will push me over the limit of 64 arithmetic instructions.
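
For illustration, the cmp chain above behaves roughly like the hand-written HLSL below. In shader model 2.0 assembly, cmp picks its second source when the first source is greater than or equal to zero, and its third source otherwise, so each cmp acts as one select against a single constant register. This is only a sketch of the pattern, not the compiler's actual algorithm: the helper name SelectLineColor is hypothetical, and the equality test stands in for whatever per-element value the compiler places in r1.

```hlsl
cbuffer RenderParametersData : register(b2)
{
    float4 LineColor[16];
};

// Hypothetical helper (not from the original shader): a hand-written
// equivalent of the select chain seen in the shader model 2.0 assembly.
float4 SelectLineColor(float index)
{
    // Fallback value used when no element matches (assumption).
    float4 result = float4(0.0f, 0.0f, 0.0f, 0.0f);

    [unroll]
    for (int i = 0; i < 16; ++i)
    {
        // One select per array element, like one cmp per constant register.
        result = (index == (float)i) ? LineColor[i] : result;
    }
    return result;
}
```

Written this way, it is easier to see why the cost grows linearly with the array size: every element needs its own select.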

Is this the expected behavior? Am I doing something wrong here, or is this a necessary consequence of dynamic array indexing?

Thanks!

EDIT: Just to add some additional detail, the effect I'm after is to color some quads based on data from the vertex shader and texture coordinates. I would do the work in the vertex shader, but interpolation of the texture coordinates has to occur first.

EDIT2: I've resolved this. I was telling FXC that my target is ps_4_0_level_9_1, which makes it generate assembly for both shader model 2.0 and shader model 4.0. The extra comparison per element only appears in the shader model 2.0 assembly. Switching the compiler target to ps_4_0 produces only the shader model 4.0 code, and since I'm not constrained to feature level 9_1, things are now working well.
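
With ps_4_0 as the target, the lookup can also be written with an explicit integer index. The following is a minimal sketch, assuming the index is encoded as index / 255 in a color channel as in the question. The helper name LookupLineColor, the rounding, and the clamp are additions for illustration and are not part of the original shader.

```hlsl
cbuffer RenderParametersData : register(b2)
{
    float4 LineColor[16];
};

// Hypothetical helper (not from the original shader), intended for ps_4_0.
float4 LookupLineColor(float encodedIndex)
{
    // encodedIndex is assumed to hold index / 255 (e.g. input.Color.b).
    // Adding 0.5 before the cast rounds to the nearest integer, so a value
    // such as 2.9999 maps to 3 instead of being truncated to 2.
    uint index = (uint)(encodedIndex * 255.0f + 0.5f);

    // Clamp so an unexpected value cannot index past the 16 elements.
    return LineColor[min(index, 15u)];
}
```

It would be called as output.Color = LookupLineColor(input.Color.b);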


Solution

I resolved this by compiling for ps_4_0 instead of ps_4_0_level_9_1, so that the compiler no longer generates shader model 2.0 assembly. More details are in EDIT2 at the end of the question.

Licensed under: CC-BY-SA with attribution