Question

The pixel shader (HLSL) below compiles to 68 instructions, even with the optimizations suggested below. I would like to use it with shader model 2, which means I can only use up to 64 instructions. Does anyone see any possible optimizations that do not change the result of the shader?

The shader transforms a roughly circular region of the screen (with sine-shaped borders) from RGB to a gradient of white -> red -> black, with some additional modifications to brightness and contrast.

The shader code is:

// Normalized timefactor (1 = fully enabled)
float timeFactor;

// Center of "light"
float x;
float y;

// Size of "light"
float viewsizeQ;
float fadesizeQ;

// Rotational shift
float angleShift;

// Resolution
float screenResolutionWidth;
float screenResolutionHeight;
float screenZoomQTimesX;

// Texture sampler
sampler TextureSampler : register(s0);

float4 method(float2 texCoord : TEXCOORD0) : COLOR0
{
// New color after transformation
float4 newColor;

// Look up the texture color.
float4 color = tex2D(TextureSampler, texCoord);

// Calculate distance
float2 delta = (float2(x, y) - texCoord.xy)
             * float2(screenResolutionWidth, screenResolutionHeight);

// Get angle from center
float distQ = dot(delta, delta) - sin((atan2(delta.x, delta.y) + angleShift) * 13) * screenZoomQTimesX;

// Within fadeSize
if (distQ < fadesizeQ)
{
   // Make greyscale
   float grey = dot(color.rgb, float3(0.3, 0.59, 0.11));

   // Increase contrast by applying a color transformation based on a quasi-sigmoid gamma curve
   grey = 1 / (1 + pow(1.25-grey/2, 16) );

   // Transform Black/White color range to Black/Red/White color range
   // 1 -> 0.5f ... White -> Red
   if (grey >= 0.75)
   {
      newColor.r = 0.7 + 0.3 * color.r;
      grey = (grey - 0.75) * 4;
      newColor.gb = 0.7 * grey + 0.3 * color.gb;
   }
   else // 0.5f -> 0 ... Red -> Black
   {
      newColor.r = 1.5 * 0.7 * grey + 0.3 * color.r;
      newColor.gb = 0.3 * color.gb;
   }

   // Within viewSize (Full transformation, only blend with timefactor)
   if (distQ < viewsizeQ)
   {
      color.rgb = lerp(newColor.rgb, color.rgb, timeFactor);
   }
   // Outside viewSize but still in fadeSize (Spatial fade-out but also with timefactor)
   else
   {
      float factor = timeFactor * (1 - (distQ  - viewsizeQ) / (fadesizeQ - viewsizeQ));
      color.rgb = lerp(newColor.rgb, color.rgb, factor);
   } 
}

return color;
}

Solution

A few more bits and pieces: you have separate floats x and y for the light center, and separate floats for the screen width and height.

Replace them with:

float2 light;
float2 screenResolution;

Then in your code:

float2 delta = (light - texCoord.xy) * screenResolution;

Should remove 2 more instructions.

Next is the use of atan2, which is likely to be the most expensive call in terms of instructions.

You can declare another float2 (float2 vecshift), where x = cos(angleShift) and y = sin(angleShift). Just precompute this one on the CPU.

Then you can do the following (basically do a cross product to extract angle instead of using atan2):

float2 dn = normalize(delta);
float cr = dn.x *vecshift.y -dn.y * vecshift.x;
float distQ = dot(delta, delta) - sin((asin(cr))*13) *screenZoomQTimesX;

Please note that I'm not too keen on taking the sin of an asin, but a polynomial form would not fit your use case. I'm sure there's a much cleaner way to modulate than sin(asin(...)), though.
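Since the question asks for the result to stay unchanged, it may be worth checking on the CPU how the cross-product form relates to the atan2 form. The following is only a minimal Python sketch, not part of the original answer: the helper names (precompute_vecshift, wave_atan2, wave_cross, max_difference) are hypothetical, and it assumes vecshift = (cos(angleShift), sin(angleShift)) as described above. Working through the trigonometry suggests that the two forms describe the same 13-lobed wave, but offset by a quarter turn, and the sketch tests exactly that.

```python
import math
import random

LOBES = 13  # odd multiplier used by the shader


def precompute_vecshift(angle_shift):
    """Hypothetical CPU-side helper: vecshift = (cos, sin) of angleShift."""
    return (math.cos(angle_shift), math.sin(angle_shift))


def wave_atan2(dx, dy, angle_shift):
    """Original term: sin((atan2(delta.x, delta.y) + angleShift) * 13)."""
    return math.sin((math.atan2(dx, dy) + angle_shift) * LOBES)


def wave_cross(dx, dy, vecshift):
    """Replacement term: sin(asin(cr) * 13), cr from the 2D cross product."""
    length = math.hypot(dx, dy)
    nx, ny = dx / length, dy / length
    cr = nx * vecshift[1] - ny * vecshift[0]
    cr = max(-1.0, min(1.0, cr))  # guard asin against rounding outside [-1, 1]
    return math.sin(math.asin(cr) * LOBES)


def max_difference(samples=10000, seed=0):
    """Largest gap between the cross-product wave and the atan2 wave
    evaluated with angleShift reduced by a quarter turn (pi / 2)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        angle = rng.uniform(-math.pi, math.pi)
        radius = rng.uniform(0.1, 500.0)
        shift = rng.uniform(-math.pi, math.pi)
        dx, dy = radius * math.sin(angle), radius * math.cos(angle)
        a = wave_cross(dx, dy, precompute_vecshift(shift))
        b = wave_atan2(dx, dy, shift - math.pi / 2.0)
        worst = max(worst, abs(a - b))
    return worst


if __name__ == "__main__":
    print(max_difference())
```

If the gap stays at rounding-error level, the cross-product version draws the same border rotated by a constant angle. Under that assumption, uploading the cos and sin of angleShift + pi/2 instead should line it up with the atan2 version. The folding that asin introduces cancels out only because 13 is odd, so an even multiplier would need a different approach.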

Using the ?: (ternary) construct instead of if/else can also (sometimes) help reduce your instruction count.

color.rgb = lerp(newColor.rgb, color.rgb, distQ < viewsizeQ ? timeFactor : timeFactor * (1 - (distQ  - viewsizeQ) / (fadesizeQ - viewsizeQ)));

This removes 2 more instructions.

Here is the full version, which comes to 60 instructions.
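As an aside, the two branches of that blend factor look like they collapse into one clamped expression, timeFactor * saturate(1 - (distQ - viewsizeQ) / (fadesizeQ - viewsizeQ)), as long as fadesizeQ > viewsizeQ and distQ < fadesizeQ. Whether this actually saves instructions on shader model 2 has not been measured here. The Python sketch below only checks that the two forms agree numerically; the function names are hypothetical and saturate is re-implemented to mirror the HLSL intrinsic.

```python
def saturate(value):
    """Clamp to [0, 1], mirroring the HLSL saturate intrinsic."""
    return max(0.0, min(1.0, value))


def factor_ternary(dist_q, viewsize_q, fadesize_q, time_factor):
    """Blend factor as written with the ?: construct."""
    fade = 1.0 - (dist_q - viewsize_q) / (fadesize_q - viewsize_q)
    return time_factor if dist_q < viewsize_q else time_factor * fade


def factor_clamped(dist_q, viewsize_q, fadesize_q, time_factor):
    """Branch-free candidate: clamp the spatial fade, then scale by time."""
    fade = 1.0 - (dist_q - viewsize_q) / (fadesize_q - viewsize_q)
    return time_factor * saturate(fade)


def max_factor_gap(viewsize_q=100.0, fadesize_q=400.0, steps=1000):
    """Largest difference between both forms for distQ below fadesizeQ.
    distQ may be negative because of the sine term, so start below zero."""
    worst = 0.0
    for time_step in range(11):
        time_factor = time_step / 10.0
        for i in range(steps):
            dist_q = -50.0 + (fadesize_q + 50.0) * i / steps
            a = factor_ternary(dist_q, viewsize_q, fadesize_q, time_factor)
            b = factor_clamped(dist_q, viewsize_q, fadesize_q, time_factor)
            worst = max(worst, abs(a - b))
    return worst


if __name__ == "__main__":
    print(max_factor_gap())
```

The sample sizes 100 and 400 are arbitrary placeholders, not values from the question.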

// Normalized timefactor (1 = fully enabled)
float timeFactor;

float2 light;

float viewsizeQ;
float fadesizeQ;

float2 screenResolution;
float screenZoomQTimesX;

float2 vecshift;

// Texture sampler
sampler TextureSampler : register(s0);

float4 method(float2 texCoord : TEXCOORD0) : COLOR0
{
// New color after transformation
float4 newColor;

// Look up the texture color.
float4 color = tex2D(TextureSampler, texCoord);

// Calculate distance
float2 delta = (light - texCoord.xy) * screenResolution;

float2 dn = normalize(delta);
float cr = dn.x *vecshift.y -dn.y * vecshift.x;

float distQ = dot(delta, delta) - sin((asin(cr))*13) *screenZoomQTimesX;
//float distQ = dot(delta, delta) - a13 *screenZoomQTimesX;

if (distQ < fadesizeQ)
{
   // Make greyscale
   float grey = dot(color.rgb, float3(0.3, 0.59, 0.11));

   // Increase contrast by applying a color transformation based on a quasi-sigmoid gamma curve
   grey = 1 / (1 + pow(1.25-grey/2, 16) );

   // Transform Black/White color range to Black/Red/White color range
   // 1 -> 0.5f ... White -> Red
   if (grey >= 0.75)
   {
       newColor.r = 0.7 + 0.3 * color.r;
       grey = (grey - 0.75) * 4;
       newColor.gb = 0.7 * grey + 0.3 * color.gb;
   }
   else // 0.5f -> 0 ... Red -> Black
   {
       newColor.r = 1.5 * 0.7 * grey + 0.3 * color.r;
       newColor.gb = 0.3 * color.gb ;
   }

   color.rgb = lerp(newColor.rgb, color.rgb, distQ < viewsizeQ ? timeFactor : timeFactor * (1 - (distQ  - viewsizeQ) / (fadesizeQ - viewsizeQ)));
}
return color;

}

OTHER TIPS

A couple of suggestions

  • You could use a 1D sampler (as a lookup table) for your quasi-sigmoid. If the input (your grey value) goes from 0 to 1, then create a texture of 1 x 256 (or whatever horizontal size preserves your function best) and simply look up the value for the current input using tex1D. You will need to run this function on the CPU to fill in the texture, but that only has to be done once at load time.
  • Instead of spelling the blend out as color.rgb = /*0.7 */ factor * newColor.rgb + /*0.3 **/ (1 - factor) * color.rgb; you could use the lerp function: color.rgb = lerp(color.rgb, newColor.rgb, factor); (lerp generally compiles down to a single assembly instruction on most GPUs), saving you instructions.
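The lookup-table suggestion could be prepared on the CPU roughly as follows. This is a minimal Python sketch under assumptions not stated in the answer: a 256-entry table, 8-bit quantization, and hypothetical function names. Uploading the values into an actual 1D texture depends on your framework and is left out.

```python
def quasi_sigmoid(grey):
    """Contrast curve from the shader: 1 / (1 + pow(1.25 - grey / 2, 16))."""
    return 1.0 / (1.0 + (1.25 - grey / 2.0) ** 16)


def build_sigmoid_lut(size=256):
    """Sample the curve at evenly spaced grey values in [0, 1] and
    quantize each sample to an 8-bit texel value (0..255)."""
    return [round(quasi_sigmoid(i / (size - 1)) * 255) for i in range(size)]


if __name__ == "__main__":
    lut = build_sigmoid_lut()
    print(len(lut), lut[0], lut[-1])
```

With 8-bit texels the curve is quantized to 256 levels, so the lookup result will not match the computed curve exactly; a wider table or a floating-point texture format would reduce that error.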

Using a couple more lerps I was able to get it below 64 instructions. The lookup table didn't help, as atan2 actually led to fewer instructions than looking up the texture.

// Normalized timefactor (1 = fully enabled)
float timeFactor;

// Center of "light"
float x;
float y;

// Size of "light"
float viewsizeQ;
float fadesizeQ;

// Rotational shift
float angleShift;

// Resolution
float screenResolutionWidth;
float screenResolutionHeight;
float screenZoomQTimesX;

// Texture sampler
sampler TextureSampler : register(s0);

float4 method(float2 texCoord : TEXCOORD0) : COLOR0
{
float4 newColor;

// Look up the texture color.
float4 color = tex2D(TextureSampler, texCoord);

// Calculate distance
float2 delta = (float2(x, y) - texCoord.xy)
             * float2(screenResolutionWidth, screenResolutionHeight);

// Get angle from center
float distQ = dot(delta, delta) - sin((atan2(delta.x, delta.y) + angleShift) * 13) * screenZoomQTimesX;

// Outside fadeSize: No color transformation
if (distQ >= fadesizeQ) return color;

// Otherwise (within color transformed region) /////////////////////////////////////////////////////////

// Make greyscale
float grey = dot(color.rgb, float3(0.3, 0.59, 0.11));

// Increase contrast by applying a color transformation based on a quasi-sigmoid gamma curve
grey = 1 / (1 + pow(1.25-grey/2, 16));

// Transform greyscale to white->red->black gradient
// 1 -> 0.5f ... White -> Red
if (grey >= 0.5)
{
newColor = lerp(float4(0.937,0.104,0.104,1), float4(1,1,1,1), 2 * (grey - 0.5));
}
else // 0.5f -> 0 ... Red -> Black
{
newColor = lerp(float4(0,0,0,1), float4(0.937,0.104,0.104,1), 2 * grey);
}

float factor = saturate(timeFactor * (1 - (distQ  - viewsizeQ) / (fadesizeQ - viewsizeQ)));
color.rgb = lerp(color.rgb, newColor.rgb, factor);

return color;
 }
Licensed under: CC-BY-SA with attribution