GLSL performance: function return value/type

https://stackoverflow.com//questions/20052381

26-12-2019

Question

I'm using bicubic filtering to smooth my map, and I implemented it in GLSL:

The bicubic interpolation (see the interpolate() function further down):

float interpolateBicubic(sampler2D tex, vec2 t)
{
    vec2 offBot =   vec2(0,-1);
    vec2 offTop =   vec2(0,1);
    vec2 offRight = vec2(1,0);
    vec2 offLeft =  vec2(-1,0);

    vec2 f = fract(t.xy * 1025);

    vec2 bot0 = (floor(t.xy * 1025)+offBot+offLeft)/1025;
    vec2 bot1 = (floor(t.xy * 1025)+offBot)/1025;
    vec2 bot2 = (floor(t.xy * 1025)+offBot+offRight)/1025;
    vec2 bot3 = (floor(t.xy * 1025)+offBot+2*offRight)/1025;

    vec2 mbot0 = (floor(t.xy * 1025)+offLeft)/1025;
    vec2 mbot1 = (floor(t.xy * 1025))/1025;
    vec2 mbot2 = (floor(t.xy * 1025)+offRight)/1025;
    vec2 mbot3 = (floor(t.xy * 1025)+2*offRight)/1025;

    vec2 mtop0 = (floor(t.xy * 1025)+offTop+offLeft)/1025;
    vec2 mtop1 = (floor(t.xy * 1025)+offTop)/1025;
    vec2 mtop2 = (floor(t.xy * 1025)+offTop+offRight)/1025;
    vec2 mtop3 = (floor(t.xy * 1025)+offTop+2*offRight)/1025;

    vec2 top0 = (floor(t.xy * 1025)+2*offTop+offLeft)/1025;
    vec2 top1 = (floor(t.xy * 1025)+2*offTop)/1025;
    vec2 top2 = (floor(t.xy * 1025)+2*offTop+offRight)/1025;
    vec2 top3 = (floor(t.xy * 1025)+2*offTop+2*offRight)/1025;

    float h[16];

    h[0] = texture(tex,bot0).r;
    h[1] = texture(tex,bot1).r;
    h[2] = texture(tex,bot2).r;
    h[3] = texture(tex,bot3).r;

    h[4] = texture(tex,mbot0).r;
    h[5] = texture(tex,mbot1).r;
    h[6] = texture(tex,mbot2).r;
    h[7] = texture(tex,mbot3).r;

    h[8] = texture(tex,mtop0).r;
    h[9] = texture(tex,mtop1).r;
    h[10] = texture(tex,mtop2).r;
    h[11] = texture(tex,mtop3).r;

    h[12] = texture(tex,top0).r;
    h[13] = texture(tex,top1).r;
    h[14] = texture(tex,top2).r;
    h[15] = texture(tex,top3).r;

    float H_ix[4];

    H_ix[0] = interpolate(f.x,h[0],h[1],h[2],h[3]);
    H_ix[1] = interpolate(f.x,h[4],h[5],h[6],h[7]);
    H_ix[2] = interpolate(f.x,h[8],h[9],h[10],h[11]);
    H_ix[3] = interpolate(f.x,h[12],h[13],h[14],h[15]);

    float H_iy = interpolate(f.y,H_ix[0],H_ix[1],H_ix[2],H_ix[3]);

    return H_iy;
}

This is my version of it; the texture size (1025) is still hardcoded. Using this in the vertex shader and/or the tessellation evaluation shader hurts performance very badly (20-30 fps). But when I change the last line of this function to:

return 0;

the performance goes back up, just as if I had used bilinear or nearest/unfiltered sampling.

The same happens with any of the following (I mean that performance stays good):

return h[...]; //...
return f.x; //...
return H_ix[...]; //...

The interpolation function:

float interpolate(float x, float v0, float v1, float v2,float v3)
{
    double c1,c2,c3,c4; //changed to float, see EDITs

    c1 = spline_matrix[0][1]*v1;
    c2 = spline_matrix[1][0]*v0 + spline_matrix[1][2]*v2;
    c3 = spline_matrix[2][0]*v0 + spline_matrix[2][1]*v1 + spline_matrix[2][2]*v2 + spline_matrix[2][3]*v3;
    c4 = spline_matrix[3][0]*v0 + spline_matrix[3][1]*v1 + spline_matrix[3][2]*v2 + spline_matrix[3][3]*v3;

    return(c4*x*x*x + c3*x*x +c2*x + c1);
};

The fps only drops when I return the final H_iy value. How does the return value affect performance?

EDIT: I've just realized that I used double in the interpolate() function to declare c1, c2, etc. I changed them to float, and performance is now good with the correct return value. So the question changes a bit:

How does a double-precision variable affect the hardware's performance, and why didn't the other interpolate() calls trigger this performance loss, only the last one, given that the H_ix[] array is float too, just like H_iy?

Solution

One thing you can do to speed this up is to use texelFetch() instead of floor()/texture(), so the hardware doesn't waste time doing any filtering. That said, hardware filtering is quite fast, which is partly why I linked the GPU Gems article. There is now also a textureSize() function, which saves you from passing the size values in yourself.
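
As a rough sketch of that suggestion (not the asker's actual code), the 16 taps could be read with texelFetch() and the size taken from textureSize(). This assumes GLSL 1.30 or later and that interpolate() from the question is defined elsewhere in the same shader; fetchHeight and interpolateBicubicFetch are hypothetical names introduced here for illustration.

```glsl
// Sketch only: requires GLSL 1.30+ for texelFetch() and textureSize().
// interpolate() is the question's spline function, assumed to be defined
// elsewhere in the same shader.
float interpolate(float x, float v0, float v1, float v2, float v3);

// Hypothetical helper: read one unfiltered texel from mip level 0.
// The coordinate is clamped because texelFetch() results are undefined
// outside the texture.
float fetchHeight(sampler2D tex, ivec2 texel, ivec2 size)
{
    texel = clamp(texel, ivec2(0), size - ivec2(1));
    return texelFetch(tex, texel, 0).r;
}

// Hypothetical replacement for interpolateBicubic() from the question.
float interpolateBicubicFetch(sampler2D tex, vec2 t)
{
    ivec2 size = textureSize(tex, 0);  // replaces the hardcoded 1025
    vec2  p    = t * vec2(size);
    ivec2 base = ivec2(floor(p));      // same cell convention as the question
    vec2  f    = fract(p);

    // Interpolate the four rows along x, then the results along y.
    float rows[4];
    for (int j = 0; j < 4; ++j) {
        int y = base.y + j - 1;
        rows[j] = interpolate(f.x,
                              fetchHeight(tex, ivec2(base.x - 1, y), size),
                              fetchHeight(tex, ivec2(base.x,     y), size),
                              fetchHeight(tex, ivec2(base.x + 1, y), size),
                              fetchHeight(tex, ivec2(base.x + 2, y), size));
    }
    return interpolate(f.y, rows[0], rows[1], rows[2], rows[3]);
}
```

Clamping to the edge is one possible border policy; a wrapping texture would need a modulo instead.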

GLSL has a very aggressive optimizer, which throws away everything it possibly can. So say you spend ages computing a really expensive lighting value, but at the end just write colour = vec4(1): all of your computation is ignored and the shader runs really fast. This can take some getting used to when you are trying to benchmark things. I believe this is the issue you are seeing when returning different values. Imagine every variable has a dependency tree: if a variable isn't used in an output, GLSL ignores it completely, and that includes uniforms and attributes and applies even across shader stages. One place where I've seen GLSL compilers fall short is in copying in/out function arguments when they don't have to.
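
A minimal fragment-shader sketch of that effect, with made-up names (heightTex, uv, expensive) that are not from the question: in variant A the output depends on the expensive result, so the work must be done; in variant B a compiler is free to remove the whole call.

```glsl
#version 330 core

uniform sampler2D heightTex;  // hypothetical texture, for illustration only
in  vec2 uv;                  // hypothetical interpolated coordinate
out vec4 colour;

// Deliberately costly: 64 texture reads averaged together.
float expensive(vec2 p)
{
    float sum = 0.0;
    for (int i = 0; i < 64; ++i)
        sum += texture(heightTex, p + vec2(float(i)) / 1024.0).r;
    return sum / 64.0;
}

void main()
{
    float v = expensive(uv);

    // Variant A: the output depends on v, so expensive() has to run.
    colour = vec4(v);

    // Variant B: replace the line above with the one below. Nothing that
    // reaches an output depends on v any more, so the compiler may drop
    // expensive() entirely, and heightTex may become an inactive uniform
    // (glGetUniformLocation then returns -1 for it).
    // colour = vec4(1.0);
}
```

When benchmarking, it helps to make sure the value being measured actually reaches a shader output, otherwise the timing may reflect the optimized-away variant.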

As for double precision, there is a similar question here: https://superuser.com/questions/386456/why-does-a-geforce-card-perform-4x-slower-in-double-precision-than-a-tesla-card. In general, graphics needs to be fast and almost always uses single precision. For more general-purpose computing applications, e.g. scientific simulations, doubles will of course give higher precision. You will probably find a lot more about this in relation to CUDA.
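
For reference, a float-only version of the interpolation could look like the sketch below. The question never shows spline_matrix, so the coefficients here are an assumption: they are the standard Catmull-Rom ones, chosen because they match the pattern of terms used in interpolate() (c1 from v1 only, c2 from v0 and v2). The name interpolateFloat is hypothetical.

```glsl
// Sketch only: cubic interpolation between v1 and v2 using single precision
// throughout. ASSUMPTION: Catmull-Rom coefficients; the question's actual
// spline_matrix is not shown and may differ.
float interpolateFloat(float x, float v0, float v1, float v2, float v3)
{
    float c1 = v1;
    float c2 = 0.5 * (v2 - v0);
    float c3 = 0.5 * (2.0 * v0 - 5.0 * v1 + 4.0 * v2 - v3);
    float c4 = 0.5 * (-v0 + 3.0 * v1 - 3.0 * v2 + v3);

    // Horner form of c4*x^3 + c3*x^2 + c2*x + c1.
    return ((c4 * x + c3) * x + c2) * x + c1;
}
```

With these coefficients the result is v1 at x = 0 and v2 at x = 1, and no double-precision arithmetic is involved anywhere in the function.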

Other tips

You can use hardware bilinear interpolation to your advantage. Bicubic interpolation can basically be written as a bilinear interpolation of bilinearly interpolated input points. Like this:

uniform sampler2D texture;
uniform sampler2D mask;
uniform vec2 texOffset;
varying vec4 vertColor;
varying vec4 vertTexCoord;
void main() {
  // Centre sample, already bilinearly filtered by the hardware.
  vec4 p0 = texture2D(texture, vertTexCoord.st).rgba;
  // Sampling offset: one eighth of texOffset in each direction.
  vec2 d  = texOffset * 0.125;
  // Four diagonal samples around the centre.
  vec4 p1 = texture2D(texture, vertTexCoord.st+vec2( d.x, d.y)).rgba;
  vec4 p2 = texture2D(texture, vertTexCoord.st+vec2(-d.x, d.y)).rgba;
  vec4 p3 = texture2D(texture, vertTexCoord.st+vec2( d.x,-d.y)).rgba;
  vec4 p4 = texture2D(texture, vertTexCoord.st+vec2(-d.x,-d.y)).rgba;
  // Weighted average: the centre counts twice, each diagonal sample once.
  gl_FragColor = (2.0*p0 + p1 + p2 + p3 + p4)/6.0;
}

And this is the result:

The first image is standard hardware interpolation.
The second image is bicubic interpolation using the code above.
The third image is the same bicubic interpolation, but with discrete colour data to show the contour lines.


Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow