GLSL performance - function return value/type

https://stackoverflow.com//questions/20052381

26-12-2019

Question

I am using bicubic filtering to smooth my heightmap, and I have implemented it in GLSL.

Bicubic interpolation (see the interpolate() function below):

float interpolateBicubic(sampler2D tex, vec2 t) 
{

vec2 offBot =   vec2(0,-1);
vec2 offTop =   vec2(0,1);
vec2 offRight = vec2(1,0);
vec2 offLeft =  vec2(-1,0);

vec2 f = fract(t.xy * 1025);

vec2 bot0 = (floor(t.xy * 1025)+offBot+offLeft)/1025;
vec2 bot1 = (floor(t.xy * 1025)+offBot)/1025;
vec2 bot2 = (floor(t.xy * 1025)+offBot+offRight)/1025;
vec2 bot3 = (floor(t.xy * 1025)+offBot+2*offRight)/1025;

vec2 mbot0 = (floor(t.xy * 1025)+offLeft)/1025;
vec2 mbot1 = (floor(t.xy * 1025))/1025;
vec2 mbot2 = (floor(t.xy * 1025)+offRight)/1025;
vec2 mbot3 = (floor(t.xy * 1025)+2*offRight)/1025;

vec2 mtop0 = (floor(t.xy * 1025)+offTop+offLeft)/1025;
vec2 mtop1 = (floor(t.xy * 1025)+offTop)/1025;
vec2 mtop2 = (floor(t.xy * 1025)+offTop+offRight)/1025;
vec2 mtop3 = (floor(t.xy * 1025)+offTop+2*offRight)/1025;

vec2 top0 = (floor(t.xy * 1025)+2*offTop+offLeft)/1025;
vec2 top1 = (floor(t.xy * 1025)+2*offTop)/1025;
vec2 top2 = (floor(t.xy * 1025)+2*offTop+offRight)/1025;
vec2 top3 = (floor(t.xy * 1025)+2*offTop+2*offRight)/1025;

float h[16];

h[0] = texture(tex,bot0).r;
h[1] = texture(tex,bot1).r;
h[2] = texture(tex,bot2).r;
h[3] = texture(tex,bot3).r;

h[4] = texture(tex,mbot0).r;
h[5] = texture(tex,mbot1).r;
h[6] = texture(tex,mbot2).r;
h[7] = texture(tex,mbot3).r;

h[8] = texture(tex,mtop0).r;
h[9] = texture(tex,mtop1).r;
h[10] = texture(tex,mtop2).r;
h[11] = texture(tex,mtop3).r;

h[12] = texture(tex,top0).r;
h[13] = texture(tex,top1).r;
h[14] = texture(tex,top2).r;
h[15] = texture(tex,top3).r;

float H_ix[4];

H_ix[0] = interpolate(f.x,h[0],h[1],h[2],h[3]);
H_ix[1] = interpolate(f.x,h[4],h[5],h[6],h[7]);
H_ix[2] = interpolate(f.x,h[8],h[9],h[10],h[11]);
H_ix[3] = interpolate(f.x,h[12],h[13],h[14],h[15]);

float H_iy = interpolate(f.y,H_ix[0],H_ix[1],H_ix[2],H_ix[3]);

return H_iy;
}

This is my version; the texture size (1025) is still hard-coded. Using it in the vertex shader and/or the tessellation evaluation shader hurts performance badly (20-30 fps). But when I change the last line of this function to:

return 0;

performance goes back up, just as if I were using bilinear or nearest (no) filtering.

The same happens with any of the following (meaning performance stays good):

return h[...]; //...
return f.x; //...
return H_ix[...]; //...

The interpolation function:

float interpolate(float x, float v0, float v1, float v2,float v3)
{
    double c1,c2,c3,c4; //changed to float, see EDITs

    c1 = spline_matrix[0][1]*v1;
    c2 = spline_matrix[1][0]*v0 + spline_matrix[1][2]*v2;
    c3 = spline_matrix[2][0]*v0 + spline_matrix[2][1]*v1 + spline_matrix[2][2]*v2 + spline_matrix[2][3]*v3;
    c4 = spline_matrix[3][0]*v0 + spline_matrix[3][1]*v1 + spline_matrix[3][2]*v2 + spline_matrix[3][3]*v3;

    return(c4*x*x*x + c3*x*x +c2*x + c1);
}

The fps only drops when I return the final result, the value H_iy. How does the return value affect performance?

EDIT: I just realized that I used double in the interpolate() function to declare c1, c2 and so on. I changed them to float, and performance now stays good with the proper return value. So the question changes a bit:

How do double-precision variables affect hardware performance, and why did the other interpolate() calls not cause this performance loss, given that the H_ix[] array is float as well, just like H_iy?

Solution

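One thing you can do to speed this up is to use texelFetch() instead of floor()/texture(), so the hardware does not waste time doing any filtering. That said, hardware filtering is quite fast, which is partly why I referred to the GPU Gems article. There is now also a textureSize() function, which saves you from passing the size in yourself.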
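As a rough sketch of that suggestion, the function below reads the 4x4 neighbourhood with texelFetch() and takes the size from textureSize() instead of the hard-coded 1025. The names interpolateBicubicFetch, heightMap and uv, and the clamping at the texture border, are assumptions made for illustration. Catmull-Rom weights stand in for spline_matrix, whose values the question does not show.

```glsl
#version 330 core

uniform sampler2D heightMap; // hypothetical heightmap texture
in vec2 uv;                  // hypothetical per-vertex texture coordinate

// Assumption: Catmull-Rom coefficients replace the question's spline_matrix.
// Everything stays in float, as in the question's EDIT.
float interpolate(float x, float v0, float v1, float v2, float v3)
{
    float c1 = v1;
    float c2 = 0.5 * (v2 - v0);
    float c3 = v0 - 2.5 * v1 + 2.0 * v2 - 0.5 * v3;
    float c4 = 0.5 * (v3 - v0) + 1.5 * (v1 - v2);
    return ((c4 * x + c3) * x + c2) * x + c1; // Horner form of the cubic
}

float interpolateBicubicFetch(sampler2D tex, vec2 t)
{
    ivec2 size   = textureSize(tex, 0);   // size of mip level 0
    vec2  pos    = t * vec2(size);
    ivec2 base   = ivec2(floor(pos));     // texel index, as floor(t * 1025) before
    vec2  f      = fract(pos);
    ivec2 maxIdx = size - ivec2(1);

    float rows[4];
    for (int j = 0; j < 4; ++j) {
        float h[4];
        for (int i = 0; i < 4; ++i) {
            // Clamp so border lookups stay inside the texture (assumed policy).
            ivec2 p = clamp(base + ivec2(i - 1, j - 1), ivec2(0), maxIdx);
            h[i] = texelFetch(tex, p, 0).r; // unfiltered read of one texel
        }
        rows[j] = interpolate(f.x, h[0], h[1], h[2], h[3]);
    }
    return interpolate(f.y, rows[0], rows[1], rows[2], rows[3]);
}

void main()
{
    float height = interpolateBicubicFetch(heightMap, uv);
    gl_Position = vec4(uv * 2.0 - 1.0, height, 1.0);
}
```

Note that texelFetch() addresses texels by integer index, whereas the original lookups at floor(...)/1025 land on texel boundaries, so the two versions may differ slightly depending on the sampler's filter mode.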

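GLSL has a very aggressive optimizer that throws away everything it can. Say you spend a long time computing a really expensive lighting value, but at the end you just write colour = vec4(1): all of that computation is dropped and the shader runs very fast. This can take some getting used to when you try to benchmark things. I believe this is what you are seeing when you return different values. Imagine that every variable has a dependency tree; if a variable is not used in an output, GLSL ignores it completely, and that includes uniforms and attributes and applies even across shader stages. One place where I have seen GLSL compilers fall short is in copying in/out function arguments when it is not necessary.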
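A minimal fragment-shader sketch of this effect follows. The names heightMap, uv and expensive() are hypothetical, and how much a given driver actually removes is an assumption that will vary between compilers.

```glsl
#version 330 core

uniform sampler2D heightMap; // hypothetical texture
in vec2 uv;                  // hypothetical interpolated coordinate
out vec4 colour;

// Deliberately costly: 64 dependent texture reads.
float expensive(vec2 p)
{
    float sum = 0.0;
    for (int i = 0; i < 64; ++i) {
        sum += texture(heightMap, p + vec2(sum, float(i)) * 0.001).r;
    }
    return sum / 64.0;
}

void main()
{
    float value = expensive(uv);

    // Variant A: the output ignores 'value', so the compiler is free to
    // drop the whole expensive() call and the shader appears very fast.
    // colour = vec4(1);

    // Variant B: the output depends on 'value', so the work has to stay.
    colour = vec4(vec3(value), 1.0);
}
```

When benchmarking, it helps to make sure the value being measured actually reaches an output, as in variant B; otherwise the timing may only reflect what the optimizer left behind.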

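As for double precision, there is a similar question here: https://superuser.com/questions/386456/why-does-a-geforce-card-perform-4x-slower-in-double-precision-than-a-tesla-card . In general, graphics needs to be fast and nearly always uses only single precision. For more general-purpose computing applications (scientific simulations, for example), doubles of course give higher accuracy. You will probably find much more about this in relation to CUDA.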

Other tips

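You can use the hardware's bilinear interpolation to your advantage. Bicubic interpolation can essentially be written as a bilinear interpolation of bilinearly interpolated input points. Like this: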

uniform sampler2D texture;
uniform sampler2D mask;
uniform vec2 texOffset;
varying vec4 vertColor;
varying vec4 vertTexCoord;
void main() {
  vec4 p0 = texture2D(texture, vertTexCoord.st).rgba;
  vec2 d  = texOffset * 0.125;
  vec4 p1 = texture2D(texture, vertTexCoord.st+vec2( d.x, d.y)).rgba;
  vec4 p2 = texture2D(texture, vertTexCoord.st+vec2(-d.x, d.y)).rgba;
  vec4 p3 = texture2D(texture, vertTexCoord.st+vec2( d.x,-d.y)).rgba;
  vec4 p4 = texture2D(texture, vertTexCoord.st+vec2(-d.x,-d.y)).rgba;
  gl_FragColor = (  2.0*p0   + p1 + p2 + p3 + p4)/6.0;
 }

And this is the result:

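The first image is standard hardware interpolation.
The second image is bicubic interpolation using the code above.
The third image uses the same bicubic interpolation, but with discrete color bands to make the contour lines visible.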
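For comparison, the same idea of letting bilinear hardware filtering do part of the work underlies a widely circulated four-fetch cubic B-spline filter, along the lines of the technique described in GPU Gems 2, chapter 20 ("Fast Third-Order Texture Filtering"). The sketch below is not the code from this answer. It assumes the texture is bound with GL_LINEAR filtering, and the names textureBicubic, cubic, heightMap and uv are illustrative. A B-spline smooths the data rather than passing exactly through the texel values.

```glsl
#version 330 core

uniform sampler2D heightMap; // hypothetical texture, GL_LINEAR filtering assumed
in vec2 uv;                  // hypothetical interpolated coordinate
out vec4 colour;

// Cubic B-spline weights for the four taps at fractional position v.
vec4 cubic(float v)
{
    vec4 n = vec4(1.0, 2.0, 3.0, 4.0) - v;
    vec4 s = n * n * n;
    float x = s.x;
    float y = s.y - 4.0 * s.x;
    float z = s.z - 4.0 * s.y + 6.0 * s.x;
    float w = 6.0 - x - y - z;
    return vec4(x, y, z, w) * (1.0 / 6.0);
}

vec4 textureBicubic(sampler2D tex, vec2 texCoords)
{
    vec2 texSize    = vec2(textureSize(tex, 0));
    vec2 invTexSize = 1.0 / texSize;

    // Move to texel space, with integer values at texel centres.
    texCoords = texCoords * texSize - 0.5;
    vec2 fxy  = fract(texCoords);
    texCoords -= fxy;

    vec4 xcubic = cubic(fxy.x);
    vec4 ycubic = cubic(fxy.y);

    // Pair up the four taps per axis; each pair becomes one bilinear fetch
    // placed so that the hardware blend reproduces the pair's weights.
    vec4 c      = texCoords.xxyy + vec2(-0.5, 1.5).xyxy;
    vec4 s      = vec4(xcubic.xz + xcubic.yw, ycubic.xz + ycubic.yw);
    vec4 offset = (c + vec4(xcubic.yw, ycubic.yw) / s) * invTexSize.xxyy;

    vec4 sample0 = texture(tex, offset.xz);
    vec4 sample1 = texture(tex, offset.yz);
    vec4 sample2 = texture(tex, offset.xw);
    vec4 sample3 = texture(tex, offset.yw);

    // Blend the four bilinear samples by the combined pair weights.
    float sx = s.x / (s.x + s.y);
    float sy = s.z / (s.z + s.w);
    return mix(mix(sample3, sample2, sx), mix(sample1, sample0, sx), sy);
}

void main()
{
    colour = textureBicubic(heightMap, uv);
}
```

This takes four filtered fetches instead of sixteen unfiltered ones, at the cost of relying on the sampler's filter state being set correctly.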


License: CC-BY-SA with attribution

Not affiliated with StackOverflow