Question

I've got a working chain in which the GPU (AMD Z430) on an iMX53 takes the decoded video frame in YUV420P format, converts it to RGB565 and displays it. My only concern is the speed, or more exactly the lack of it. The input video frame is 1920x1088 YUV420P and the conversion takes 40 ms; I simply can't make it run faster. I've tried to optimize my shaders, with no luck. I also tried a 2D gamut, which was even slower (and, due to its 2D nature, it produced slightly incorrect colors). Granted, I'm not an OpenGL ES expert.

Here are my shaders:

static const char *fragment_shader_yuv_src =
    "const lowp mat3 rgb_coeff = mat3(1, 1, 1, 0, -0.344, 1.772, 1.402, -0.714, 0);\n"
    "varying lowp vec2 v_texcoord;\n"
    "uniform lowp sampler2D s_texture_y;\n"
    "uniform lowp sampler2D s_texture_u;\n"
    "uniform lowp sampler2D s_texture_v;\n"
    "\n"
    "void main()\n"
    "{\n"
    "    lowp vec3 yuv = vec3(texture2D(s_texture_y, v_texcoord).r, texture2D(s_texture_u, v_texcoord).r - 0.5, texture2D(s_texture_v, v_texcoord).r - 0.5);\n"
    "    gl_FragColor = vec4(rgb_coeff * yuv, 1.0);\n"
    "}\n";

static const char *vertex_shader_yuv_src =
    "attribute lowp vec4 position; \n"
    "attribute lowp vec2 texcoord; \n"
    "varying lowp vec2 v_texcoord; \n"
    "                              \n"
    "void main()                   \n"
    "{                             \n"
    "    gl_Position = position;   \n"
    "    v_texcoord = texcoord.xy; \n"
    "}                             \n";

s_texture_y/u/v contain the appropriate color components; the images are allocated by eglCreateImageKHR(...) and assigned to textures by glEGLImageTargetTexture2DOES(...).

As I mentioned above, it works, but slowly. I can't tell whether this is the maximum performance of the GPU or whether I'm doing something very wrong in the shaders...

(Upscaling and drawing a simple 416x416 RGBA32 image with the simplest possible shaders is also very slow, ~23 ms.)

Does anybody have any ideas or experience? How should I optimize my shaders?

No correct solution

OTHER TIPS

At first sight, your shader doesn't look too bad performance-wise. (For reference, check out the glupload.c file of gst-plugins-gl, where the typical conversions are done in shaders, both for OpenGL and OpenGL ES.)

The lack of performance, without entering into platform specifics, could well be due to the penalty of uploading to the GPU. Have you tried a dummy fragment shader, i.e.:

void main (void)
{
   gl_FragColor = vec4 (0.0, 1.0, 0.0, 1.0);
}

while uploading your frames at different resolutions? My guess is that even for tiny frames you'll incur a big time penalty (~10-20 ms), because all ongoing calculations need to finish and the pipeline must be flushed before new data can be injected (i.e. uploading is a synchronous operation from the GPU's point of view).
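To put numbers on that experiment, a small timing harness along these lines could help. This is only a sketch: `average_frame_ms` and `count_frame` are hypothetical names, not part of any library, and `count_frame` is merely a stand-in for your real per-frame work. It also assumes a POSIX system where `clock_gettime(CLOCK_MONOTONIC, ...)` is available.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Monotonic wall-clock time in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

/* Hypothetical helper: average wall-clock time of one call to frame_fn,
 * measured over `iterations` consecutive calls. */
static double average_frame_ms(void (*frame_fn)(void *), void *ctx, int iterations)
{
    assert(frame_fn != NULL);
    if (iterations <= 0)
        return 0.0;

    double start = now_ms();
    for (int i = 0; i < iterations; ++i)
        frame_fn(ctx);
    return (now_ms() - start) / iterations;
}

/* Placeholder callback that only counts calls. In the real test this would
 * update the textures, draw the quad and then wait for the GPU to finish. */
static void count_frame(void *ctx)
{
    ++*(int *)ctx;
}
```

In the real measurement the callback would update the textures, issue the draw call and then call glFinish(), so that the GPU work is actually included in the measured time. Running it once with the dummy shader and once with the conversion shader, at a few frame sizes, should show how much of the 40 ms is upload and synchronization rather than shader arithmetic.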

If the penalty is large, you can try using a dual upload to the GPU (see this post) and see whether it improves things on your particular driver and hardware.

Finally, in both your shaders and glupload.c, using three separate textures (one per channel) and uniforms passed from the CPU code is not going to help performance. One thing you can try is to upload a single block of data with the three YUV planes back-to-back and then bring the output-pixel-to-YUV-coordinate logic into the shader itself, something like this:

void main (void)
{
   // The <...> placeholders depend on how the planes are packed into s_texture
   vec2 y_coord = <calculate the position to find the Y component for this pixel>;
   vec2 u_coord = <calculate the position to find the U component for this pixel>;
   vec2 v_coord = <calculate the position to find the V component for this pixel>;
   float y_value = texture2D(s_texture, y_coord).<appropriate channel>;
   float u_value = texture2D(s_texture, u_coord).<appropriate channel>;
   float v_value = texture2D(s_texture, v_coord).<appropriate channel>;
   // Re-center the chroma samples around zero, as in your original shader
   gl_FragColor = vec4 (rgb_coeff * vec3(y_value, u_value - 0.5, v_value - 0.5), 1.0);
}
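As an illustration of what those coordinate calculations might look like, here is a CPU-side reference in C for one possible packing. It is a sketch under assumptions that are mine, not something your setup is known to use: the whole I420 buffer is uploaded as one single-channel texture that is `width` texels wide and `height * 3 / 2` rows tall (Y plane, then U, then V), `width` is even and `height` is a multiple of 4. The names `texel_pos`, `i420_texels`, `i420_packed_texels` and `texel_center` are hypothetical.

```c
#include <assert.h>

/* Texel position (column, row) inside the packed texture. */
typedef struct { int col; int row; } texel_pos;

/* Where the Y, U and V samples of one output pixel live. */
typedef struct { texel_pos y, u, v; } i420_texels;

/* Assumed layout: Y plane in rows [0, height), U plane in the next
 * height/4 rows, V plane in the last height/4 rows. */
static i420_texels i420_packed_texels(int width, int height, int x, int y)
{
    assert(width % 2 == 0 && height % 4 == 0);
    assert(x >= 0 && x < width && y >= 0 && y < height);

    i420_texels t;
    int cx = x / 2;                /* chroma sample column */
    int cy = y / 2;                /* chroma sample row */
    int chroma_rows = height / 4;  /* texture rows used by each chroma plane */

    t.y.col = x;
    t.y.row = y;

    /* A chroma row is only width/2 bytes long, so two consecutive chroma
     * rows share one texture row: even rows left, odd rows right. */
    t.u.col = (cy % 2) * (width / 2) + cx;
    t.u.row = height + cy / 2;

    t.v.col = t.u.col;
    t.v.row = t.u.row + chroma_rows;
    return t;
}

/* Normalized coordinate of a texel center, as texture2D() expects. */
static float texel_center(int index, int size)
{
    return (index + 0.5f) / (float)size;
}
```

A shader version would have to express the same arithmetic with floor() and mod() on floats, since GLSL ES 1.00 has no integer % operator, and the texture would need GL_NEAREST filtering so that neighbouring planes don't bleed into each other. Whether the extra per-fragment arithmetic is cheaper than sampling three textures on the Z430 is something only a measurement can tell.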

I can try and develop this shader further if it happens that the upload is not the bottleneck...

Licensed under: CC-BY-SA with attribution