At first sight, your shader doesn't look bad performance-wise. (For reference, check out glupload.c in gst-plugins-gl, where the typical conversions are done in shaders for both OpenGL and OpenGL ES.)
Without getting into platform specifics, the poor performance could well be caused by the cost of uploading to the GPU. Have you tried a dummy fragment shader, i.e.:
void main (void)
{
    gl_FragColor = vec4 (0.0, 1.0, 0.0, 1.0);
}
while uploading your frames at different resolutions? My guess is that even for tiny frames you'll incur a large time penalty (~10-20 ms), because all ongoing calculations must finish and the pipeline must be flushed before the new data can be injected (i.e. uploading is a synchronous operation from the GPU's point of view).
If the penalty is large, you can try a dual upload to the GPU (see this post) and check whether it helps on your particular driver and hardware.
Finally, in both your shaders and glupload.c, using three separate textures (one per channel) and uniforms passed from the CPU code is not going to help performance. You can try two things: first, upload a single block of data with the Y, U and V planes back-to-back; second, move the output-pixel-to-YUV-coordinate logic into the shader itself, something like this:
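uniform sampler2D s_texture;
// Assumes BT.601 limited-range input: Y in [16,235], U/V centred on 128.
const vec3 yuv_offset = vec3(-0.0625, -0.5, -0.5);
const mat3 rgb_coeff = mat3(1.164,  1.164, 1.164,    // Y column
                            0.0,   -0.391, 2.018,    // U column
                            1.596, -0.813, 0.0);     // V column
void main (void)
{
    vec2 y_coord = <calculate the position to find the Y component for this pixel>;
    vec2 u_coord = <calculate the position to find the U component for this pixel>;
    vec2 v_coord = <calculate the position to find the V component for this pixel>;
    float y_value = texture2D(s_texture, y_coord).<appropriate channel>;
    float u_value = texture2D(s_texture, u_coord).<appropriate channel>;
    float v_value = texture2D(s_texture, v_coord).<appropriate channel>;
    // U and V are stored unsigned and centred on 0.5, so remove the bias first.
    vec3 yuv = vec3(y_value, u_value, v_value) + yuv_offset;
    gl_FragColor = vec4 (rgb_coeff * yuv, 1.0);
}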
I can try to develop this shader further if it turns out that the upload is not the bottleneck.