Most efficient way to perform sum of textures

Question

So if I understand the question correctly, you render into some textures, then need a weighted sum over all of those textures, and want to display just that image. If so, you could do one extra rendering pass with all of your textures bound and calculate the weighted sum of all textures in the fragment shader. Since you do not need the result as a texture, you could render directly into the default framebuffer, so the result becomes visible immediately.

With the at most 9 textures you need, you can actually follow that strategy, since there will be enough texture units. However, that approach might be a bit inflexible, especially if you have to deal with a varying number of textures to sum up at different points in time.
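To make the fixed-count variant concrete, here is a minimal sketch of such a fragment shader for three inputs. The names `uTex0` to `uTex2`, `uWeights`, `vTexCoords` and `fragColor` are made up for illustration, and the count of three is just an assumption to keep it short; it also shows why this gets inflexible, since every additional texture means editing the shader.

```glsl
#version 330 core

// Hypothetical names, not taken from the question.
uniform sampler2D uTex0;
uniform sampler2D uTex1;
uniform sampler2D uTex2;
uniform vec3 uWeights;      // one weight per input texture

in vec2 vTexCoords;         // assumed to come from the vertex shader
out vec4 fragColor;

void main()
{
    // Weighted sum of the three inputs at the same texture coordinate.
    fragColor = uWeights.x * texture(uTex0, vTexCoords)
              + uWeights.y * texture(uTex1, vTexCoords)
              + uWeights.z * texture(uTex2, vTexCoords);
}
```

On the application side, each sampler uniform has to be set to the index of the texture unit its texture is bound to.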

It would be nice if you could just have a uniform variable with the count, an array of weight values, and a loop in the shader, which would boil down to

uniform int count;
uniform float weights[MAX_COUNT];
uniform sampler2D uTex[MAX_COUNT];
[...]
for (int i = 0; i < count; i++)
    sum += weights[i] * texture(uTex[i], texcoords);

And you can do that beginning with GL 4. Earlier GLSL versions already had arrays of texture samplers, but only allowed constant indices. GL 4 allows a non-constant index as long as it is dynamically uniform, which means that all shader invocations are going to access the same texture samplers at the same time. As the loop only depends on a uniform variable, this is the case.
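For reference, a complete fragment shader built around that loop could look like the following sketch. The value of `MAX_COUNT`, the `fragColor` output and the declaration of `texcoords` as a fragment shader input are assumptions filled in to make the snippet compile; adapt them to your setup.

```glsl
#version 400 core

// Assumed upper bound; must not exceed the number of texture units
// available to the fragment shader.
#define MAX_COUNT 9

uniform int count;                  // number of textures actually in use
uniform float weights[MAX_COUNT];
uniform sampler2D uTex[MAX_COUNT];

in vec2 texcoords;                  // assumed to come from the vertex shader
out vec4 fragColor;                 // hypothetical output name

void main()
{
    vec4 sum = vec4(0.0);
    // The index i depends only on a uniform and the loop counter,
    // so it is dynamically uniform across all invocations.
    for (int i = 0; i < count; i++)
        sum += weights[i] * texture(uTex[i], texcoords);
    fragColor = sum;
}
```

On the application side, each element of `uTex` still needs its own texture unit; the unit indices can be uploaded in one call with `glUniform1iv`.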

However, it might be a better strategy to not use multiple textures at all. Assuming all of your input textures have the same resolution, you might be better off using a single array texture. You can attach a layer of such an array texture to an FBO just as you can with an ordinary 2D texture, so rendering to the layers independently (or rendering to multiple layers at a time using multiple render targets) will just work. You then only need to bind that single array texture and can do

uniform int count;
uniform float weights[MAX_COUNT];
uniform sampler2DArray uTex;
[...]
for (int i = 0; i < count; i++)
    sum += weights[i] * texture(uTex, vec3(texcoords, float(i)));

This only requires GL 3 level hardware, and the maximum count you can work with is not limited by the number of texture units available to the fragment shader, but by the array texture layer limit (typically > 256) and the available memory. However, performance will go down if count gets too high. At some point, using multiple passes that each sum up only a sub-range of your images may become more efficient, due to the texture cache: in the single-pass approach, the texture accesses to all the different layers compete for the texture cache, which hurts the cache hit rate between neighboring fragments. But this should be no issue with just 8 or 9 input images.
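If you ever do need the multi-pass variant, one way to sketch it is to let each pass sum a sub-range of layers and accumulate the partial results with additive blending. The `first` uniform, the `MAX_COUNT` value and the `fragColor` output are hypothetical names introduced here, not something from the question.

```glsl
#version 330 core

// Assumed upper bound on the total number of layers.
#define MAX_COUNT 64

uniform int first;                  // hypothetical: first layer handled by this pass
uniform int count;                  // number of layers handled by this pass
uniform float weights[MAX_COUNT];   // weights for all layers, indexed by layer
uniform sampler2DArray uTex;

in vec2 texcoords;                  // assumed to come from the vertex shader
out vec4 fragColor;                 // hypothetical output name

void main()
{
    vec4 sum = vec4(0.0);
    // Requires first + count <= MAX_COUNT.
    for (int i = 0; i < count; i++) {
        int layer = first + i;
        sum += weights[layer] * texture(uTex, vec3(texcoords, float(layer)));
    }
    fragColor = sum;
}
```

The application would clear the target once, enable blending with `glBlendFunc(GL_ONE, GL_ONE)`, and draw one full-screen pass per sub-range with `first` and `count` updated in between. Keep in mind that with a fixed-point target such as the default framebuffer, each partial sum is clamped and quantized when it is written, so accumulating into a floating-point FBO and displaying that may be the safer choice.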