What you are asking for is called a prefix sum, or a summed area table (SAT) in the 2D case (I mention the names so you can find online resources more easily).
Summed area tables can be implemented efficiently on the GPU by decomposing them into several parallel prefix sum passes [1], [2].
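To make the decomposition concrete, here is a minimal CPU sketch in Python, not GPU code. It assumes the common two-pass form: scan every row, then scan every column of the result. The function names (`prefix_sum`, `summed_area_table`, `box_sum`) are made up for illustration. On a GPU, each row (and then each column) would be scanned in parallel.

```python
def prefix_sum(values):
    """Inclusive prefix sum of a 1D sequence."""
    out, running = [], 0
    for v in values:
        running += v
        out.append(running)
    return out


def summed_area_table(image):
    """Build a SAT from a list of equally long rows using two scan passes."""
    # Pass 1: scan every row independently.
    rows = [prefix_sum(row) for row in image]
    # Pass 2: scan every column of the pass-1 result independently.
    cols = [prefix_sum(col) for col in zip(*rows)]
    # Transpose back to row-major order.
    return [list(row) for row in zip(*cols)]


def box_sum(sat, top, left, bottom, right):
    """Sum of image[top..bottom][left..right] (inclusive) from four lookups."""
    total = sat[bottom][right]
    if top > 0:
        total -= sat[top - 1][right]
    if left > 0:
        total -= sat[bottom][left - 1]
    if top > 0 and left > 0:
        total += sat[top - 1][left - 1]
    return total


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
sat = summed_area_table(image)
print(sat)                       # [[1, 3, 6], [5, 12, 21], [12, 27, 45]]
print(box_sum(sat, 1, 1, 2, 2))  # 28, i.e. 5 + 6 + 8 + 9
```

Once the table is built, the sum over any axis-aligned rectangle costs four lookups regardless of its size, which is the usual reason for building a SAT in the first place.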
The prefix sum itself can be accelerated by keeping intermediate partial sums in local memory (see the example in OpenCL or the example in CUDA). In principle the same can be done in an OpenGL fragment shader using image load/store, or in a compute shader: there is an example in the OpenGL SuperBible, and a similar one in OpenGL Insights around page 280.
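The partial-sum idea can be sketched on the CPU as follows. This is only an illustration of the structure, not the code from any of the examples mentioned above, and `blocked_prefix_sum` and `block_size` are hypothetical names. Each block stands in for one work group that scans its own slice in local memory. The block totals are then scanned, and each block adds the offset of everything before it.

```python
def blocked_prefix_sum(values, block_size=4):
    """Inclusive prefix sum computed block by block (sequential sketch)."""
    # Step 1: every block scans its own slice. On a GPU these scans would
    # run in parallel, one work group per block, using local memory.
    local = []
    for start in range(0, len(values), block_size):
        running, scanned = 0, []
        for v in values[start:start + block_size]:
            running += v
            scanned.append(running)
        local.append(scanned)

    # Step 2: exclusive scan over the block totals gives each block's offset.
    offsets, running = [], 0
    for scanned in local:
        offsets.append(running)
        running += scanned[-1]

    # Step 3: every block adds its offset (again parallel on a GPU).
    return [x + off for scanned, off in zip(local, offsets) for x in scanned]


print(blocked_prefix_sum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], block_size=4))
# [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
```

Step 2 is itself a prefix sum over far fewer elements, so for large inputs it can be handled by applying the same scheme again.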
Note that you may quickly run into precision issues, because the sums get quite large toward the rightmost (bottommost) pixels. Integer or fp16 render targets will most likely fail due to overflow or insufficient precision; fp32 will work most of the time.
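A quick way to see the precision problem is to emulate narrow float formats on the CPU. The sketch below uses Python's `struct` module to round each running sum to fp16 (`"e"`) or fp32 (`"f"`). The helper names `round_to` and `running_sum` are made up, and the all-ones input is just a convenient worst case, not real image data.

```python
import struct


def round_to(fmt, x):
    """Round a Python float to the precision of struct format fmt."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def running_sum(values, fmt):
    """Accumulate values, rounding after every addition as the GPU would."""
    acc = 0.0
    for v in values:
        acc = round_to(fmt, acc + v)
    return acc


# One 4096-pixel row of ones: fp16 stops counting at 2048, because
# 2049 is not representable and rounds back down to 2048.
fp16_total = running_sum([1.0] * 4096, "e")
print(fp16_total)  # 2048.0

# fp32 has the same problem much later, at 2**24 = 16777216.
fp32_stalls = round_to("f", 16777216.0 + 1.0) == 16777216.0
print(fp32_stalls)  # True
```

Since 4096 * 4096 is exactly 2**24, an fp32 table over a 4096x4096 image of ones reaches that limit at the very last pixel. That fits the observation that fp32 works most of the time but not always.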