The Core Image filters perform this averaging through a series of reductions. A former engineer on the team describes how this was done for the CIAreaAverage filter within this GPU Gems chapter (under section 26.2.2 "Finding the Centroid").
I talk about a similar averaging by reduction in my answer here. I needed this capability on iOS, so I wrote a fragment shader that reduced the image by a factor of four in both the horizontal and vertical dimensions, sampling between pixels so that hardware linear filtering averaged sixteen pixels into one at each step. Once the image was reduced to a small enough size, the remaining pixels were read back and averaged on the CPU to produce a single final value.
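The arithmetic of that reduction can be sketched on the CPU with NumPy (an illustration of the math, not the actual fragment shader; the function names are my own):

```python
import numpy as np

def reduce_once(img):
    """One reduction pass: average each 4x4 block of pixels into a single pixel.
    On the GPU each pass is a fragment shader render; here it's plain NumPy."""
    h, w, c = img.shape
    return img.reshape(h // 4, 4, w // 4, 4, c).mean(axis=(1, 3))

def average_color(img):
    """Reduce by 4x in each dimension until the size is no longer divisible
    by 4, then average the remaining pixels for the final value."""
    while img.shape[0] % 4 == 0 and img.shape[1] % 4 == 0 and img.shape[0] >= 4:
        img = reduce_once(img)
    return img.mean(axis=(0, 1))
```

A 640x480 frame goes 640x480 → 160x120 → 40x30, and the remaining 1,200 pixels are averaged in the final step. Because every block in a pass has the same size, the mean of the block means equals the overall mean.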
This kind of reduction is still very fast to perform on the GPU, and I was able to extract an average color from a 640x480 video frame in ~6 ms on an iPhone 4. You'll of course have a lot more horsepower to play with on a Mac.
You could take a similar approach here, reducing in only one direction at each step. If you are interested in obtaining a sum of the pixel values rather than an average, you'll need to watch out for precision limits in the pixel formats used on the GPU. By default, RGBA color components are stored as 8-bit values, but OpenGL (ES) extensions on certain GPUs let you render into 16-bit or even 32-bit floating-point textures, which extends your dynamic range. I'm not sure, but I believe that Core Image lets you use 32-bit float components on the Mac.
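A rough illustration of why sums (unlike averages) break 8-bit storage, with NumPy standing in for the render target:

```python
import numpy as np

# Sixteen pixels with channel value 200 should sum to 3200, but an 8-bit
# render target clamps every stored component to the 0-255 range.
pixels = np.full(16, 200, dtype=np.uint8)

true_sum = int(pixels.astype(np.int64).sum())   # 3200
clamped = min(true_sum, 255)                    # what 8-bit storage keeps: 255

# A floating-point render target preserves the real sum.
float_sum = float(pixels.astype(np.float32).sum())  # 3200.0
```

An average of 8-bit values always stays within the 0-255 range, which is why the averaging reduction works even without float textures.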