Memory locality / cache efficiency. Most image processing operations work in 2D, and for efficient memory access you want pixels that are close to each other in 2D to be close to each other in memory. Arranging the data in blocks like this means that two pixels with the same x coordinate and adjacent y coordinates will usually sit in the same block, so their memory addresses are much closer than in a simple linear (row-major) layout, where they are always a full row apart.
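To make the effect concrete, here is a minimal sketch comparing a row-major layout with a blocked one. The function names, the 8x8 tile size, and the requirement that the image width be a multiple of the tile size are all assumptions chosen for illustration, not details of any particular image format.

```python
def linear_offset(x, y, width):
    """Element offset of pixel (x, y) in a plain row-major layout."""
    return y * width + x


def tiled_offset(x, y, width, tile=8):
    """Element offset of pixel (x, y) when the image is stored as
    tile x tile blocks, with blocks laid out row by row.

    Assumption for this sketch: width is a multiple of tile.
    """
    tiles_per_row = width // tile
    tile_index = (y // tile) * tiles_per_row + (x // tile)
    offset_in_tile = (y % tile) * tile + (x % tile)
    return tile_index * tile * tile + offset_in_tile


def fraction_close_vertical_neighbours(offset_fn, width, height, max_distance):
    """Fraction of vertically adjacent pixel pairs whose offsets differ
    by at most max_distance elements."""
    close = 0
    total = 0
    for y in range(height - 1):
        for x in range(width):
            distance = abs(offset_fn(x, y + 1, width) - offset_fn(x, y, width))
            if distance <= max_distance:
                close += 1
            total += 1
    return close / total


if __name__ == "__main__":
    for name, fn in (("linear", linear_offset), ("tiled", tiled_offset)):
        print(name, fraction_close_vertical_neighbours(fn, 64, 64, 16))
```

For a 64x64 image and a threshold of 16 elements, no vertical neighbours are that close in the linear layout, while 56 out of every 63 pairs (about 89%) are in the tiled one. The remaining pairs straddle a block boundary and end up further apart than in the linear layout, which is why the benefit shows up as "most neighbours are very close" rather than as a large drop in the mean distance.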
There are more complex ways of laying out the image, often used for textures rendered by GPUs, which give even better memory locality on average.
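One well-known layout of this kind is Z-order (Morton order), which interleaves the bits of the x and y coordinates so that locality holds at every power-of-two block size rather than at a single tile size. The sketch below is only an illustration of that idea: actual GPU texture layouts are vendor-specific, and the function name and the 16-bit coordinate limit are assumptions.

```python
def morton_offset(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one offset.

    Bit i of x goes to bit 2*i of the result, and bit i of y goes
    to bit 2*i + 1, which traces a recursive Z-shaped path through
    the image.
    """
    offset = 0
    for i in range(bits):
        offset |= ((x >> i) & 1) << (2 * i)
        offset |= ((y >> i) & 1) << (2 * i + 1)
    return offset


if __name__ == "__main__":
    # Print the offsets of a 4x4 image, one image row per line.
    for y in range(4):
        print([morton_offset(x, y) for x in range(4)])
```

Every aligned 2x2 block of pixels occupies four consecutive offsets, every aligned 4x4 block occupies sixteen, and so on, so neighbours in either direction tend to stay close in memory.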