The documentation at this location supports your theory on padding: http://www.raspberrypi.org/wp-content/uploads/2013/07/RaspiCam-Documentation.pdf
Specifically:
Note that the image buffers saved in raspistillyuv are padded to a horizontal size divisible by 16 (so there may be unused bytes at the end of each line to made the width divisible by 16). Buffers are also padded vertically to be divisible by 16, and in the YUV mode, each plane of Y,U,V is padded in this way.
So my interpretation of this is the following. The width is 2592 (divisible by 16 so this is ok). The height is 1944 which is 8 short of being divisible by 16 so an extra 8*2592 are added (also multiplied by 1.5) thus giving your 31104 extra bytes.
Although this kindof helps with the size of the file, it doesn't explain the structure of the YUV output properly. I am having a look at this description to see if this provides a hint to start with: http://en.wikipedia.org/wiki/YUV#Y.27UV420p_.28and_Y.27V12_or_YV12.29_to_RGB888_conversion
From this I believe it is as follows:
Y Channel:
2592 * (1944+8) = 5059584
U Channel:
1296 * (972+4) = 1264896
V Channel:
1296 * (972+4) = 1264896
Giving a sum of :
5059584 + 2*1264896 = 7589376
This makes the numbers add up so only thing left is to confirm if this interpretation is correct.
I am also trying to do the YUV decode (for image comparisons) so if you can confirm if this actually does correspond to what you are reading in the YUV file this would be much appreciated.