Git: the meaning of object 'size' returned by git verify-pack

Question 1

First, see this diagram. Then, based on the source (builtin/index-pack.c), the value in the fourth field of the git verify-pack -v output is:

(unsigned long)(obj[1].idx.offset - obj->idx.offset)

which is the raw number of bytes the object occupies in the pack file (obj[1] is the next object after this one, or the pack trailer). Since the stored item is deltified, that's the size of the deflated delta data plus header overhead. The value in the third field is obj->size (the first size value from the overhead area).
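To make the subtraction concrete, here is a minimal Python sketch of the same idea. It is not the index-pack code: packed_sizes is a hypothetical helper, and the offsets and trailer position are made-up values.

```python
def packed_sizes(offsets, trailer_offset):
    """Return {offset: size-in-packfile} for the objects of one pack.

    offsets: start offset of every object in the pack (any order).
    trailer_offset: where the pack trailer begins, i.e. the end of the
    last object (assumed to be known by the caller).
    """
    ordered = sorted(offsets)
    # Each object ends where the next one begins; the last ends at the trailer.
    ends = ordered[1:] + [trailer_offset]
    return {start: end - start for start, end in zip(ordered, ends)}


# Made-up offsets; 12 is the first byte after the 12-byte pack header.
sizes = packed_sizes([12, 150, 420], trailer_offset=500)
print(sizes)
```

Because the size is just the distance to the next object, it includes the per-object header bytes as well as the deflated data.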

(To get the actual data, or even its size, you have to inflate the stream a bit and then look at the delta headers. The object's "true" size is encoded in the header as the second size value. See get_size_from_delta in sha1_file.c, get_delta_hdr_size in delta.h, and the "offset encoding" in the diagram.)
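As a rough illustration of "inflate the stream a bit and then look at the delta headers", the sketch below decodes the two size values at the start of a delta. It assumes the caller has already skipped the pack entry header and hands in the zlib stream itself; the function names and the fake delta are hypothetical, not Git's own code.

```python
import zlib


def read_varint(buf, pos):
    """Decode one little-endian base-128 size starting at buf[pos].

    Returns (value, next_pos). Each byte carries 7 bits of the value;
    a set high bit means another byte follows.
    """
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def delta_header_sizes(deflated, peek=32):
    """Inflate at most `peek` bytes of a delta and read both sizes.

    Returns (base_size, result_size); result_size is the object's size
    once the delta has been applied to its base.
    """
    head = zlib.decompressobj().decompress(deflated, peek)
    base_size, pos = read_varint(head, 0)
    result_size, _ = read_varint(head, pos)
    return base_size, result_size


# Hypothetical delta: base size 300, result size 5, then filler bytes.
fake_delta = bytes([0xAC, 0x02, 0x05]) + b"\x00" * 100
print(delta_header_sizes(zlib.compress(fake_delta)))
```

Only a few dozen bytes need inflating because both sizes sit at the very start of the delta data.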

Edit to add: OK, re-reading the question, you're asking more about why the fourth size is so much smaller than the third one. That's because the third one is the inflated (but not de-delta-ed) size of the object. So: size-in-packfile (field 4) is the size after deflating, plus a bit of header overhead; the size of the delta (field 3) is the size after inflating but before applying the delta; and the size of the ultimate file, after undoing delta compression, is in the delta header, whose bytes are part of the deflated data counted in field 4.

Extra edit: the offset-in-packfile (field 5) is obj->idx.offset. That's where you have to lseek() in the pack file to start reading the object (I think, I've got some confusing code in front of me for handling OBJ_OFS_DELTA too :-) ).
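For reference, the fields discussed above can be pulled out of a verify-pack -v line with something like the following sketch. It assumes the documented line layout (object name, type, size, size-in-packfile, offset-in-packfile, plus depth and base object name for deltified entries); PackEntry and parse_verify_pack_line are hypothetical names and the sample line uses invented values.

```python
from typing import NamedTuple, Optional

OBJECT_TYPES = ("commit", "tree", "blob", "tag")


class PackEntry(NamedTuple):
    name: str
    type: str
    size: int            # field 3: obj->size
    size_in_pack: int    # field 4: distance to the next object
    offset: int          # field 5: obj->idx.offset
    depth: Optional[int] = None
    base: Optional[str] = None


def parse_verify_pack_line(line):
    """Parse one object line; return None for summary or status lines."""
    parts = line.split()
    if len(parts) not in (5, 7) or parts[1] not in OBJECT_TYPES:
        return None
    entry = PackEntry(parts[0], parts[1], int(parts[2]), int(parts[3]), int(parts[4]))
    if len(parts) == 7:
        # Deltified entries carry the chain depth and the base object name.
        entry = entry._replace(depth=int(parts[5]), base=parts[6])
    return entry


# Invented sample line for a deltified blob.
sample = "a" * 40 + " blob   1200 85 4711 1 " + "b" * 40
print(parse_verify_pack_line(sample))
```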

Question 2

With Git 2.21 (Q1 2019), the meaning of "objectsize" is clarified, as the "--format=<placeholder>" option of for-each-ref, branch and tag learned to show a few more traits of objects that can be learned by the object_info API.

See commit 59012fe, commit 5610d9f, commit 33311fa, commit f4ee22b, commit 5305a55, commit 1867ce6 (24 Dec 2018) by Olga Telezhnaya (telezhnaya).
(Merged by Junio C Hamano -- gitster -- in commit 55574bd, 18 Jan 2019)

ref-filter: add objectsize:disk option

Add new formatting option objectsize:disk to know exact size that object takes up on disk.

The git for-each-ref man page now states:

objectsize:
The size of the object (the same as 'git cat-file -s' reports).
Append :disk to get the size, in bytes, that the object takes up on disk.
deltabase:
This expands to the object name of the delta base for the given object, if it is stored as a delta.
Otherwise it expands to the null object name (all zeroes).

Caveats:

Note that the sizes of objects on disk are reported accurately, but care should be taken in drawing conclusions about which refs or objects are responsible for disk usage.
The size of a packed non-delta object may be much larger than the size of objects which delta against it, but the choice of which object is the base and which is the delta is arbitrary and is subject to change during a repack.

Note also that multiple copies of an object may be present in the object database; in this case, it is undefined which copy's size or delta base will be reported.

So you can compare those values with the ones reported by git verify-pack -v, since git for-each-ref is now (5+ years later) able to display more data.
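A minimal sketch of such a comparison, assuming Git 2.21 or later is on the PATH: it asks for-each-ref for the atoms quoted above and parses each line. The helper names and the output layout chosen in FORMAT are my own assumptions, not anything Git prescribes.

```python
import subprocess

# One line per ref: size, on-disk size, delta base, ref name.
FORMAT = "%(objectsize) %(objectsize:disk) %(deltabase) %(refname)"


def parse_ref_line(line):
    """Split one output line produced with FORMAT into a dict."""
    size, disk, deltabase, refname = line.split(" ", 3)
    # An all-zero object name means the object is not stored as a delta.
    is_delta = set(deltabase) != {"0"}
    return {
        "refname": refname,
        "size": int(size),
        "disk": int(disk),
        "deltabase": deltabase if is_delta else None,
    }


def ref_sizes(repo="."):
    """Run git for-each-ref in `repo` and return the parsed entries."""
    out = subprocess.run(
        ["git", "-C", repo, "for-each-ref", f"--format={FORMAT}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [parse_ref_line(line) for line in out.splitlines() if line]
```

Calling ref_sizes(".") inside a repository returns one dict per ref; the "disk" value is the one to set against field 4 of verify-pack -v for the same object, keeping the caveats above in mind.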

Question 3

There was a recent patch series, discussed under "[RFC/PATCH 0/4] cat-file --batch-disk-sizes", which included "[PATCH 07/10] cat-file: add %(objectsize:disk) format atom". It may be of interest if you don't mind compiling Git from source.
