> Which options will store a totally new binary blob each time the binary file changes (even by a few bytes)?
All of them. All blobs (indeed, all objects in the repo) are stored "intact" (more or less) whenever they are "loose objects". The only thing done with them is to give them a header and compress them with deflate compression.
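You can see this directly on disk. A sketch (repo and file names are made up for illustration): a freshly committed binary lands as one loose object, which is just the original bytes with a header, run through deflate.

```sh
# Illustrative: inspect how a newly committed binary is stored as a loose object.
git init demo && cd demo
head -c 1048576 /dev/urandom > big.bin            # 1 MiB of random test data
git add big.bin
git commit -m "add binary"
git cat-file -s "$(git rev-parse HEAD:big.bin)"   # logical blob size: 1048576
find .git/objects -type f                         # one loose object: header + deflate
```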
At the same time, though, loose objects are eventually combined into "packs". Git does delta-compression on files in packs: see *Is the git binary diff algorithm (delta storage) standardized?*. Based on the answers there, you'd be much better off not "pre-compressing" the binaries, so that the pack-file delta algorithm can find long strings of matching binary data.
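If you want to check whether deltification actually happened, `git verify-pack -v` lists each packed object; a sketch (the pack file name is whatever your `git gc` produced):

```sh
# Re-pack loose objects, then list packed objects with their delta info.
git gc
git verify-pack -v .git/objects/pack/pack-*.idx | head
# Columns: SHA-1, type, size, size-in-pack, offset[, depth, base-SHA-1].
# Objects shown with a depth and a base SHA-1 are stored as deltas against that base.
```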
> Does git diff uncompressed binary data better than compressed data (which may change a lot even with minor edits to the uncompressed data)?
I have not tried it, but the overall implication is that the answer should be "yes".
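It's easy to test yourself; here's one possible experiment (file names and sizes are just examples), run inside a scratch repository: commit a raw copy and a gzip-compressed copy of the same data, make the same tiny edit to each, and compare how well the pack deltas them.

```sh
# Hypothetical experiment: does a small edit delta better raw or pre-compressed?
head -c 1048576 /dev/zero > data.bin                      # some test payload
gzip -k data.bin                                          # data.bin.gz: compressed copy
git add data.bin data.bin.gz && git commit -m v1
printf 'X' | dd of=data.bin bs=1 seek=4096 conv=notrunc   # flip one byte in the raw file
gzip -kf data.bin                                         # re-compress the edited file
git add -A && git commit -m v2
git gc
git verify-pack -v .git/objects/pack/pack-*.idx           # compare size-in-pack / deltas
```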
> I would assume storing many small binary files is less overhead long term, compared to one large binary file, assuming only some of the files are periodically modified. Can git handle small changes to large binary files efficiently?
Certainly all files that are completely unchanged will be stored with a lot of "de-duplication" instantly, as their SHA-1 checksums will be identical across all commits, so that each tree names the very same blob in the repository. If `foo.icon` is the same across thousands of commits, there's just the one blob (whatever the SHA-1 for `foo.icon` turns out to be) stored.
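You can verify this de-duplication directly: `git rev-parse` resolves a path at a commit to its blob hash, and an unchanged file resolves to the same hash everywhere (the hash below is made up):

```sh
# Illustrative: identical content is one blob, no matter how many commits name it.
git rev-parse HEAD:foo.icon       # e69de29... (example hash)
git rev-parse HEAD~100:foo.icon   # e69de29... same hash => same single blob on disk
```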
I'd recommend experimenting a bit: create some dummy test repos with proposed binaries, make proposed changes, and see how big the repos are before and after running `git gc` to re-pack the loose objects. Note that there are a lot of tuneables; in particular, you might want to fuss with the `window`, `depth`, and `window-memory` settings (which can be set on command lines or in git config entries).
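For instance (the values here are just starting points to experiment with, not recommendations):

```sh
# One-off: pass the tuneables directly to a repack.
git repack -a -d -f --window=50 --depth=50 --window-memory=256m

# Or persist them in git config so a plain `git gc` picks them up:
git config pack.window 50
git config pack.depth 50
git config pack.windowMemory 256m
git gc
```

Larger `window` and `window-memory` values let the delta searcher consider more candidate objects per file, which can shrink packs of similar binaries at the cost of a slower repack.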