Question

Git can generate patches/diffs for binary files as well as for text files.

I'm trying to figure out what encoding it uses for its binary patches.

Here is an example:

diff --git a/www/images/openconnect.png b/www/images/openconnect.png
new file mode 100644
index 0000000000000000000000000000000000000000..51a5d620083cafdc8be07fc42db44ee4a273cacc
GIT binary patch
literal 55947
zcmdRWhd<R{{QouL+Lx^CE7^o(&x`1mLiWne-g{(SE4%EOaTT&R8Ie)46SA_pbc?KP
zzUO|vkMHk)_}#}t>6X0T@AEpZ*K-|lT94EzNSR0>5D3M64OJZo1TO`A$Uup}J5o}F
zs^B+5FT{OaD0l@!ZDPTnN!&GzydV&wVcZAa(v9vV@a7F~HAC+wZg$>&mY%i{KR-WV
...
zM_(nPM^0iqGn&ziW^}xgq{7*>(Z~zK&uB(7n$e8P)2d=17EN{7l9w}@(Trv^qY==m
zVj$pbc}Z>Q83UQojAk^WG1F>eAQJOc8zsxw&S*w6n$e8P)3L}vzB|q+oEgn%Ml+fb
z)3L}vX6CCI&1gn5ngGoh$c$z*qZ!R;AX+sH#8%$gAZR*cATyfLjAk?eS~Uy=GVLP*
l@a=IAWJWWZ(TrvU{C~H_V_Z$W5taY|002ovPDHLkV1k~|z(xQ7

literal 0
HcmV?d00001

This is clearly some kind of binary-to-ASCII encoding, but it is not the common Base64. It appears to use more distinct ASCII characters, and all the encoded lines (except for the last one) begin with z.


Solution

Aha, it's the RFC 1924 version of base85 encoding, which uses 5 ASCII characters to represent 4 bytes (80% efficiency, compared with 75% for Base64). Git's base85.c defines the alphabet as:

static const char en85[] = {
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
    'U', 'V', 'W', 'X', 'Y', 'Z',
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
    'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
    'u', 'v', 'w', 'x', 'y', 'z',
    '!', '#', '$', '%', '&', '(', ')', '*', '+', '-',
    ';', '<', '=', '>', '?', '@', '^', '_', '`', '{',
    '|', '}', '~'
};
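As a rough illustration of the 4-bytes-to-5-characters step, here is a minimal C sketch that follows the same idea as base85.c but is not a copy of it. The names `sketch_alphabet85` and `sketch_encode85` are made up for this example. It assumes each group of 4 bytes is packed big-endian into a 32-bit value, a short final group is padded with zero bytes, and the most significant base-85 digit is written first.

```c
#include <stddef.h>
#include <stdint.h>

/* Same 85 characters, in the same order, as the en85 table above. */
static const char sketch_alphabet85[] =
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "!#$%&()*+-;<=>?@^_`{|}~";

/*
 * Encode len bytes as base85 text (hypothetical helper).
 * out must have room for ((len + 3) / 4) * 5 + 1 characters.
 */
void sketch_encode85(char *out, const unsigned char *data, size_t len)
{
    while (len > 0) {
        uint32_t acc = 0;

        /* Pack up to 4 bytes, big-endian; missing bytes stay zero. */
        for (int shift = 24; shift >= 0 && len > 0; shift -= 8) {
            acc |= (uint32_t)*data++ << shift;
            len--;
        }

        /* Emit 5 base-85 digits, most significant first. */
        for (int i = 4; i >= 0; i--) {
            out[i] = sketch_alphabet85[acc % 85];
            acc /= 85;
        }
        out += 5;
    }
    *out = '\0';
}
```

Fed the eight bytes 78 01 03 00 00 00 00 01 (hex), this sketch should produce cmV?d00001, which is the payload of the H-prefixed line in the example patch.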

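The additional wrinkle is that Git prefixes every encoded line with a single letter ([A-Za-z]) giving the number of bytes encoded on that line: A-Z mean 1-26 and a-z mean 27-52. A full line carries 52 bytes, so it starts with z; only the last line of a block can be shorter, which is why it starts with a different letter (l, meaning 38 bytes, in the example above).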
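To make the line format concrete, here is a hedged C sketch that reads the length letter and then decodes the base85 payload of a single line. The function names `sketch_prefix_to_length` and `sketch_decode_line` are hypothetical, not Git's. The sketch assumes the line has already had its trailing newline removed and that the caller supplies a buffer of at least 52 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same 85-character alphabet as the en85 table quoted earlier. */
static const char sketch_decode_alphabet[] =
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "!#$%&()*+-;<=>?@^_`{|}~";

/* 'A'..'Z' -> 1..26, 'a'..'z' -> 27..52, anything else -> -1. */
int sketch_prefix_to_length(char prefix)
{
    if (prefix >= 'A' && prefix <= 'Z')
        return prefix - 'A' + 1;
    if (prefix >= 'a' && prefix <= 'z')
        return prefix - 'a' + 27;
    return -1;
}

/*
 * Decode one line of a binary patch (without its newline) into out.
 * Returns the number of bytes written, or -1 if the line is malformed.
 */
int sketch_decode_line(const char *line, unsigned char *out)
{
    int nbytes = sketch_prefix_to_length(line[0]);
    if (nbytes < 0)
        return -1;

    /* Every 4 bytes (rounded up) occupy exactly 5 characters. */
    size_t ngroups = ((size_t)nbytes + 3) / 4;
    const char *p = line + 1;
    if (strlen(p) != ngroups * 5)
        return -1;

    int written = 0;
    for (size_t g = 0; g < ngroups; g++) {
        uint64_t acc = 0;

        /* Accumulate 5 base-85 digits, most significant first. */
        for (int i = 0; i < 5; i++) {
            const char *pos = strchr(sketch_decode_alphabet, *p++);
            if (pos == NULL)
                return -1;
            acc = acc * 85 + (uint64_t)(pos - sketch_decode_alphabet);
        }
        if (acc > UINT32_MAX)
            return -1; /* not representable in 4 bytes */

        /* Unpack big-endian, dropping padding bytes of the last group. */
        for (int shift = 24; shift >= 0 && written < nbytes; shift -= 8)
            out[written++] = (unsigned char)(acc >> shift);
    }
    return written;
}
```

Applied to the line HcmV?d00001 from the example, this sketch should return 8 bytes: 78 01 03 00 00 00 00 01 (hex). Those bytes look like an empty zlib stream, which is consistent with Git deflating the data before base85-encoding it, so the decoded lines of a block would need to be concatenated and inflated to recover the original file.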

Source code: https://github.com/git/git/blob/master/base85.c

Announcement of this feature on the Git mailing list: http://www.gelato.unsw.edu.au/archives/git/0605/19975.html

Licensed under: CC-BY-SA with attribution