Question

I am documenting an old file format and have stumped myself with the following issue.

It seems that integers are variable-length encoded: numbers <= 0x7F are encoded in a single byte, while numbers >= 0x80 are encoded in two bytes. Here is an example set of integers and their encoded counterparts:

  • 0x390 is encoded as 0x9007
  • 0x150 is encoded as 0xD002
  • 0x82 is encoded as 0x8201
  • 0x89 is encoded as 0x8901

I have yet to come across any numbers that are larger than 0xFFFF, so I can't be sure if/how they are encoded. For the life of me, I can't work out the pattern here. Any ideas?


Solution

At a glance it looks like the number is split into 7-bit chunks. Each chunk is encoded as the 7 least significant bits of an output byte, and the most significant bit of that byte signals whether more bytes follow (i.e. the last byte of an encoded integer has 0 as its MSB).

The least significant bits of the input come first, so I guess you could call this "little endian".
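Assuming that is indeed the scheme, a minimal Python decoder (the function name is my own) reproduces all four of your examples:

    def decode_varint(data, pos=0):
        """Decode one integer starting at data[pos]; return (value, next_pos)."""
        value = 0
        shift = 0
        while True:
            byte = data[pos]
            pos += 1
            value |= (byte & 0x7F) << shift   # low 7 bits are payload
            if byte & 0x80 == 0:              # MSB clear -> this was the last byte
                return value, pos
            shift += 7

    # The examples from the question decode as expected:
    assert decode_varint(bytes([0x90, 0x07]))[0] == 0x390
    assert decode_varint(bytes([0xD0, 0x02]))[0] == 0x150
    assert decode_varint(bytes([0x82, 0x01]))[0] == 0x82
    assert decode_varint(bytes([0x89, 0x01]))[0] == 0x89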

Edit: see https://en.wikipedia.org/wiki/Variable-length_quantity (the same scheme is used in MIDI files and Google Protocol Buffers).
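
If the format really does follow this scheme, values above 0xFFFF would simply continue into a third (or fourth) byte. A sketch of the matching encoder, under that assumption:

    def encode_varint(value):
        """Encode a non-negative integer as 7-bit groups, least significant group first."""
        out = bytearray()
        while True:
            byte = value & 0x7F
            value >>= 7
            if value:
                out.append(byte | 0x80)   # more groups follow -> set MSB
            else:
                out.append(byte)          # final group -> MSB clear
                return bytes(out)

    print(encode_varint(0x390).hex())    # '9007', matching the first example
    print(encode_varint(0x12345).hex())  # 'c5c604' -- a value above 0xFFFF takes three bytes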
