Question

I am documenting an old file format and have stumped myself with the following issue.

It seems that integers are variable-length encoded: numbers <= 0x7F are encoded in a single byte, while numbers >= 0x80 are encoded in two bytes. Here is an example set of integers and their encoded counterparts:

  • 0x390 is encoded as 0x9007
  • 0x150 is encoded as 0xD002
  • 0x82 is encoded as 0x8201
  • 0x89 is encoded as 0x8901

I have yet to come across any numbers that are larger than 0xFFFF, so I can't be sure if/how they are encoded. For the life of me, I can't work out the pattern here. Any ideas?


Solution

At a glance it looks like the number is split into 7-bit chunks. Each chunk is encoded as the 7 least significant bits of an output byte, and the most significant bit of that byte signals whether more bytes follow (i.e. the last byte of an encoded integer has 0 as its MSB).

The least significant bits of the input come first, so I guess you could call this "little endian".
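Assuming that is indeed the scheme, a minimal Python decoder (the function name is my own) reproduces all four of your examples:

    def decode_varint(data, pos=0):
        """Decode one integer starting at data[pos]; return (value, next_pos)."""
        value = 0
        shift = 0
        while True:
            byte = data[pos]
            pos += 1
            value |= (byte & 0x7F) << shift   # low 7 bits are payload
            if byte & 0x80 == 0:              # MSB clear -> this was the last byte
                return value, pos
            shift += 7

    # The examples from the question decode as expected:
    assert decode_varint(bytes([0x90, 0x07]))[0] == 0x390
    assert decode_varint(bytes([0xD0, 0x02]))[0] == 0x150
    assert decode_varint(bytes([0x82, 0x01]))[0] == 0x82
    assert decode_varint(bytes([0x89, 0x01]))[0] == 0x89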

Edit: see https://en.wikipedia.org/wiki/Variable-length_quantity (the same scheme is used in MIDI files and Google Protocol Buffers).
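
If the format really does follow this scheme, values above 0xFFFF would simply continue into a third (or fourth) byte. A sketch of the matching encoder, under that assumption:

    def encode_varint(value):
        """Encode a non-negative integer as 7-bit groups, least significant group first."""
        out = bytearray()
        while True:
            byte = value & 0x7F
            value >>= 7
            if value:
                out.append(byte | 0x80)   # more groups follow -> set MSB
            else:
                out.append(byte)          # final group -> MSB clear
                return bytes(out)

    print(encode_varint(0x390).hex())    # '9007', matching the first example
    print(encode_varint(0x12345).hex())  # 'c5c604' -- a value above 0xFFFF takes three bytes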
