GIF format - separate variable-length codes

https://stackoverflow.com/questions/5876916

28-10-2019
|

문제

I trying to parse GIF format and have one problem with reading image data. This data represented like bit array, containing variable-length values.

ex:

0010-1010-0010-0000-00111-10000-11111...

Sometimes length of the code increases, but I can't understand how I can detect this increasing. I have just initial code size (length of the first code ex. 4).

Standart says only:

Appendix F. Variable-Length-Code LZW Compression.

...

The Variable-Length-Code aspect of the algorithm is based on an initial code size (LZW-initial code size), which specifies the initial number of bits used for the compression codes. When the number of patterns detected by the compressor in the input stream exceeds the number of patterns encodable with the current number of bits, the number of bits per LZW code is increased by one.

...

해결책

When parsing a GIF file, the Image Descriptor includes the bit width of the unencoded symbols (example: 8 bits). As you probably already know, the initial code size of the compressed data is one bit wider than the bit width of the unencoded symbols (example: 9 bits).

Also, as you probably already know, the possible compressed code values in a GIF file gradually increase in size, up to a maximum of 0xFFF == 4095 which requires 12 bits to store.

For every code that the decompressor pulls from the compressed data, the decompressor adds a new item to its dictionary. For example, if the first two 9-bit codes the decompressor reads are 0x028 0x0FF, the decompressor adds a two-byte sequence to its dictionary. Later if the decompressor ever reads the code 0x102, the decompressor decodes that 0x102 code to the two 8-bit bytes 0x28 0xFF.

The next item the decompressor adds to the dictionary is assigned the code 0x103.

The next item the decompressor adds to the dictionary is assigned the code 0x104. ...

Eventually the decompressor adds an item to the dictionary that is assigned the code 0x1FF. That is the largest number that fits into 9 bits. After storing that item into the dictionary, the decompressor starts reading 10-bit codes.

The next item the decompressor adds to the dictionary is assigned the code 0x200.

There isn't any special "command" in the data sequence that tells the decompressor to increment the code width. The decompressor must keep track of how many items the dictionary contains so far (which often can be conveniently re-used as the index of where to write the next item into the dictionary). When the decompressor adds item 0x1ff to the dictionary, it's time for the decompressor to start reading 10-bit codes. When the decompressor adds item 0x3ff to the dictionary, it's time for the decompressor to start reading 11 bit codes.

다른 팁

Look at this example first, it may be clearer to understand LZW than looking at the standard. And this may also be useful.

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow