Why is the first code of a code length the number of codes of that length times 2?

https://stackoverflow.com/questions/16837029

30-05-2022
|

Pergunta

I've been trying to make sense of an example inflate implementation and this is a bit confusing:

static int decode(struct state *s, struct huffman *h)
{
    int len;            /* current number of bits in code */
    int code;           /* len bits being decoded */
    int first;          /* first code of length len */
    int count;          /* number of codes of length len */
    int index;          /* index of first code of length len in symbol table */

    code = first = index = 0;
    for (len = 1; len <= MAXBITS; len++) {
        code |= bits(s, 1);             /* get next bit */
        count = h->count[len];
        if (code - count < first)       /* if length len, return symbol */
            return h->symbol[index + (code - first)];
        index += count;                 /* else update for next length */
        first += count;
        first <<= 1;
        code <<= 1;
    }
    return -10;                         /* ran out of codes */
}

In particular this part:

first += count;
first <<= 1;

Can someone explain how does a bitshift or a multiplication by a constant yield the actual first code of a particular code length? Basically, the difference between an index of the first code and the actual first code is "times 2"?

For simplicity, just consider it rolling off from 7 to 8 bits of length, where 7 yielded the first non-zero count, in a fixed version would be 24, I guess since the interval is [256, 280>.

My assumption is that it preserves orders of magnitude between the code and the first of the next length, since their subtractive relationship is more relevant than the absolute value to determine the symbol offset -- however - I'm not quite sure that is the case.

Solução

count[len] is the number of codes of each bit length. For the canonical code defined by the deflate format, the codes simply count up from zero. So the first code for the first non-zero count[len] is len zero bits. If count[len] is more than one, then the next code is one more than that, or just len-1 zero bits and a one bit.

When the codes of that length are used, add a zero bit at the end to go to the next length.

Therefore to make the step from the first code of length i to the first code of length i+1, you add count[i] and then shift left by one bit.

Example code:

length   code
   2     00
   2     01
   3     100
   3     101
   3     110
   4     1110
   4     1111

We start with 00 for the first code, which is of length 2. There are two codes of length 2, so we add two and shift left by one bit, i.e. multiply by two, and get the first code of length 3 which is (0 + 2) * 2 = 4 = 100. To get to the first code of length 4, its (4 + 3) * 2 = 14 = 1110.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a StackOverflow