Question

I am doing an assignment on Huffman Coding. I have managed to build a frequency tree of characters from a text file, generate the a code of 0's and 1's for each letter, write the text file to another file using the codes and also to decode the sentence of codes.

My goal is to achieve the following:

To compress: break up the string of 0's and 1's into 8 bit chunks, and write the character represented by each chunk to the compressed file. Decode by replacing each character with the 8 bits needed to represent it.

I am quite unsure about how I got about compressing it into 8 bit chunks.

Était-ce utile?

La solution

You have to keep track of two states: The bits you write for each character, whose number will vary according to the characher's level in the tree and the 8-bit chunks that you write to. You will need to fiddle with the output chunks on the bit level.

There are various approaches. Here's a solution that writes the bits one at a time by shifting a data bit and appending the bit to write. When the data bit is full, it is written to the 8-bit-chunk output sink. The code below checks for data bit overflow with a sentinel bit, that eventually is shifted out of the 8-bit range:

static unsigned int out = 0x01;

void write_bit(bool bit)
{
    out <<= 1;                    // shift byte to make room
    if (bit) out |= 0x01;         // set lowest bit id desired

    if (out & 0x100) {            // was the sentinel bit shifted out?
        write_byte(out & 0xff);   // final output of 8-bit chunk
        out = 0x01;               // reset to sentinel vylue
    }
}

void flush_bit()
{
    while (out != 0x01) write_bit(false); 
}

int main()
{
    write_bit(1);
    write_bit(0);
    write_bit(1);
    // ...
    flush_bit();

    return 0;    
}

The flush() ensures that bits that are still in the data byte are written out by padding the Huffman sequence with zero bits.

The actual bit output works like this:

  0000 0001    // initial state: one sentinel bit
  0000 0011    // after writing a 1
  0000 0110    // after writing a 0
  1101 1111    // after writing five more 1s
1 1011 1110    // after writing a 0: data byte is full
               // write 1011 1110 to final output
  0000 0001    // reset data byte

For this to work, the data bit has to be stored in an integer that can hold more bits than a byte.

You could also keep track of the byte state with a separate variable. Writing bits one by one is probably not efficient, but straightforward.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top