Question

I got some problems when I tried to encode the following streams by Huffman code. If the coded system is

A:0 B:10 C:110 D:111

then the sequence ABBADC would be 010100111110. How do I decode this bit stream when I receive it? Is the coded system necessary? If so, how do I send this table? If not, how should I decode them?

Était-ce utile?

La solution

You have to build a tree

   Start
 /0    \1
 |    /0 \1 
 A   |   / \
     B   | |
         C D

And move along it on each step following your binary digits. Each time you find the leaf - return to the root

You should send this table either in special header part of your file or completely separately

Autres conseils

You don't need to send the table, just the bit lengths. I.e., 1,2,3,3. Given the bit lengths, a canonical prefix code can be constructed the same way on both the encoding and decoding ends. See this answer for more on canonical codes.

There are many ways to decode a prefix code. You can go bit-by-bit as if you are traversing a tree and eventually end up at a leaf which is a symbol. Then start back at the root of the tree for the next bit.

Or for a canonical code where shorter lengths are assured to have lower integer values than longer lengths, collect bits into integers and compare with a table of values per length. Once the length is established, the integer indexes a table of symbols. This is what the example code puff.c does.

One of the fastest ways is to have a table whose length encompasses the longest code, where each table entry is the symbol and the number of bits. Codes that have fewer bits have multiple table entries. In this case:

000: A 1
001: A 1
010: A 1
011: A 1
100: B 2
101: B 2
110: C 3
111: D 3

Then you read three bits, look it up, emit the symbol in the table, and consume the number of bits indicated. Then get more bits as needed to get back to three bits and repeat. If you get to the end of the input, just fill with zero bits but note how many. If the decoded code uses the fill bits, then flag an error.

This is what zlib's inflate code does, though it's a little more sophisticated in that it builds a two-level table so that too much time isn't spent building long tables.

traditionally tree data structure is used to store huffman data. So when you process A,B,C,D then you go on building a binary tree. The values for encodings i.e. A,B,C,D are stored at the leaf. When you receive a bit stream, you just traverse the tree you already have to decode the code.

In your case though, if you just have these 4 inputs then I would recommend using a hashmap than the tree cause it'll make your lookup easier.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top