문제

I made a C++ program which use the Huffman Coding to compress and decompress txt files.

When I compress the txt file I also save the Huffman tree in the binary (compressed) file, which is a tree made by characters and frequencies. Only the leaf nodes have characters.

I used this serialization but I also serialize the characters. If I read ':' I have to read a character (char) and its frequency (int) from the file, else I have to read only the frequency.

For example this file:

    7 4 : F 2 # # : C 2 # # : A 3 # #

contains the binary tree below:

        (7)
       /   \
    (4)    (A,3)
   /   \
(F,2) (C,2)

It works good but there is a problem caused by the binary extension:

the characters ':' and '#' could be the first byte of a frequency, which is an integer! This is rare but when occurs it causes an error during the file reading.

How can I solve it? I also would like to save the frequency depending on its size! 4 bytes are often wasted!

도움이 되었습니까?

해결책

Why are you saving the frequencies? They are not needed for decoding. All you need is the locations of the characters in the tree.

You don't need to send the tree either, but that's a more advanced topic.

다른 팁

This is the best solution, I really love it but i want to solve the problem without the Canonical Huffman Code.

I finally used this new solution:

I don't need all the frequences but only the leaf nodes' one, so I save a character (1 byte) (for example 'X') if it isn't a leaf node, else I save : + character + frequence. I need to save all the nodes because i have to rebuild the three during the decoding.

This solve the problem of ':' and '#' and uses less memory.

The example above becomes:

    X X : F 2 # # : C 2 # # : A 3 # #

It works great but I'm sure there is a better solution to serialize the binary tree!

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top