How to avoid having to store extra information about padding (for byte size alignment) with Huffman coding

https://cs.stackexchange.com/questions/100159

huffman-coding

05-11-2019
|

Pergunta

When storing the encoded Huffman bit stream in bytes, in general, either

the final byte gets padded or
a pseudo end-of-file symbol gets used

In the former case, the number of bits padded needs to be stored somewhere, requiring another 3 bits. In the latter, you will lose some efficiency due to the additional pseudo symbol.

So how can I avoid the extra cost of a pseudo symbol, and also avoiding the need to store/transmit extra bits to hold the number of pad bits?

N.B. On the one hand, I'm interested in compressing short strings, so saving 3 bits on average can shave off another 1% or so of the data.

But more importantly, getting rid of the need to send the padding info (which is only known after compressing the whole string) before the encoded data means I can more easily perform the algorithm in a streaming fashion. This is not really necessary for short strings, where everything may be kept in memory, but sometimes my strings actually do get much longer, and I don't want to have to keep that in memory (considering there are hundreds of simultaneous encodings going on).

Nenhuma solução correta

Licenciado em: CC-BY-SA com atribuição

Não afiliado a cs.stackexchange