Pergunta

I'm looking into implementing LZW compression in C++, and not sure of the best dictionary implementation.

Hash table made sense, but I don't understand how I would be able to 'reassign' values. If the table gets full, I need to be able to start overwriting previous (oldest) multi-char dictionary entries. Hash table would require me to keep track of these, find it, remove it, and then insert the new one.

Any suggestions?

Foi útil?

Solução

What you're looking for is actually two data structures put together:

  1. A hash table.
  2. A FIFO queue (to discard old table entries)).

You can implement them yourself if you're looking for practice as your comments suggest, or use the stl/sgi/c++11 implementations (unordered_map is the actual hash map, either through sgi or c++11, and a FIFO queue is a doubly linked list, such as std::deque).

The idea is that whenever you want to discard the oldest dictionary entry, you pop the last element in the queue, and then remove it from the hash table as well.

Outras dicas

The Unix compress utility (source code link) uses double hashing and periodic table clearing.

If you want fast compression and decompression, then there are far better choices than LZW, which is horribly obsolete. You should look at fast, level 1 compression in zlib (probably already on your machine), LZO, and lz4.

There is no reason to write new LZW code other than for instructional or entertainment value. It is only of historical interest. You could also study the compress utility for such instruction and entertainment.

You must use two different structures in compression and decompression.

While compressing, you should use a Trie, since you must search the dictionary by content and not by key.

While decompressing, you access the dictionary in the more conventional way, that is, by key. You could then use any associative array structures. like hashtables or even a vector/deque of strings (since your indexes are consecutive natural numbers).

You can try 2 dictionaries implemented in lzws:

  1. Linked list, memory usage <= 327 KB.
  2. Sparse array, memory usage <= 33.5 MB.
Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top