Question

I'm new to encoding so I'm trying to understand the basics. I came across a document that was describing a lossless text compression technique, and in this document was a figure that illustrated how their compression works. It works like so:

Source -> BWT -> MTF -> RLT -> Proprietary Entropy Encoder

I don't understand why they would use Run-Length Transform after Move to Front Transform, it doesn't seem efficient to me. As I understand it, MTF does not produce many runs itself and therefore it would not be useful to use RLT afterwords.

Some explanation would be much appreciated!

Was it helpful?

Solution 2

It's not efficient.

It's probably an old paper on BWT. Back then, using RLE (run-length Encoder) was a common trick to improve speed. RLE, btw, was not only used after BWT, but also before, in order to speed up BWT stage. The same logic could apply to the entropy coder, but it is unlikely to provide as much benefit.

Nowadays, this trick are completely out dated. Before BWT, you might find some kind of LZP pre-processing, especially for long range matches (beyond block size), with about the same intend as RLE regarding speed improvement, but more powerful. MTF is also completely replaced, since it is too CPU intensive and not as efficient for its cost.

OTHER TIPS

The goal of the MTF after the BWT is to shrink the range of symbols which is good for entropy coding. MTF replaces symbols with their rank which is usually small due to the repetitions generated by the BWT. A Zero Length Coding may be applied after that since a lot of 0s may be produced by the MTF (it is just a specific case of RLE where only runs of 0s are encoded by length).

Take a look at https://github.com/flanglet/kanzi-cpp for implementation examples. You can run the BlockCompressor with the Block codec with or without MTF+ZLE.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top