Question

Suppose we have a file A that has been compressed by method B, and the output file is C. If I am not mistaken, we cannot compress C any further with method B, but there might be another method (algorithm) D that can compress C further and produce a compressed file E.
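
For concreteness, this is easy to try in Python; picking bz2 as method B and lzma as method D is my own choice, purely for illustration:

    import bz2
    import lzma

    a = b"0123456789" * 1000   # file A: some repetitive sample data
    c = bz2.compress(a)        # C = B(A)
    c2 = bz2.compress(c)       # applying B a second time rarely helps
    e = lzma.compress(c)       # E = D(C), a different method
    print("A:", len(a), "C:", len(c), "B(C):", len(c2), "E:", len(e))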

Is this kind of method, using a different compression method in each iteration, standard? If there is a paper or survey on the performance of this kind of compression, please comment or post an answer.

I mean, is there any study where two or three methods were applied to the same file?

Solution

Here's a partial "yes" answer to your question.

Consider a file containing:

0000000000
0000000001
0000000010
0000000011
0000000100
...
1111111111

That is, it lists all binary numbers that are $b$ bits long (here $b = 10$, but in the exercise that follows, try something smaller), in order, with each number separated by (say) a newline character.

The standard BWT compression algorithm (e.g. the one that bzip2 is based on) performs a Burrows-Wheeler transform, followed by move-to-front coding, followed by some kind of entropy coding (possibly combined with run-length encoding).
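
To make the experiment concrete, here is a minimal Python sketch of the first two stages; the helper names (bwt, mtf) and the choice of $b = 4$ are mine, and the naive rotation-sorting BWT is only meant for inputs this small:

    def bwt(s, sentinel="\x00"):
        # Naive Burrows-Wheeler transform: append a sentinel, sort all
        # rotations, and read off the last column. O(n^2 log n), which
        # is fine for a toy example but not for real files.
        s = s + sentinel
        rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
        return "".join(r[-1] for r in rotations)

    def mtf(s):
        # Move-to-front coding over the alphabet actually used in s.
        alphabet = sorted(set(s))
        codes = []
        for ch in s:
            i = alphabet.index(ch)
            codes.append(i)
            alphabet.insert(0, alphabet.pop(i))
        return codes

    # The file above: all b-bit binary numbers in order, one per line.
    b = 4
    text = "\n".join(format(i, f"0{b}b") for i in range(2 ** b)) + "\n"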

This file compresses "better" if you apply the BWT twice, running the subsequent stages on the BWT of the BWT instead of on a single BWT. I encourage you to compute both BWTs on a small example to see why; the continuation of the sketch below does exactly that.
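
Continuing the sketch above, you can measure what the double transform does to the structure the later stages exploit: the number of runs of equal characters, and the number of zeros in the move-to-front output (zeros entropy-code very cheaply):

    import itertools

    def run_count(s):
        # Number of maximal runs of equal characters; fewer runs
        # means the MTF and RLE stages have an easier job.
        return sum(1 for _ in itertools.groupby(s))

    once = bwt(text)
    twice = bwt(once)  # the BWT of the BWT
    for label, s in [("original", text), ("one BWT", once), ("two BWTs", twice)]:
        print(label, "runs:", run_count(s), "MTF zeros:", mtf(s).count(0))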

Now this is only one stage of the full compression algorithm, and so we haven't found a file that compresses better if you compress it twice. But this BWT "adversary" is certainly suggestive.

I don't know if there are any papers on this. I noticed this example a few years ago (but I'm sure I'm not the only person who has).

Other tips

The essential problem is that most files are NOT compressible (see the counting argument). And an already compressed file is much less likely to be compressible.
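
To spell the counting argument out: there are $2^n$ bit strings of length $n$, but only $2^n - 1$ strings of length strictly less than $n$, so no lossless compressor can shrink every length-$n$ input; more precisely, fewer than a $2^{-k}$ fraction of them can be shrunk by more than $k$ bits. Since the output of a good compressor is close to uniformly random, a second pass usually has nothing left to exploit.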
