Question

I'm confused about the point of adaptive arithmetic coding.

I understand that static arithmetic coding involves using preset probabilities of symbols that remain static during the whole process. I also understand that adaptive arithmetic coding involves change all probabilities after each symbol encountered.

However, what is the point of changing the probability after each symbol? Why wouldn't you just go through an entire file first and determine the probabilities and then do the arithmetic coding as a second pass?

Additionally, I don't understand how changing the probability of symbols impacts the compression? If we know the true probabilities of the symbols in the file we are compressing, then will it make the file smaller?

Was it helpful?

Solution

First, consider "going through an entire file". There are a few assumptions worth thinking about here.

Files can be very big, and traversing them twice can be expensive. This is one reason why most real-world compression standards are based around blocks or windows.

There are situations where you don't have "the entire file" to begin with, such as a communication channel. TLS (prior to 1.3, at least) supports compression, for example.

Files are not always homogeneous. Archives (e.g. tar) are a case in point. A statistical model that is appropriate for one part of a file may not be appropriate for another part. Adaptive coding adapts to this, too.

As to your final question, if both the encoder and decoder knew the true probabilities of the symbols in the file we are compressing, then that wouldn't need to be transmitted. And, indeed, we sometimes do this in the real world. The JPEG standard, for example, specifies default coding tables for those times when they are appropriate, and lets an encoder supply their own when they are not.

Transmitting a static coding table efficiently (i.e. compressing it) is a nontrivial problem, especially for a large code alphabet. For a well-designed scheme, the cost of transmitting the table should be equal to the "learning cost" of using an adaptive code.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top