Distinguishing between uppercase and lowercase letters in the “move-to-front” method

https://cs.stackexchange.com/questions/326

16-10-2019
|

Question

Is it not necessary to encode both the uppercase and lowercase letter while encoding a message with the move-to-front transform? From an old computer science course exam, the problem was to encode Matt_ate_the_mat starting with an empty list.

Using the author's solution methodology of not taking into account M versus m one arrives at

$C(1)C^∗(M)\\ C(2)C^∗(a)\\ C(3)C^∗(t)\\ C(1)\\ C(4)C^∗(\_)\\ C(3)\\ C(3)\\ C(5)C^∗(e)\\ C(4)\\ C(3)\\ C(6)C^∗(h)\\ C(4)\\ C(4)\\ C(6)\\ C(6)\\ C(6)$

Seeing that move-to-front works best with items that are repeated this works to one advantage as long as the difference between M and m in the original message is not important, correct?

Though would it not change the last encodings if taking into account m to $C(7)C^*(m)$ or was this done for the sake of brevity within the exam?

Solution

Though for us, human beings, m and M are the same, for a computer these are two distinct symbols that have no relation whatsoever.

The answer actually depends on the application:

If your application is not case-sensitive, this separation does not matter, and the encoding is correct.
If the application is case-sensitive there must be a distinction between lower case and upper case.

It seems to me that the encoding is more efficient as the number of symbols is smaller, thus maybe it is better to encode upper and lower case as an additional bit rather than having the list to contain 26x2 symbols, but this depends on the distribution of the text you are encoding.

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange