To what factor can I compress information from binary to Quinary

https://cs.stackexchange.com/questions/13817

16-10-2019
|

Question

I am trying to think if we can have an injective mapping $\mathcal{f}$, say something like $\Sigma_{bool}= \{0,1\}^* $ to $\Sigma_{5} = \{0,1,2,3,4\}^* $ such that $|x| \leq 2.|\mathcal{f}(x)|$

The motivation here is simple, consider I have a word $110110110110110110110$, so i can compress and represent that as $(110)^7$ or $110,111$ so what can i do if i am allowed to use symbols from $\Sigma_5$?

I hope my question made sense :)

Solution

Here is a simple way to get the inequality $|f(x)| \leqslant {1 \over 2}(|x| + 3)$. Just encode words by blocks of length $2$ using the coding $00 \to 0$, $01 \to 1$, $10 \to 2$ and $11 \to 3$ and add the suffix $40$ or $41$ for the last letter if the length of the word is odd. For instance, $f(11011000) = 3120$, $f(11011000\mathbf{0}) = 3120\mathbf{40}$ and $f(11011000\mathbf{1}) = 3120\mathbf{41}$.

One can improve this inequality to $$ (*) \qquad|f(x)| \leqslant \biggl\lceil{1 \over \log_2 5}|x| + C\biggr\rceil $$ where $\log_2 5 \approx 2.3219\cdots$ and $c = \frac{3}{\log_2 5}-1 \approx 0.2920\cdots$. Indeed, the number of words of length $\leqslant n$ in $\Sigma^*$ is $\frac{|\Sigma|^{n+1}-1}{|\Sigma|-1}$. Therefore, there is an injection from $\Sigma_2^{\leqslant n}$ to $\Sigma_5^{\leqslant m}$ if and only if $$ 2^{n+1} - 1 \leqslant \frac{1}{4}(5^{m+1} - 1) $$ which gives $4(2^{n+1} -1) \leqslant 5^{m+1} - 1$, or equivalently, $2^{n+3} \leqslant 5^{m+1} + 3$. This inequality is already satisfied if $2^{n+3} \leqslant 5^{m+1}$, that is, if $n+3 \leqslant (m+1)\log_2 5$. In particular, if you choose $m = \lceil {1 \over \log_2 5}n + C\rceil$, one can build an injection $f$ from $\Sigma_2^*$ to $\Sigma_5^*$ satisfying $(*)$ as follows. Recall that the shortlex (or radix) order is the order $\leqslant$ on $\Sigma^*$ defined by $u \leqslant v$ if and only if $|u| < |v|$ or $|u| = |v|$ and $u \leqslant_{lex} v$. That is, words are first ordered by length and words of equal length are ordered according to the lexicographic order. The shorlex order is a total order on $\Sigma_k^*$ and hence each word $u$ has a rank $r_k(u)$ in this order. For instance, for $k = 2$ you get $r_2(\varepsilon) = 0$, $r_2(0) = 1$, $r_2(1) = 2$, $r_2(00) = 3$, $r_2(01) = 4$, $r_2(10) = 5$, $r_2(11) = 6$, $r_2(000) = 7$, etc. For $k = 5$, you have $r_5(\varepsilon) = 0$, $r_5(0) = 1$, $r_5(1) = 2$, $r_5(2) = 3$, $r_5(3) = 4$, $r_5(4) = 5$, $r_5(00) = 6$, $r_5(01) = 7$, etc. Given $u \in \Sigma_2^*$, you now define $f(u)$ has the unique word $v \in \Sigma_5^*$ such that $r_5(v) = r_2(u)$ or, if you prefer, $f(u) = r_5^{-1}(r_2(u))$. This $f$ will satisfy $(*)$ but is of course more difficult to compute than the first coding.

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange