Name of binary encoding scheme for integer numbers
-
29-09-2020 - |
Question
I once found on Wikipedia a nice technique for encoding $k \in (2^{n-1}, 2^n)$ uniformly distributed integer numbers with less then $\log_2n$ average bits/symbol, thanks to a simple to compute variable length code. Basically it used $\log_2n$ for some symbols and $\log_2n - 1$ for some others.
Unfortunately all my Googling has failed me. I recall something similar to "variable length binary", but I keep ending on VLQ which are a different beast. Since I know your memory better than mine, can you help me?
Solution 2
The technique idea is perfectly described in Yuval Filmus answer. Even if slightly different, it is called Truncated binary encoding in Wikipedia. I couldn't find an original source for that, apart from a mention in a patent, in this book, or in this Google API
OTHER TIPS
Suppose $k = 2^{n-1} + t$, where $0 \leq t < 2^{n-1}$. Use the following to encode $z \in \{0,\ldots,k-1\}$:
- If $z < 2^{n-1}-t$ then encode $z$ as its own $(n-1)$-bit encoding.
- Otherwise, write $z = 2^{n-1}-t + 2\delta+\epsilon$, where $\delta \in \{0,\ldots,t-1\}$ and $\epsilon \in \{0,1\}$. Encode $z$ at the $(n-1)$-bit encoding of $2^{n-1}-t+\delta$ followed by $\epsilon$.
Here is an example. Let $k = 11 = 2^3+3$. The encoding is as follows:
- $0 \to 000$.
- $1 \to 001$.
- $2 \to 010$.
- $3 \to 011$.
- $4 \to 100$.
- $5 \to 1010$.
- $6 \to 1011$.
- $7 \to 1100$.
- $8 \to 1101$.
- $9 \to 1110$.
- $10 \to 1111$.