Question

I once found on Wikipedia a nice technique for encoding $k \in (2^{n-1}, 2^n)$ uniformly distributed integer numbers with less then $\log_2n$ average bits/symbol, thanks to a simple to compute variable length code. Basically it used $\log_2n$ for some symbols and $\log_2n - 1$ for some others.

Unfortunately all my Googling has failed me. I recall something similar to "variable length binary", but I keep ending on VLQ which are a different beast. Since I know your memory better than mine, can you help me?

Was it helpful?

Solution 2

The technique idea is perfectly described in Yuval Filmus answer. Even if slightly different, it is called Truncated binary encoding in Wikipedia. I couldn't find an original source for that, apart from a mention in a patent, in this book, or in this Google API

OTHER TIPS

Suppose $k = 2^{n-1} + t$, where $0 \leq t < 2^{n-1}$. Use the following to encode $z \in \{0,\ldots,k-1\}$:

  • If $z < 2^{n-1}-t$ then encode $z$ as its own $(n-1)$-bit encoding.
  • Otherwise, write $z = 2^{n-1}-t + 2\delta+\epsilon$, where $\delta \in \{0,\ldots,t-1\}$ and $\epsilon \in \{0,1\}$. Encode $z$ at the $(n-1)$-bit encoding of $2^{n-1}-t+\delta$ followed by $\epsilon$.

Here is an example. Let $k = 11 = 2^3+3$. The encoding is as follows:

  • $0 \to 000$.
  • $1 \to 001$.
  • $2 \to 010$.
  • $3 \to 011$.
  • $4 \to 100$.
  • $5 \to 1010$.
  • $6 \to 1011$.
  • $7 \to 1100$.
  • $8 \to 1101$.
  • $9 \to 1110$.
  • $10 \to 1111$.
Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top