Question

Consider the following problem:

You are given a finite set of numbers $(l_k)_{k\in \{ 1, ..., n \}}$ such that $\sum_{k=1}^n2^{-l_k}<1$. Describe an algorithm to find a set $(l'_k)_{k\in \{ 1, ..., n \}}$ such that $\forall k \in \{ 1, ..., n \}:l'_k\le l_k$ and $\sum_{k=1}^n2^{-l'_k}=1$.

(For what it's worth, this is a problem arising from Information Theory, where the Kraft-McMillan Theorem gives that the result above yields a more efficient binary code than the one with codeword lengths $(l_k)$.)

Here are my initial thoughts. We can consider $\sum_{k=1}^n2^{-l_k}$ as a binary number e.g. $0.11010011$ and then we need to reduce the value of some $l_k$ values whose digit position is preceded by a $0$. So for instance, with the initial $0$ in the $\frac{1}{8}$ position of the example number I gave, we want to decrease the value of some $l_i=4$ to $l'_i=3$ to add $\frac{1}{8}$ and subtract $\frac{1}{16}$ to the sum. We then have $0.11100011$, so we've moved the problematic $0$ along a digit. When we get to the end we presumably have something like $0.11111110$, and then need to reduce the value of the longest codeword by 1 to get the overflow to $1.00000000$.

However, I encounter two problems: there may not be such an $l_i=4$, for instance, if the $1$ in the $\frac{1}{16}$ digit place arises as the sum of three $l_i=5$ numbers. Additionally, if we multiple have $0$ digits in a row then we presumably need to scan until the next $1$ and then decrement a corresponding $l_i$ multiple times, but it's conceivable that I would "run out" of large enough $l_i$ codewords that I can manipulate in this way.

Can anyone describe an algorithm with a simple proof of correctness?

A follow-up problem: how do we generalise this algorithm to bases other than $2$?

Was it helpful?

Solution

Here is a very simple algorithm. We will require the following lemma.

Lemma. Suppose that $1 \leq \ell_1 \leq \cdots \leq \ell_k$ and $\sum_{i=1}^k 2^{-\ell_i} \geq 1/2$. Then there exists $r \in \{1,\ldots,k\}$ such that $\sum_{i=1}^r 2^{-\ell_i} = 1/2$.

Proof. Let $r$ be the first index such that $\sum_{i=1}^r 2^{-\ell_i} \geq 1/2$. Since the $\ell_i$ are non-decreasing, there are integers $A,B$ such that $$ \begin{align} \sum_{i=1}^{r-1} 2^{-\ell_i} &= \frac{A}{2^{\ell_r}}, & \sum_{i=1}^r 2^{-\ell_i} &= \frac{B}{2^{\ell_r}}. \end{align} $$ Moreover, $A < 2^{\ell_r-1}$ and $B \geq 2^{\ell_r-1}$. Since $B-A=1$, we conclude that $B = 2^{\ell_r-1}$, and so $\sum_{i=1}^r 2^{-\ell_i} = 1/2$. $\quad\square$

This suggests the following algorithm. We can assume that your sequence is sorted, that is, we are given a sequence $\ell_1 \leq \cdots \leq \ell_k$ such that $\sum_{i=1}^k 2^{-\ell_i} \leq 1$. We now consider three cases:

  1. $\sum_{i=1}^k 2^{-\ell_i} = 1$. In this case, there is nothing to do.
  2. $\sum_{i=1}^k 2^{-\ell_i} \leq 1/2$. In this case, we can decrease each $\ell_i$ by $1$.
  3. $1/2 \leq \sum_{i=1}^k 2^{-\ell_i} \leq 1$. Applying the lemma, we find $r$ such that $\sum_{i=1}^r 2^{-\ell_i} = 1/2$, and so $\sum_{i=r+1}^k 2^{-\ell_i} \leq 1/2$. We thus have to solve the same kind of problem for the second half $\ell_{r+1},\ldots,\ell_k$, aiming at $1/2$ rather than $1$.

To implement this recursion more cleanly, we add a parameter $s$, and our goal is to correct a sequence satisfying $\sum_i 2^{-\ell_i} \leq 2^{-s}$ to one satisfying $\sum_i 2^{-\ell_i} = 2^{-s}$ by only decreasing elements.


Here is how the algorithm works in the case of the sequence $1,2,4,7,8$, which matches your example. The sum in your case is more than $1/2$, so we separate the sequence into two parts: $1$ and $2,4,7,8$. We only handle the second, aiming at a sum of $1/2$.

The sum in the case of $2,4,7,8$ is more than $1/4$, so we separate the sequence into two parts, $2$ and $4,7,8$, and only handle the second, aiming at a sum of $1/4$.

The sum in the case of $4,7,8$ is less than $1/8$, so we decrement each element, obtaining the sequence $3,6,7$, whose sum is more than $1/8$. We separate it into $3$ and $6,7$, and only handle the second, aiming at a sum of $1/8$.

We decrement $6,7$ twice, obtaining the sequence $4,5$ whose sum exceeds $1/16$. We separate it into $4$ and $5$, and decrement the latter once.

Putting everything together, we obtain the sequence $1,2,3,4,4$.


In the $q$-ary case, we have to change the problem somehow, since the result is not true in general. For example, take $q = 3$ and consider the sequence $1, 1$.


Here is another simple algorithm, based on the following lemma.

Lemma. Suppose that $0 \leq \ell_1 \leq \cdots \leq \ell_k$ and $\sum_{i=1}^k 2^{-\ell_i} < 1$. Then $\sum_{i=1}^{k-1} 2^{-\ell_i} + 2^{-(\ell_k-1)} \leq 1$.

Proof. Since the $\ell_i$ are nondecreasing, we can write $\sum_{i=1}^k 2^{-\ell_i} = A/2^{\ell_k}$, where $A < 2^{\ell_k}$. Replacing $\ell_k$ with $\ell_k-1$ increases the sum by $1/2^{\ell_k}$. Since $A+1 \leq 2^{\ell_k}$, the sum remains at most $1$. $\quad\square$

This suggests the following simple algorithm: repeatedly decrement the largest $\ell_i$. The algorithm necessarily terminates since at most $\sum_i \ell_i$ iterations can take place.

Let us apply this on our example above: $$ \begin{align} &1,2,4,7,8 \to 1,2,4,7,7 \to 1,2,4,6,7 \to 1,2,4,6,6 \to 1,2,4,5,6 \to \\ &1,2,4,5,5 \to 1,2,4,4,5 \to 1,2,4,4,4 \to 1,2,3,4,4 \end{align} $$ This algorithm can be implemented easily using a heap. However, it is slower (in general) than the preceding algorithm, if the latter is implemented correctly. For example, the sequence $\ell$ takes $\ell$ steps in this algorithm, but could be handled by a single iteration of the preceding algorithm.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top