Proving correctness and optimality of a greedy algorithm

https://cs.stackexchange.com/questions/121765

29-09-2020
|

Question

Here is a (slightly abridged) problem from Kleinberg and Tardos:

Consider a complete balanced binary tree with $n$ leaves where $n$ is a power of two. Each edge $e$ of the tree has an associated length $\ell_e$, which is a positive number. The distance from the root to a given leaf is the sum of the lengths of all the edges on the path from the root to the leaf.

Now, if all leaves do not have the same distance from the root, then the signals we send starting from the root will not reach the leaves at the same time, and this is a big problem. We want the leaves to be completely synchronized, and all to receive the signal at the same time. To make this happen, we will have to increase the lengths of certain edges, so that all root-to-leaf paths have the same length (we’re not able to shrink edge lengths). If we achieve this, then we say the tree has zero skew.

Give an algorithm that increases the lengths of certain edges so that the resulting tree has zero skew and the total edge length is as small as possible.

This problem was in the "Greedy Algorithms" chapter.

I know the solution to this problem is as follows:

Let the subtrees below the root be $L$ and $R$. If the maximum sum of the edge lengths on the path to a leaf in $L$ starting from the root is greater than the maximum sum of the edge lengths on a path to a leaf in $R$, then increase the edge length from the root to the right subtree by the positive difference between the two.

Conversely, if the maximum sum of the edge lengths on a path to a leaf in $R$ starting from the root is greater than the maximum sum of the edge lengths on a path to a leaf in $L$, then increase the edge length from the root to the left subtree by the positive difference between the two.

I am having trouble proving the following:

$1)$ The resulting tree upon termination of this algorithm is a zero-skew tree (the length from the root to any leaf is the same).
$2)$ The sum of the edge lengths is minimized with this algorithm.

I have tried to prove these facts in many ways, such as induction and direct proof, but I am having a lot of difficulty doing so. The algorithm makes sense to me intuitively, but I'm having trouble explaining why it works formally.

I would greatly appreciate some help in solving this problem.

Solution

Notation. Given a binary tree $T$ and vertex $v$ of $T$, let $\ell_v$ and $r_v$ be the left and right children of vertex $v$ in $T$, respectively. Let $T_v$ be subtree of $T$ rooted at $v$. Let $d_T(v)$ be the weighted distance between the root of $T$ and $v$ and $h(T) = \max_{v \in T} d_T(v)$. If $(u,v)$ is an edge of $T$, let $w_T(u,v)$ be its weight. With a slight abuse of notation, define $w(T) = \sum_{(u,v) \in T} w(u,v)$. Finally, we define the skew of $v$ as $s_T(v) = w_T(v, \ell_v) + h(T_{\ell_v}) - w_T(v, r_v) - h(T_{r_v})$ if $v$ is an interval vertex of $T$ and $s_T(v)=0$ if $v$ is a leaf of $T$.

1) The resulting tree upon termination of this algorithm is a zero-skew tree (the length from the root to any leaf is the same).

We will show by induction on $i=0,1,\dots$ that, for all trees $T$ with $2^{i+1} - 1$ vertices, the algorithm returns a tree in which all vertices have skew $0$.

The base case $i=0$ is trivially true since $T$ must have exactly one vertex and the algorithm does nothing.

Suppose now that $i>0$ and let $v$ be the root of $T$. The trees $T_{\ell_v}$ and $T_{r_v}$ have exactly $\frac{(2^{i+1} - 1) - 1}{2} = 2^i - 1$ vertices and, by inductive hypothesis, the recursive calls on $T_{\ell_v}$ and $T_{r_v}$ ensure that all vertices, except possibly $v$, have skew $0$. The reweighing of $(v, \ell_v)$ and $(v, r_v)$ only affects the skew of $v$ and ensures that $s(v)=0$.

The sum of the edge lengths is minimized with this algorithm.

Let $T'$ be the tree returned by the algorithm. Suppose that an optimal zero-skew tree $T^*$ obtained from the input tree $T$ by lengthening edges is such that $w(T^*) < w(T')$.

Let $(u,v)$ be an edge of $T$ (where $u$ is the parent of $v$) such that $w_{T^*}(u,v) < w_{T'}(u,v)$ and the depth of $u$ is maximized.

Since $s_{T^*}(u)=0$ and since $h(T_v^*) \ge h(T_v) = h(T'_v)$, the sum of the weights in $T^*$ on each path from $v$ to a leaf in $T_v$ must have increased by at least $w_{T'}(u,v) - w_{T^*}(u,v)$. Consider the subgraph of $T^*_v$ induced by the edges whose weight has not been increased w.r.t. $T$ and let $C$ be the connected component containing $v$. Let $E$ be the set of edges with one endpoint in $C$ and the other in $T^*_v \setminus C$. Let $\Delta = \min_{(x,y) \in E} \{ w_{T^*}(x,y) - w_T(x,y) \}$.

Consider a new tree $T''$ that is obtained from $T'$ by reducing the weight of all edges in $E$ by $\Delta$ and increasing the weight of $(u,v)$ by $\Delta$. The tree $T''$ is a feasible solution to your problem and weighs $w(T^*) - |E| \Delta + \Delta < w(T^*)$ (since $|E|\ge 2$). This is a contradiction.

OTHER TIPS

The proof of the first part is quite straightforward if you strengthen the induction proposal.

Statement:

The algorithm applied to the tree with root $v$ (we denote it as $f(v)$) yields a zero-skew tree which has the same height as $v$.

Proof:

By structural induction: if $v$ has no children, it is trivially true. Otherwise, it has children $v_l, v_r$ and edges of length $w_l, w_r$ respectivetely. Without loss of generality, we assume $h(v_l) + w_l \geq h(v_r) + w_r$. Therefore, $v$ has height $h(v_l) + w_l$. By induction proposal, $f(v_l)$ and $f(v_r)$ are zero-skew with height $h(v_l)$ and $h(v_r)$ respectivetely. Thus, every path from $f(v)$ to its leaf from left subtree has length $$w_l + h(v_l)$$ and every path from $f(v)$ to its leaf from the right subtree has length $$w_r + ((h(v_l) + w_l) - (h(v_r) + w_r)) + h(v_r) = w_l + h(v_l)$$ i.e. $f(v)$ is zero-skew with height $h(v_l) + w_l$.

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange