Separate all leaves of a weighted tree with minimum weight cuts

https://cs.stackexchange.com/questions/4898

16-10-2019
|

Pergunta

This is part of a larger problem, which I believe I have reduced to this. Given a tree $T$ having positive edge weights, and $k$ leaves (nodes which have exactly one connected node), I need to delete some edges in the tree so that no two leaves in the original tree are connected (by a path) in the newly formed forest (of trees). The total sum of the weights of the deleted edges needs to be minimized.

My understanding is that atleast $k-1$ edges need to be deleted to separate out all the $k$ leaves. Any more deletions will unnecessarily increase the total cost. Thus, we need to perform exactly $k-1$ deletions.

My hypothesis: For every pair of leaf nodes $l_i$ and $l_j$, find the edge with the minimum weight in the (unique) path from $l_i$ to $l_j$. The $k-1$ least weight edges from this set of edges need to be deleted. This will minimize the sum of weights of the edges to be deleted in order to disconnect all leaves from each other.

I am unable to prove or disprove this hypothesis. Can someone please prove the correctness of this hypothesis, or give a counter-example along with the correct algorithm to solve this problem? If this is indeed correct, is there a faster way (asymptotic complexity wise) to solve this problem? This approach will take $\Theta({k^2})$ time. Thanks in advance!

Solução

For each node $v$ with height 1 (all its children are leaves): If $v$ has degree $d$ and $d - 1$ children, then the $d-2$ smallest edges are added to the cut. These cuts are required to separate the children of $v$ from each other. Now, all but one child (leaf) of $v$ has been separated from the rest of the tree.

Now consider all the nodes $v$ with height 2. Each of these nodes with degree $d$ has a single parent edge (assuming that $v$ is not the root), and $d - 1$ paths of length at most two connecting it to the leaves in its subtree. We must remove the minimum edge from exactly $d - 2$ of these paths in order to disconnect these leaves from each other. Add these edges to the cut. Now all but one leaf in $v$'s subtree has been separated from the rest of the tree. This leaf is connected to the parent of $v$ and the rest of the tree by a path of length at most $3$. Later on, we will be required to remove at most one edge on this path, so we don't need to track all edges, just the one of minimum weight.

Now consider all nodes $v$ with height $h$. Each of these nodes with $d - 1$ children has exactly $d - 1$ leaves connected in its subtree. Each of these leaves are connected to $v$ by a path of length at most $h$, but have also stored the minimum weight edge on this path. We must cut the smallest $d - 2$ edges in order to separate these leaves from each other. The remaining leaf is connected to the rest of the tree via a path of length at most $h + 1$, and in constant time we store its new minimum weight edge by comparing the old minimum with the weight of the edge (v, v.parent).

By working through the nodes in height order, we maintain the invariants that:

In each subtree, we have selected the minimum cut to separate the leaves in that subtree.
Each subtree has exactly 1 leaf connected to the root of the subtree.
Each subtree maintains the minimum weight edge on the path to that leaf.
Each subtree with $k$ leaves has selected exactly $k-1$ edges for its cut.
Each subtree is processed in $O(d)$ time, where $d$ is the degree of the root of the subtree.

Correctness and a running time of $O(n)$ follow from these invariants.

Outras dicas

Consider this graph with $2k$ nodes:

enter image description here
^[source]

From how you describe how you choose the edges to be removed, it is not clear what should happen. If for all pairs $(1,k), (2,k), \ldots, (k-1,k)$ the same node is chosen, you are lucky. But you might also end up removing all the edges with weight $1$, which would leave $k-1$ of the leaves connected. You see that you have to proceed iteratively, choosing the next edge to be removed knowing which are already gone.

Note that the removal of any edge causes a new connected component to arise; thus, removing any $k-1$ edges creates $k$ connected components. So we can just sort edges by weight and remove the $k-1$ smallest -- all we have to take care of is to not remove an edge that creates a new component without any of the original leaves.

To this end, collect in a preprocessing step for each edge all leaves reachable from one and all leaves reachable from the other incident node (without using the considered edge). This takes time $O(m^2)$ and $\Theta(mk)$ extra space, if $m$ is the number of edges (and $n$ the number of nodes), we find reachable leaves by depth-first search and we store the set of reachable leaves per edge in a bit vector.

Now sort the set of edges in time $\Theta(m\log m)$ and perform the greedy algorithm. After removing an edge $(u,v)$, update the bit vectors of all edges in the separated components accordingly: all edges in component $U \ni u$ can no longer reach those leaves reachable from $v$ and vice versa. This update takes time $O(mk)$. The greedy algorithm can not remove $(u,v)$ if $u$ or $v$ can not reach a leaf (without using $(u,v)$).

This algorithm runs in time $O(m^2k)$ (at most $m$ phases of the greedy algorithm each of time $O(mk)$, which dominates everything else). A more precise analysis might make use of the fact that components shrink quickly (on average). Also, the preprocessing step can be sped up by using message passing: the sets of reachable leaves can be propagated among inner edges/nodes.

It is clear that the algorithm separates the $k$ leaves by removing $k-1$ edges. Assume that it did not find an optimal selection of edges. Then, there is an edge $e^*$ separating two sets of leaves $L_1, L_2$ which has smaller weight than one of the edges $e$ on the paths between leaves from $L_1$ and $L_2$ chosen by the algorithm. This is a contradiction to how the algorithm works: it would have considered $e^*$ before $e$ because of its smaller weight, and since it separates a superset of the same leaves, could and would have removed it. Thus, the algorithm is correct.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a cs.stackexchange