min vertex cover to access k edges in a tree

https://cs.stackexchange.com/questions/119114

28-09-2020
|

Question

I need to find the minimum number out of $N$ vertices on a tree with $N-1$ edges, so that at least $K$ edges of that tree are connected to these vertices.

For example, if $N=9$ and $K=6$ and we have this tree:

   Edges  |  Vertice #1  | Vertice #2
     1            1          2
     2            1          3
     3            1          4
     4            2          5
     5            2          6
     6            3          7
     7            4          8
     8            4          9

The right answer should be $\mathrm{min}=2$.

Any thoughts?

Solution

There is a $\mathcal{O}(nk)$ DP approach.

Call an edge covered if we select a vertex next to it. Root the tree at an arbitrary vertex $r$. Define $DP[i][b][t]$ as the maximum number of edges in the subtree of node $i$ that can be covered by selecting at most $t$ nodes from the subtree. If $b = 0$ we are not allowed to select node $i$, and if $b = 1$ we must select it.

If we calculate this DP, we can solve the problem, as the minimum number of nodes to cover $k$ edges is the smallest $t$ for which $max(DP[r][0][t], DP[r][1][t]) \geq k$. Further note that it suffices to only calculate the $DP$ for $t \leq k$, as any $k < n$ nodes cover at least $k$ edges.

To give the recurrence to calculate the DP, we first give the knapsack-function: let $K(V_{1}, \dots, V_{m})$ be an array such that

\begin{equation*} K(V_{1}, \dots, V_{m})[t] = \max_{t_{1} + \dots + t_{m} = t} \sum_{j = 1}^{m} V_{j}[t_{j}] \end{equation*}

Note that $K(K(V_{1}, \dots, V_{m-1}), V_{m}) = K(V_{1}, \dots, V_{m})$, and that $K(A, B)$ can be directly calculated by the above formula in $\mathcal{O}(|A| \cdot |B|)$ time. Hence calculating $K(V_{1}, \dots, V_{m})$ takes $\mathcal{O}(\sum_{i = 2}^{m} |V_{i}| \sum_{j = 1}^{i-1} |V_{j}|)$ time regardless of the order we combine the sets in. If we are interested in only the first $k$ values of the DP, the complexity drops to $\mathcal{O}(\sum_{i = 2}^{m} |V_{i}| \min(k, \sum_{j = 1}^{i-1} |V_{j}|))$

Let $C_{i}$ be the set of children of node $i$, and $C_{ij}$ be the $j$th child of $i$. Then \begin{gather*} DP[i][0][t] = K(V_{1}, \dots, V_{|C_{i}|})[t]\\ DP[i][1][t] = |C_{i}| + K(V'_{1}, \dots, V'_{|C_{i}|})[t-1] \end{gather*} Where \begin{gather*} V_{j}[t] = \max(DP[C_{ij}][0][t], DP[C_{ij}][1][t] + 1)\\ V'_{j}[t] = \max(DP[C_{ij}][0][t], DP[C_{ij}][1][t])\\ \end{gather*} Calculating the answer with this recursion takes $\mathcal{O}(nk)$ time. Informally this is because over the course of the algorithm, we combine single-element DPs into a DP representing the whole tree. We do at most $\frac{n}{k}$ combinations of sets of size $k$, and any element costs us at most $2k$ time (if element $x \in A$ costs us $|B|$ time when calculating $K(A, B)$) before it gets merged into a set of size $k$, so the total amount of work is at most $\mathcal{O}(k^{2} \frac{n}{k} + k n) = \mathcal{O}(nk)$. This is easy but tedious to formalise with induction.

#include <iostream>
#include <vector>
#include <tuple>
using namespace std;
const int INF = (int)1e9 + 7;

vector<int> knapsack(const vector<int> a, const vector<int> b, int k) {
    int n = a.size();
    int m = b.size();
    vector<int> c(n+m-1, -INF);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < m; ++j) c[i+j] = max(c[i+j], a[i] + b[j]);
    }
    if (c.size() > k) c.resize(k);
    return c;
}

pair<vector<int>, vector<int>> dfs(int i, int p, int k, const vector<vector<int>>& g) {
    vector<int> dp0 = {0, 0};
    vector<int> dp1 = {-INF, (int)g[i].size() - (p != -1)};
    for (auto t : g[i]) {
        if (t == p) continue;
        vector<int> dp0_t, dp1_t;
        tie(dp0_t, dp1_t) = dfs(t, i, k, g);
        int m = dp0_t.size();

        vector<int> off0(m), off1(m);
        for (int j = 0; j < m; ++j) off0[j] = max(dp0_t[j], dp1_t[j] + 1);
        for (int j = 0; j < m; ++j) off1[j] = max(dp0_t[j], dp1_t[j]);
        dp0 = knapsack(dp0, off0, k+1);
        dp1 = knapsack(dp1, off1, k+1);
    }
    return {dp0, dp1};
}

int minCover(int k, const vector<vector<int>>& g) {
    vector<int> dp0, dp1;
    tie(dp0, dp1) = dfs(0, -1, k, g);
    for (int i = 0;; ++i) {
        if (max(dp0[i], dp1[i]) >= k) return i;
    }
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(0);

    int n, k;
    cin >> n >> k;

    vector<vector<int>> g(n);
    for (int i = 0; i < n-1; ++i) {
        int a, b;
        cin >> a >> b;
        --a; --b;
        g[a].push_back(b);
        g[b].push_back(a);
    }

    int t = minCover(k, g);
    cout << t << '\n';
}

OTHER TIPS

A simple solution is to use the state $dp(n,2,n)$. Let $dp(i,0,j)$ be the maximum number of edges we can get by using $\leq j$ nodes in the subtree rooted at node $i$, with node $i$ itself not being in the vertex cover. Let $dp(i,1,j)$ be the same, except node $i$ is included in the vertex cover.

The transition itself is not obvious, but it can be done using a Knapsack-like method. Consider all the children of node $i$. Use all the values of $dp(ch,0,c)$ and $dp(ch,1,c)$ as items in two separate Knapsacks: one to calculate the full array $dp(i,0)$ and one to calculate the full array $dp(i,1)$. The costs of items are uniformly $c$, while the values are the following:

If calculating $dp(i,0)$: value of $dp(ch,0,c)$ is $dp(ch,0,c)$; value of $dp(ch,1,c)$ is $dp(ch,1,c)+1$.
If calculating $dp(i,1)$: value of $dp(ch,0,c)$ is $dp(ch,0,c)+1$; value of $dp(ch,1,c)$ is $dp(ch,1,c)+1$.

We can get the full arrays $dp(i,0)$ and $dp(i,1)$ directly from the end values of the Knapsacks (that is, the values of $kn(last,j)$ for all $j$). The Knapsack has $O(\#children * n)$ items per node, and runs in $O(\#children * n * n)$ per node. Therefore, the total complexity of the solution is $O(n^3)$. Please note that you will have to slightly modify the traditional 0-1 Knapsack to prevent two items which represent the same node from being taken; this is not very difficult. Also, when calculating the $dp(i,1)$ array, please note that the node $i$ itself is an extra node on the vertex cover.

I am note sure if there are solutions running in time faster than this one, but I would not doubt it.

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange