min vertex cover to access k edges in a tree
-
28-09-2020 - |
Question
I need to find the minimum number out of $N$ vertices on a tree with $N-1$ edges, so that at least $K$ edges of that tree are connected to these vertices.
For example, if $N=9$ and $K=6$ and we have this tree:
Edges | Vertice #1 | Vertice #2
1 1 2
2 1 3
3 1 4
4 2 5
5 2 6
6 3 7
7 4 8
8 4 9
The right answer should be $\mathrm{min}=2$.
Any thoughts?
Solution
There is a $\mathcal{O}(nk)$ DP approach.
Call an edge covered if we select a vertex next to it. Root the tree at an arbitrary vertex $r$. Define $DP[i][b][t]$ as the maximum number of edges in the subtree of node $i$ that can be covered by selecting at most $t$ nodes from the subtree. If $b = 0$ we are not allowed to select node $i$, and if $b = 1$ we must select it.
If we calculate this DP, we can solve the problem, as the minimum number of nodes to cover $k$ edges is the smallest $t$ for which $max(DP[r][0][t], DP[r][1][t]) \geq k$. Further note that it suffices to only calculate the $DP$ for $t \leq k$, as any $k < n$ nodes cover at least $k$ edges.
To give the recurrence to calculate the DP, we first give the knapsack-function: let $K(V_{1}, \dots, V_{m})$ be an array such that
\begin{equation*} K(V_{1}, \dots, V_{m})[t] = \max_{t_{1} + \dots + t_{m} = t} \sum_{j = 1}^{m} V_{j}[t_{j}] \end{equation*}
Note that $K(K(V_{1}, \dots, V_{m-1}), V_{m}) = K(V_{1}, \dots, V_{m})$, and that $K(A, B)$ can be directly calculated by the above formula in $\mathcal{O}(|A| \cdot |B|)$ time. Hence calculating $K(V_{1}, \dots, V_{m})$ takes $\mathcal{O}(\sum_{i = 2}^{m} |V_{i}| \sum_{j = 1}^{i-1} |V_{j}|)$ time regardless of the order we combine the sets in. If we are interested in only the first $k$ values of the DP, the complexity drops to $\mathcal{O}(\sum_{i = 2}^{m} |V_{i}| \min(k, \sum_{j = 1}^{i-1} |V_{j}|))$
Let $C_{i}$ be the set of children of node $i$, and $C_{ij}$ be the $j$th child of $i$. Then \begin{gather*} DP[i][0][t] = K(V_{1}, \dots, V_{|C_{i}|})[t]\\ DP[i][1][t] = |C_{i}| + K(V'_{1}, \dots, V'_{|C_{i}|})[t-1] \end{gather*} Where \begin{gather*} V_{j}[t] = \max(DP[C_{ij}][0][t], DP[C_{ij}][1][t] + 1)\\ V'_{j}[t] = \max(DP[C_{ij}][0][t], DP[C_{ij}][1][t])\\ \end{gather*} Calculating the answer with this recursion takes $\mathcal{O}(nk)$ time. Informally this is because over the course of the algorithm, we combine single-element DPs into a DP representing the whole tree. We do at most $\frac{n}{k}$ combinations of sets of size $k$, and any element costs us at most $2k$ time (if element $x \in A$ costs us $|B|$ time when calculating $K(A, B)$) before it gets merged into a set of size $k$, so the total amount of work is at most $\mathcal{O}(k^{2} \frac{n}{k} + k n) = \mathcal{O}(nk)$. This is easy but tedious to formalise with induction.
#include <iostream>
#include <vector>
#include <tuple>
using namespace std;
const int INF = (int)1e9 + 7;
vector<int> knapsack(const vector<int> a, const vector<int> b, int k) {
int n = a.size();
int m = b.size();
vector<int> c(n+m-1, -INF);
for (int i = 0; i < n; ++i) {
for (int j = 0; j < m; ++j) c[i+j] = max(c[i+j], a[i] + b[j]);
}
if (c.size() > k) c.resize(k);
return c;
}
pair<vector<int>, vector<int>> dfs(int i, int p, int k, const vector<vector<int>>& g) {
vector<int> dp0 = {0, 0};
vector<int> dp1 = {-INF, (int)g[i].size() - (p != -1)};
for (auto t : g[i]) {
if (t == p) continue;
vector<int> dp0_t, dp1_t;
tie(dp0_t, dp1_t) = dfs(t, i, k, g);
int m = dp0_t.size();
vector<int> off0(m), off1(m);
for (int j = 0; j < m; ++j) off0[j] = max(dp0_t[j], dp1_t[j] + 1);
for (int j = 0; j < m; ++j) off1[j] = max(dp0_t[j], dp1_t[j]);
dp0 = knapsack(dp0, off0, k+1);
dp1 = knapsack(dp1, off1, k+1);
}
return {dp0, dp1};
}
int minCover(int k, const vector<vector<int>>& g) {
vector<int> dp0, dp1;
tie(dp0, dp1) = dfs(0, -1, k, g);
for (int i = 0;; ++i) {
if (max(dp0[i], dp1[i]) >= k) return i;
}
}
int main() {
ios_base::sync_with_stdio(false);
cin.tie(0);
int n, k;
cin >> n >> k;
vector<vector<int>> g(n);
for (int i = 0; i < n-1; ++i) {
int a, b;
cin >> a >> b;
--a; --b;
g[a].push_back(b);
g[b].push_back(a);
}
int t = minCover(k, g);
cout << t << '\n';
}
OTHER TIPS
A simple solution is to use the state $dp(n,2,n)$. Let $dp(i,0,j)$ be the maximum number of edges we can get by using $\leq j$ nodes in the subtree rooted at node $i$, with node $i$ itself not being in the vertex cover. Let $dp(i,1,j)$ be the same, except node $i$ is included in the vertex cover.
The transition itself is not obvious, but it can be done using a Knapsack-like method. Consider all the children of node $i$. Use all the values of $dp(ch,0,c)$ and $dp(ch,1,c)$ as items in two separate Knapsacks: one to calculate the full array $dp(i,0)$ and one to calculate the full array $dp(i,1)$. The costs of items are uniformly $c$, while the values are the following:
If calculating $dp(i,0)$: value of $dp(ch,0,c)$ is $dp(ch,0,c)$; value of $dp(ch,1,c)$ is $dp(ch,1,c)+1$.
If calculating $dp(i,1)$: value of $dp(ch,0,c)$ is $dp(ch,0,c)+1$; value of $dp(ch,1,c)$ is $dp(ch,1,c)+1$.
We can get the full arrays $dp(i,0)$ and $dp(i,1)$ directly from the end values of the Knapsacks (that is, the values of $kn(last,j)$ for all $j$). The Knapsack has $O(\#children * n)$ items per node, and runs in $O(\#children * n * n)$ per node. Therefore, the total complexity of the solution is $O(n^3)$. Please note that you will have to slightly modify the traditional 0-1 Knapsack to prevent two items which represent the same node from being taken; this is not very difficult. Also, when calculating the $dp(i,1)$ array, please note that the node $i$ itself is an extra node on the vertex cover.
I am note sure if there are solutions running in time faster than this one, but I would not doubt it.