Marginal Probability of Generating a Tree

https://cs.stackexchange.com/questions/126652

29-09-2020
|

Question

Fix some finite graph $G = (V, E)$, and some vertex $x$.

Suppose I generate a random sub-tree of $G$ of size $N$, containing $x$, as follows:

Let $T_0 = \{ x \}$.
For $0 < n \leqslant N$

i. Let $B_n$ be the set of neighbors of $T_{n-1}$ outside of $T_{n-1}$.

ii. Form $T_n$ by
- Sample a pair $(x_n, y_n) \in E(G) \cap \left( V (T_{n-1}) \times B_n \right)$, with probability $q_n (x_n, y_n | T_{n-1} )$,
- Add $y_n$ to $V(T_{n-1})$, and add $(x_n, y_n)$ to $E (T_{n-1})$.
Return $T_N$.

Suppose also that $q_n ( x_n, y_n | T_{n-1} )$ can be computed easily for all $(T_{n-1}, x_n, y_n)$. I am interested in efficiently and exactly calculating the marginal probability of generating the tree $T_N$, given that I began growing it at $T_0 = \{ x \}$, i.e.

$$P(T_N | T_0 = \{ x \}) = \sum_{x_{1:N}, y_{1:N}} \prod_{n = 1}^N q_n (x_n, y_n | T_{n-1} ).$$

My question is essentially whether I should expect to be able to find an efficient (i.e. polynomial-time) algorithm for this, and if so, what it might be.

Some thoughts:

Naively, the sum has exponentially-many terms, which precludes trying to evaluate the sum directly.
On the other hand, this problem is also highly-structured (trees, recursion, etc.), which might suggest that some sort of dynamic programming approach would be feasible. I'm not sure of exactly how to approach this.
Relatedly, I know how to calculate unbiased, non-negative estimators of $P(T_N | T_0 = \{ x \})$, which have reasonable variance properties, by using techniques from Sequential Monte Carlo / particle filtering. This suggests that the problem is at least possible to approximate well in a reasonable amount of time.

Solution

No. If $q(x_n,y_n|T_{n-1})$ is arbitrary -- there can be an arbitrary dependence on $T_{n-1}$ -- then this requires exponential time.

Consider a tree $T_N$ that has a single root node, $N-1$ leaves, and an edge from the root to each leaf. There are $2^N$ subtrees of $T_N$, and in particular, there are $2^N$ possible values of $T_n$ that can occur in the expression

$$\sum_{x, y} \prod_{n = 1}^N q_n (x_n, y_n | T_{n-1} ).$$

One can use a simple adversary argument to prove that evaluating this expression requires exponential time. Suppose that we evaluate $q_n(x_n,x_n|T_{n-1})$ by querying an oracle with $x_n,y_n,T_{n-1}$. Suppose there is a single tree $T$ that is never queried to the oracle as any $T_{n-1}$. Choose all of the $q_n(\cdots)$ values to be strictly positive. Then since $q_n(x_n,y_n|T)$ wasn't queried during execution, we can choose it after observing the output of the algorithm; but by varying it, we can choose a value that makes the algorithm's output wrong (in particular, the value of the expression depends on $q_n(x_n,y_n|T)$ but the algorithm's output doesn't depend on $q_n(x_n,y_n|T)$, so the algorithm's output cannot correct). We have proven that, to produce the correct output, any correct algorithm must query the oracle for all $2^N$ possible subtrees of $T_N$. It takes at least $O(1)$ time to query an oracle.

In conclusion, this argument proves that any correct algorithm for computing this expression must take $\Omega(2^N)$ time.

I don't know whether it can always be done in $O(2^N)$ time, or whether perhaps $O(N!)$ time might be required.

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange