Pergunta

Let $S[1..n] \in \Sigma^*$ be a string of $n$ symbols over the alphabet $\Sigma$ where $|\Sigma| \in \mathcal{O}(1)$. Determine a shortest string which cannot be obtained from $S$ by deleting some (possibly no) symbols.

For instance, we are given $\Sigma = \{ A, C, G, T \}$ and $S = ACGTACGT$. In this case, one valid answer is $CCA$, because it does not occur as a subsequence in $S$ (i.e. cannot be obtained by removing some of the symbols from $S$) and there is no such string of length $2$.

I am looking for an algorithm which takes $\mathcal{O}(n)$ or $\mathcal{O}(n \log n)$ time. It is not sufficient to compute the length of the answer, an example should also be constructed.

Observations and Approaches. The length of the answer does not exceed $\displaystyle \frac{n}{|\Sigma|} + 1$ by pigeonhole principle. This upper bound is attained for $S = ACGTACGT...ACGT$ in the above example. If we are only interested in the length of the answer, the following approach should work. Let $f(i)$ denote the largest $k \in \mathbb{N}$ such that every string of length $k$ is a subsequence of the prefix $S[1..i]$. Our final answer is $f(n) + 1$ and $f$ can be computed by dynamic programming in $\mathcal{O}(n)$ time for $1 \le i \le n$ simply by recurrence

$\displaystyle f(i) = 1 + f\left(\min{(\ell(i, \sigma_1), \ell(i, \sigma_2), \dots, \ell(i, \sigma_{|\Sigma|})) - 1}\right)$

where $\ell(i, \sigma)$ denotes the last occurrence of $\sigma \in \Sigma$ in the prefix $S[1..i]$. Obviously, the function $f$ is increasing, i.e. $f(i + 1) \ge f(i)$ for $1 \le i \le n - 1$. It seems more difficult to construct an example from this information, that is where I am stuck.

Finally note that the problem is a lot easier, if we require the subsequence do be contigous. Since there are only $\mathcal{O}(n^2)$ substrings, but $\mathcal{O}(|\Sigma|^k)$ strings of length $k$ over $\Sigma$, $k$ turns out to be very small and even brute-force will succeed.

Any kind of help is appreciated, thanks in advance.

Original Source: CSES Problem Set (https://cses.fi/problemset/task/1087/).

Nenhuma solução correta

Licenciado em: CC-BY-SA com atribuição
Não afiliado a cs.stackexchange
scroll top