Find the smallest set of strings which “covers” a given set of strings (coverage = containing as substring)

cs.stackexchange https://cs.stackexchange.com/questions/105093

Pergunta

Let $S$ be a finite set of strings and $0 < k\leq l$ integers. We want to find the smallest set of strings $T(k,l)$ for which the following holds:

  • $\forall t \in T(k,l): k \leq |t| \leq l$
  • $\forall s \in S \ \exists t \in T(k,l): t \subset s$ (meaning: $s$ is containing $t$ as a contiguous substring).

I would appreciate any help regarding this problem. I see that I could generate the set of all strings with length between $k$ and $l$ then consider the power set but I think there must be a better way to solve this.

I'm not even sure how hard this problem is (I think there may be a reduction for the set cover problem which means it is $NP$-complete but I really don't know) or if there are any good approximation (or maybe exact) algorithms available.

I would like to solve this problem primarily in practice. Typically, $3 \leq k \leq l \leq 10$ holds, $|S| \approx 2000$ and the lengths of the elements in $S$ can vary from 10 to 50 mainly.

Nenhuma solução correta

Licenciado em: CC-BY-SA com atribuição
Não afiliado a cs.stackexchange
scroll top