Connection between KMP prefix function and string matching automaton

https://cs.stackexchange.com/questions/1669

16-10-2019
|

Pergunta

Let $A_P = (Q,\Sigma,\delta,0,\{m\})$ the string matching automaton for pattern $P \in \Sigma^m$, that is

$Q = \{0,1,\dots,m\}$
$\delta(q,a) = \sigma_P(P_{0,q}\cdot a)$ for all $q\in Q$ and $a\in \Sigma$

with $\sigma_P(w)$ the length of the longest prefix of $P$ that is a Suffix of $w$, that is

$\qquad \displaystyle \sigma_P(w) = \max \left\{k \in \mathbb{N}_0 \mid P_{0,k} \sqsupset w \right\}$.

Now, let $\pi$ the prefix function from the Knuth-Morris-Pratt algorithm, that is

$\qquad \displaystyle \pi_P(q)= \max \{k \mid k < q \wedge P_{0,k} \sqsupset P_{0,q}\}$.

As it turns out, one can use $\pi_P$ to compute $\delta$ quickly; the central observation is:

Assume above notions and $a \in \Sigma$. For $q \in \{0,\dots,m\}$ with $q = m$ or $P_{q+1} \neq a$, it holds that

$\qquad \displaystyle \delta(q,a) = \delta(\pi_P(q),a)$

But how can I prove this?

For reference, this is how you compute $\pi_P$:

m ← length[P ]
π[0] ← 0
k ← 0
for q ← 1 to m − 1 do
  while k > 0 and P [k + 1] =6 P [q] do
    k ← π[k]
    if P [k + 1] = P [q] then
       k ← k + 1
    end if
    π[q] ← k
 end while
end for

return π

Solução

First of all, note that by definition

$\delta(q,a) = \sigma_P(P_{0,q}\cdot a) =: s_1$ and
$\delta(\pi_P(q),a) = \sigma_P(P_{0,\pi_P(q)}\cdot a) =: s_2$.

Let us investigate $s_1$ and $s_2$ in a sketch:

sketch
^[source]

Now assume $s_2 > s_1$; this contradicts the maximal choice of $s_1$ directly. If we assume $s_1 > s_2$ we contradict the fact that both $s_2$ and $\pi_P(q)$ are chosen maximally, in particular because $\pi_P(q) \geq s_1 - 1$. As both cases cases lead to contradictions $s_1=s_2$ holds, q.e.d.

As requested, a more elaborate version of the proof:

Now we have to show $s_1=s_2$; we do this by showing that the opposite leads to contradictions.

Assume $s_2 > s_1$. Note that $P_{0,s_2} \sqsupset P_{0,q}\cdot a$ because $P_{0,s_2} \sqsupset P_{0,\pi_P(q)}\cdot a$ and $P_{0,\pi_P(q)} \sqsupset P_{0,q}$ by definition of $s_2$. Therefore, $P_{0,s_2}$ -- a prefix of $P$ and a suffix of $P_{0,q}\cdot a$ -- is longer than $P_{0,s_1}$, which is by definition of $s_1$ the longest prefix of $P$ that is a suffix of $P_{0,q}\cdot a$. This is a contradiction.

Before we continue with the other case, let us see that $\pi_P(q) \geq s_1 - 1$. Observe that because $P_{0,s_1} \sqsupset P_{0,q}\cdot a$, we have $P_{0,s-1} \sqsupset P_{0,q}$. Assuming that $\pi_P(q) < s_1 - 1$ immediately contradicts the maximal choice of $\pi_P(q)$ ($s_1 - 1$ is in the set $\pi_P(q)$ is chosen from).

Assume $s_1 > s_2$. We have just shown $|P_{0,\pi_P(q)}\cdot a| \geq s_1$, and remember that $P_{0,\pi_P(q)}\cdot a \sqsupset P_{0,q} \cdot a$. Therefore, $s_1 > s_2$ contradicts the maximal choice of $s_2$ ($s_1$ is in the set $s_2$ is chosen from).

As neither $s_1 > s_2$ nor $s_2 > s_1$ can hold, we have proven that $s_1 = s_2$, q.e.d.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a cs.stackexchange