Question

KMP algorithm for string matching. Following is the code I found online for computing the longest prefix-suffix array:
Defination:

lps[i] = the longest proper prefix of pat[0..i] 
              which is also a suffix of pat[0..i]. 

Code:

void computeLPSArray(char *pat, int M, int *lps)
{
    int len = 0;  // length of the previous longest prefix suffix
    int i;

    lps[0] = 0; // lps[0] is always 0
    i = 1;

    // the loop calculates lps[i] for i = 1 to M-1
    while(i < M)
    {
       if(pat[i] == pat[len])
       {
         len++;
         lps[i] = len;
         i++;
       }
       else // (pat[i] != pat[len])
       {
         if( len != 0 )
         {
           // This is tricky. Consider the example AAACAAAA and i = 7.
           len = lps[len-1]; //*****************

           // Also, note that we do not increment i here
         }
         else // if (len == 0)
         {
           lps[i] = 0;
           i++;
         }
       }
    }
}

Can I use len = len-1 instead of len = lps[len-1]?
because len always counts the prefix length like from [0 .. someIndex]. Then why use lps for assignment here? Following are the cases for which I tested which work fine(first line is the pattern and subsequent two lines are the result for original and modified assignment to len ) :

a  a  a  b  a  b  c  
0  1  2  0  1  0  0  
0  1  2  0  1  0  0 

a  b  c  b  a  b  c  
0  0  0  0  1  2  3  
0  0  0  0  1  2  3  

a  a  b  c  b  a  b  
0  1  0  0  0  1  0  
0  1  0  0  0  1  0  

Code here with both variations written : http://ideone.com/qiSrUo

Was it helpful?

Solution

Following a case for which it does not work:

i     0  1  2  3  4  5
p     A  B  A  B  B  A 
c1    0  0  1  2  0  1
c2    0  0  1  2  2  3

The reason being:

At i=4, len=2 
p[i]='B' and p[len]='A' //Mismatch!
lps string upto i=3: AB(0-1 prefix), (2-3 suffix)
-------------------------------
i=4
Next charecter: B
len=2 // longest prefix suffix length 
Charecter looking for : A (=p[len])

So upto i=3 we had AB(0-1) as the prefix that matched with suffix AB(2-3), but now at i=4 there is a mismatch so we see that we can't extend the original prefix(0-1) so the position to be checked is the prefix found prior to "AB" which is done by lps[len-1] < -1 as the array starts from 0 > and this is not necessarily len-1 as we may need to step back further than that to get the new longest prefix suffix.

OTHER TIPS

Here is the best explanation I have ever seen.Example in it will answer your question clearly.

Knuth–Morris–Pratt(KMP) Pattern Matching(Substring search)

Here is MY KMP code:-

#include <bits/stdc++.h>
using namespace std;


int main(void){
    int t;
    scanf("%d",&t);
    while(t--){
        string s;
        cin>>s;
        int n = s.length();
        int arr[n];
        arr[0] = 0;
        int len = 0;
        for(int i = 1;i<n;){
            if(s[len]==s[i]){
                len++;
                arr[i++] = len;
            }
            else{
                if(len!=0){
                    len = arr[len-1];
                }
                else{
                    arr[i] = 0;
                    i++;
                }
            }
        }
        cout<<arr[n-1]<<endl;
    }


    return 0;
}

Time Complexcity is O(N)

If you want to understand the intuition behind this algorithm too, refer to below video on YouTube. This is the clearest explanation I've come across for this algorithm.

The longest prefix suffix array in kmp algorithm

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top