Question

I want to implement boyer-moore algorithm but I'm stuck on constructing a good suffix table which I think should have O(n) complexity, I only found the O(n^2) algorithm. So do you guys have a clue for me?

Please don't give me code snippets, I can google it if I want, but I prefer to solve it in my way, I just need a clue.

Was it helpful?

Solution

There is a fast algorithm that uses the prefix function. A prefix function of a string s is an array p, where p[i] is the longest length of the prefix of a substring s[0..i] (0-indexed) and its suffix. It can be calculated with O(n) complexity using KMP that uses 2 facts:

  1. p[i+1]<=p[i]+1.
  2. For each i, if s[p[i]]==s[i+1], then p[i+1] = p[i] + 1. Otherwise, we should try another string, for which s[0...j-1]==s[i-j+1...i]. Obviously, (we choose the longest string) we should just jump to the position i = p[i-1].

The algorithm (c++):

vector<int> prefix (string s) 
{
    int n=s.length();
    vector<int> pi(n);
    pi[0]=0;
    for (int i=1; i<n; ++i) 
    {
        int j = pi[i-1];
        while (j>0 && s[i]!=s[j])
            j=pi[j-1];
        if (s[i]==s[j])  ++j;
        pi[i]=j;
    }
    return pi;
}

Now we can construct the suffix table:

m = text.length();
vector<int> suffshift(m);
vector<int> pi = prefix(pattern);
vector<int> pi1 = prefix(inverse(pattern));
for (int j=0; j<m; ++j)
{
    suffshift[j] = m - pi[m];
}
for (int i=1; i<m; ++i)
{
     j = m - pi1[i];
     suffshift[j]=min(suffshift[j], i-pi1[i]);
}

Suffshift[m] stands for the empty suffix, suppshift[0] - for the whole text. THe complexity is O(n).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top