Question

I need to code an algorithm for longest two pattern prefix/suffix match whose time complexity is O(n+m1+m2), where n is the length of the String and m1, m2 are the lengths of pattern1 and pattern2 respectively.

Example: if the String is "OBANTAO" and Pattern1 is "BANANA" and Patten2 is "SIESTA" then the answer is the substring "BANTA" of String that consists of the prefix BAN of BANANA and the suffix TA of SIESTA.

Results from Google are: "Rabin-karp string search algorithm", "Knuth-morris-pratt algorithm" and "Boyer-moore string search algorithm".

I'm able to understand all the above 3 algorithms, but the problem is that, they are all based on "single pattern prefix/suffix matching". I'm unable to extend them for two pattern prefix/suffix match.

A sample algorithm or a link towards searching it would be greatly helpful for me in developing the program.

Was it helpful?

Solution

Knuth--Morris--Pratt can be modified straightforwardly to yield, for each position in the haystack string, the length of the longest prefix of the needle string with a match ending at that position. Use KMP to compute this information for Pat1 in String and reverse(Pat2) in reverse(String), then iterate over each position in String looking for the maximum prefix/suffix length.

Example with String = "OBANTAO" and Pat1 = "BANANA" and Pat2 = "SIESTA":

"BANANA" = Pat1 in String
 O B A N T A O
^ ^ ^ ^ ^ ^ ^ ^
| | | | | | | |
| | | | | | | 0 ("")
| | | | | | 0 ("")
| | | | | 0 ("")
| | | | 3 ("BAN")
| | | 2 ("BA")
| | 1 ("B")
| 0 ("")
0 ("")

"ATSEIS" = reverse(Pat2) in reverse(String)
 O A T N A B O
^ ^ ^ ^ ^ ^ ^ ^
| | | | | | | |
| | | | | | | 0 ("")
| | | | | | 0 ("")
| | | | | 1 ("A")
| | | | 0 ("")
| | | 2 ("AT")
| | 1 ("A")
| 0 ("")
0 ("")

Reverse the second array and sum componentwise.

  0 0 1 2 3 0 0 0
+ 0 0 1 0 2 1 0 0
-----------------
  0 0 2 2 5 1 0 0
          ^
          |
         max (argument = "BAN" ++ reverse("AT"))

OTHER TIPS

I tried to Implement @David Eisenstat solution in Java. It is in O(2n) time and O(2(n+1)) Auxiliary Spaces

String prefix = "BANANA";
    String suffix = "SIESTA";
    String input = "SABANANQS";

    // Prepare Inputs
    char[] prefixArray = prefix.toCharArray();
    char[] suffixArray = suffix.toCharArray();
    char[] inputArray = input.toCharArray();
    int inputLength = inputArray.length;
    int suffixLength = suffixArray.length;
    int prefixLength = prefixArray.length;
    // Auxiliary Spaces O(2(n+1))
    int[] prefixIndexs = new int[inputLength+1];
    int[] suffixIndexs = new int[inputLength+1];

    int m = 0;
    int n = 0;
    // O(1)
    for (int i = 0; i < inputLength; i++) {
        if (inputArray[i] == prefixArray[m]){
            m = m+1;
            prefixIndexs[i+1] = m;
            if (m == prefixLength) {
                m = 0;
            }
        }else{
            m = 0;
        }
        if (inputArray[inputLength-1-i] == suffixArray[suffixLength-1-n]){   // Reverse order or input and reverse oder of suffix
            n = n +1;
            suffixIndexs[i+1] = n;
            if (n == suffixLength) {
                n = 0;
            }
        }else{
            n = 0;
        }
    }

    int currmax =0;
    int mIndex = 0; // prefix Index from start
    int nIndex = 0; // suffix Index from End
    //O(1)  - Do Sum and find the max
    for (int i = 0; i < (inputLength+1); i++) {
        m = prefixIndexs[i];
        n = suffixIndexs[inputLength-i];
        if ( m != 0 && n != 0){  // if both prefix and suffix exists
            if (m+n > currmax){
                currmax = (m+n);
                mIndex = m;
                nIndex = n;
            }
        }
    }

    System.out.println("Input :"+input);
    System.out.println("prefix :"+prefix);
    System.out.println("suffix :"+suffix);
    System.out.println("max :"+currmax);
    System.out.println("mIndex :"+mIndex);
    System.out.println("nIndex :"+nIndex);
    System.out.println(prefix.substring(0,mIndex)+suffix.substring(suffix.length() - nIndex,suffix.length()));

Instead of reset m and n to 0, we can keep an another array for each to implement KMP algorithm , since the input prefix and sufix doesnt has repetitive char sequence i left it

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top