String search algorithm that returns recursive matches - Java
-
17-04-2021 - |
Question
The Rabin-Karp search algorithm is working fine but can anyone help to guide me in modifying it to a recursive search? http://algs4.cs.princeton.edu/53substring/RabinKarp.java.html . For example:
* **pattern:** rar
* **text:** abacadabrararbracabrarararacadabrabrarbracad
* **match1:** rar
* **match2:** rar
* **match3:** rar
* **match4:** rar
* **match5:** rar
* **match5:** rar
Are there other faster algorithm for recursive text matching searches?
SOLUTION
Add external library from http://johannburkard.de/software/stringsearch/ to build path. The code below will return all the starting position of the matches. inclusive of embedded ones like match1 and match2.
import com.eaio.stringsearch.BNDM;
String pattern = "rar";
String text = "abacadabrararbracabrarararacadabrabrarbracad";
// Loop through text to get starting position of matched pattern.
List<Integer> matchPoint =new ArrayList<Integer>();
int slice = -1;
while (slice<text.length()){
slice+=1;
com.eaio.stringsearch.BNDM result = new BNDM();
int pos = result.searchString(text, slice, pattern);
if (pos != -1) {
slice = pos;
matchPoint.add(pos);
}
}
Solution
Of course there is. I will not recommend using Rabin-Karp in case of searching a small pattern in string. KMP i.e Knuth-Morris-Pratt algorithm takes linear time and linear additional memory and can return all the matches without the case of collisions which are nag when dealing with Rabin-Karp. Please read the wiki for it. This algorithm is a bit harder to understand, but shorter to code and once you get it right you feel very satisfied.
OTHER TIPS
For longer patterns, the Boyer-Moore algorithm or variants like Horspool's algorithm are generally faster. The Boyer-Moore algorithm isn't particularly well suited for large alphabets. If the text can be the full Unicode range, it would use a rather large shift table, but if the text is ASCII or latin1, the extra space for the lookup tables is small. For large alphabets, I also recommend KMP.