سؤال

I have generated an string using the following alphabet. {A,C,G,T}. And my string contains more than 10000 characters. I'm searching the following patterns in it.

  • ATGGA
  • TGGAC
  • CCGT

I have asked to use a string matching algorithm which has O(m+n) running time.

m = pattern length
n = text length

Both KMP and Rabin-Karp algorithms have this running time. What is the most suitable algorithm (between Rabin-Carp and KMP) in this situation?

هل كانت مفيدة؟

المحلول

When you want to search for multiple patterns, typically the correct choice is to use Aho-Corasick, which is somewhat a generalization of KMP. Now in your case you are only searching for 3 patterns so it may be the case that KMP is not that much slower(at most three times), but this is the general approach.

Rabin-Karp is easier to implement if we assume that a collision will never happen, but if the problem you have is a typical string searching KMP will be more stable no matter what input you have. However, Rabin-Karp has many other applications, where KMP is not an option.

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top