سؤال

I have a question about choosing the q and d in Rabin-Karp algorithm for searching strings. The algorithm uses the values q as modulus and d as hash function.

If I choose q as number of power 2 and d=q-1 or d=q+1

  • How can these choices affect spurious hits?

  • What patterns will be sure spurious hits in case d=q+1 and in case d=q-1?

هل كانت مفيدة؟

المحلول

X mod (2^n +1) can be described as alternating sum / difference e.g.

X= 0x11223344, n=8, X mod 257 == 0x44 - 0x33 + 0x22 - 0x11 mod 257. The problems here is that any repeating letter will cancel itself -- not very practical -- of course n doesn't have to be 8.

X mod (2^n -1) will be the sum of all n bit patterns, e.g.

X= 0x11223344, n=8, X mod 255 ==( 0x44 + 0x33 +0x22 +0x11 ) mod 255

The problem here is that toggling bytes cancel themselves: Hash('ab') = Hash('ba').

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top