Question

My previous question pertained to the general string search algorithm. I am researching the Rabin-Karp algorithm and I have a function template like:

RabinKarpMatch(char *Text, char *Search_phrase,int radix,int prime)

How should the values of radix and prime change depending on Search_phrase and Text? Or should I just give them arbitrary values in all cases?


Solution

In the Rabin-Karp algorithm, the radix and prime do not change while the text is processed, but choosing good values for them is critically important. In the worst case (almost impossible in practice), every substring of the text has the same hash code as the pattern, and the algorithm runs in O(nm) time, where n is the text length and m is the pattern length.
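As a rough illustration of the radix and prime staying fixed for the whole search, here is a minimal C sketch. The name rabin_karp_match, the 64-bit unsigned arithmetic, and the requirement that both parameters stay below 2^31 (so that intermediate products fit) are assumptions made for this example, not part of the question's template.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: returns the index of the first match, or -1.
   radix and prime are chosen once and never change during the scan.
   Assumes 0 < prime < 2^31 and radix < 2^31 so products fit in 64 bits. */
long rabin_karp_match(const char *text, const char *pattern,
                      uint64_t radix, uint64_t prime)
{
    size_t n = strlen(text), m = strlen(pattern);
    if (m == 0) return 0;
    if (m > n) return -1;

    /* high = radix^(m-1) mod prime: weight of the window's first character */
    uint64_t high = 1;
    for (size_t i = 1; i < m; i++)
        high = (high * radix) % prime;

    /* hash of the pattern and of the first window of the text */
    uint64_t hp = 0, ht = 0;
    for (size_t i = 0; i < m; i++) {
        hp = (hp * radix + (unsigned char)pattern[i]) % prime;
        ht = (ht * radix + (unsigned char)text[i]) % prime;
    }

    for (size_t i = 0; ; i++) {
        /* a hash hit is only a candidate; confirm it character by character */
        if (hp == ht && memcmp(text + i, pattern, m) == 0)
            return (long)i;
        if (i + m >= n)
            break;
        /* roll the window: remove text[i], then append text[i + m] */
        ht = (ht + prime - ((unsigned char)text[i] * high) % prime) % prime;
        ht = (ht * radix + (unsigned char)text[i + m]) % prime;
    }
    return -1;
}
```

For instance, rabin_karp_match("abracadabra", "cad", 256, 1000000007) returns 4. Passing a degenerate prime of 1 makes every window collide, which is the O(nm) worst case described above, although the result is still correct because every hash hit is verified.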

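General rule: the prime should be small, and the radix should be convenient to compute with. Here the prime serves as the multiplier of the polynomial hash, and a radix of 2^64 means the hash is simply allowed to wrap around in an unsigned 64-bit integer, so no explicit modulo operation is needed. I believe pairs like: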

(prime, radix)

31, 2^64

37, 2^64

53, 2^64

will be OK for you.
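A pair such as (31, 2^64) could be used roughly as follows. This is only a sketch: the helper names poly_hash, poly_roll and pow_wrap are made up for illustration, and the modulus 2^64 is never written out because uint64_t arithmetic wraps around on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: hash of the first len bytes of s with multiplier p.
   The modulus 2^64 is implicit in uint64_t overflow. */
uint64_t poly_hash(const char *s, size_t len, uint64_t p)
{
    uint64_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = h * p + (unsigned char)s[i];   /* wraps modulo 2^64 */
    return h;
}

/* Hypothetical helper: p^e modulo 2^64. */
uint64_t pow_wrap(uint64_t p, size_t e)
{
    uint64_t r = 1;
    while (e--)
        r *= p;
    return r;
}

/* Hypothetical helper: slide the window one position to the right.
   out is the byte leaving, in is the byte entering,
   p_high must be p^(len-1) for the window length len. */
uint64_t poly_roll(uint64_t h, unsigned char out, unsigned char in,
                   uint64_t p, uint64_t p_high)
{
    return (h - out * p_high) * p + in;
}
```

With p = 31, poly_hash("abc", 3, 31) is 96354, and rolling that value with out = 'a', in = 'd' and p_high = pow_wrap(31, 2) gives the same result as hashing "bcd" directly.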

Some implementations use more than one pair to minimize hash collisions.
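One way this might look in C is sketched below: two independent rolling hashes are maintained, and a window is treated as a candidate only when both agree with the pattern. The function name rk_double, the multipliers 31 and 37, and the prime moduli 1000000007 and 998244353 are assumptions chosen for this example.

```c
#include <stdint.h>
#include <string.h>

#define MOD1 1000000007ULL   /* first prime modulus (assumed for this sketch) */
#define MOD2 998244353ULL    /* second prime modulus (assumed for this sketch) */

/* Hypothetical sketch: first index of pattern in text, or -1. */
long rk_double(const char *text, const char *pattern)
{
    const uint64_t B1 = 31, B2 = 37;   /* one multiplier per hash */
    size_t n = strlen(text), m = strlen(pattern);
    if (m == 0) return 0;
    if (m > n) return -1;

    /* hi1, hi2 = multiplier^(m-1) modulo the matching modulus */
    uint64_t hi1 = 1, hi2 = 1;
    for (size_t i = 1; i < m; i++) {
        hi1 = hi1 * B1 % MOD1;
        hi2 = hi2 * B2 % MOD2;
    }

    uint64_t p1 = 0, p2 = 0, t1 = 0, t2 = 0;
    for (size_t i = 0; i < m; i++) {
        unsigned char pc = pattern[i], tc = text[i];
        p1 = (p1 * B1 + pc) % MOD1;
        p2 = (p2 * B2 + pc) % MOD2;
        t1 = (t1 * B1 + tc) % MOD1;
        t2 = (t2 * B2 + tc) % MOD2;
    }

    for (size_t i = 0; ; i++) {
        /* both hashes must agree before the window is compared in full */
        if (p1 == t1 && p2 == t2 && memcmp(text + i, pattern, m) == 0)
            return (long)i;
        if (i + m >= n)
            break;
        unsigned char out = text[i], in = text[i + m];
        t1 = ((t1 + MOD1 - out * hi1 % MOD1) * B1 + in) % MOD1;
        t2 = ((t2 + MOD2 - out * hi2 % MOD2) * B2 + in) % MOD2;
    }
    return -1;
}
```

A false candidate now requires a simultaneous collision under both hashes, which is far less likely than a collision under either one alone; the final memcmp keeps the result correct even if that does happen.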

OTHER TIPS

import java.util.Scanner;

// Rabin Karp
// find if pattern exists in string or not. If found return its index.

public class RabinKarp {

    private int prime = 101;

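    public int patternSearch(String s, String pattern) {

        int lengthOfPattern = pattern.length();
        // A pattern longer than the text can never match.
        if (lengthOfPattern > s.length())
            return -1;

        long hashOfPattern = createHash(pattern, lengthOfPattern);
        long hashOfString = createHash(s, lengthOfPattern);

        // prime^(lengthOfPattern - 1): weight of the leading character of a window.
        // long arithmetic wraps around consistently, so overflow is harmless here.
        long highPower = 1L;
        for (int i = 1; i < lengthOfPattern; i++)
            highPower *= prime;

        for(int i = 0; i < s.length() - lengthOfPattern + 1; i++) {

            // Equal hashes only mark a candidate; confirm it character by character.
            if (hashOfPattern == hashOfString && checkEqual(pattern, s, i, lengthOfPattern))
                return i;
            if (i != s.length() - lengthOfPattern)
                hashOfString = reCreateHash(hashOfString, s.charAt(i), s.charAt(i + lengthOfPattern), highPower);
        }
        return -1;
    }

    // Compares pattern with the window of text that starts at offset, without copying it.
    public boolean checkEqual(String pattern, String text, int offset, int end){
        for (int i=0;i<end;i++)
            if (pattern.charAt(i) != text.charAt(offset + i))
                return false;
        return true;
    }

    // Rolls the hash in O(1): drop oldChar from the front, shift, append newChar.
    public long reCreateHash(long oldHash, char oldChar, char newChar, long highPower) {
        return (oldHash - oldChar * highPower) * prime + newChar;
    }

    // Polynomial hash of the first `end` characters, computed with integer
    // arithmetic only (Math.pow returns a double and loses precision).
    public long createHash(String pattern,int end) {
        long hash = 0L;
        for(int i = 0; i < end; i++)
            hash = hash * prime + pattern.charAt(i);
        return hash;
    }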

    public static void main(String arg[]){
        Scanner sc = new Scanner(System.in);
        System.out.println("Enter a String");
        String s = sc.nextLine();
        System.out.println("Enter a pattern");
        String pattern = sc.nextLine();
        RabinKarp rk = new RabinKarp();
        System.out.println("Starting index of pattern is " + rk.patternSearch(s, pattern));
    }
}

RABIN KARP STRING MATCHING ALGORITHM
CODE:

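#include <stdio.h>
#include <string.h>
#define d 10 /* radix used by the rolling hash */
#define MAX_LEN 256 /* buffer size for each input line */
void RabinKarpStringMatch(char*, char*, int);
static void ReadLine(char*, int);
int main(void)
{
    char Text[MAX_LEN], Pattern[MAX_LEN];
    int Number = 11; //Prime Number
    printf("\nEnter Text String : ");
    ReadLine(Text, MAX_LEN);
    printf("\nEnter Pattern String : ");
    ReadLine(Pattern, MAX_LEN);

    RabinKarpStringMatch(Text, Pattern, Number);
    return 0;
}

/* Reads one line with fgets (gets is unsafe and was removed in C11) and strips the newline. */
static void ReadLine(char* Buffer, int Size)
{
    if (fgets(Buffer, Size, stdin) == NULL)
        Buffer[0] = '\0';
    Buffer[strcspn(Buffer, "\n")] = '\0';
}

void RabinKarpStringMatch(char* Text, char* Pattern, int Number)
{
    int M, N, h = 1, P = 0, T = 0;
    int i, j;
    M = strlen(Pattern);
    N = strlen(Text);
    if (M == 0 || M > N)
        return; /* nothing to search for, or the pattern cannot fit in the text */
    /* h = d^(M-1) mod Number, reduced at every step so it cannot overflow */
    for (i = 0; i < M - 1; i++)
        h = (h * d) % Number;
    /* hash of the pattern and of the first window of the text */
    for (i = 0; i < M; i++) {
        P = ((d * P) + (unsigned char)Pattern[i]) % Number;
        T = ((d * T) + (unsigned char)Text[i]) % Number;
    }
    for (i = 0; i <= N - M; i++) {
        if (P == T) {
            /* equal hashes only mark a candidate; verify it */
            for (j = 0; j < M; j++)
                if (Text[i + j] != Pattern[j])
                    break;
            if (j == M)
                printf("\nPattern Found at Position: %d", i + 1); /* 1-based */
        }
        if (i < N - M) {
            /* roll the window: drop Text[i], append Text[i + M] */
            T = (d * (T - (unsigned char)Text[i] * h) + (unsigned char)Text[i + M]) % Number;
            if (T < 0)
                T = T + Number;
        }
    }
}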


Licensed under: CC-BY-SA with attribution