Question

Hi I am writing a PHP class to implement Rabin-Karp algorithm. I have issue with re-hashing part. This code doesn't include matching part of the characters. I had to stop since it never matching hash codes due to the issue with re-hashing. Someone please help me to figure it out.

<?php
 class RabinKarp
 {
    /** 
    * @var String
    */ 
    private $pattern ;
    private $patternHash ;
    private $text ;
    private $previousHash ;

    /** 
    * @var Integer
    */ 
    private $radix ;
    private $prime ;
    private $position ;

    /** 
    * Constructor 
    *
    * @param String $pattern - The pattern
    *
    */ 
    public function __construct($pattern) 
    {
        $this->pattern = $pattern;
        $this->radix = 256;
        $this->prime = 100007;
        $this->previousHash = "";
        $this->position = 0;
        $this->patternHash = $this->generateHash($pattern);
    }

    private function generateHash($key)
    {
        $charArray = str_split($key);
        $hash = 0;
        foreach($charArray as $char)
        {
             $hash = ($this->radix * $hash + ord($char)) % $this->prime;
        }

        return $hash;
    }

    public function search($character)
    {
        $this->text .= $character;
        if(strlen($this->text) < strlen($this->pattern))
        {
            return false;
        }
        else
        {
            $txtHash = 0;
            echo $this->previousHash . "<br/>";
            if(empty($this->previousHash))
            {
                $txtHash = $this->generateHash($this->text);
                $this->previousHash = $txtHash;
                $this->position = 0;
            }
            else
            {
                // The issue is here 
                $charArray = str_split($this->text);
                $txtHash = (($txtHash  + $this->prime) - $this->radix * strlen($this->pattern) * ord($charArray[$this->position]) % $this->prime) % $this->prime;
                $txtHash = ($txtHash * $this->radix + ord($character)) % $this->prime; 

                $this->previousHash = $txtHash;
            }

            if($txtHash == $this->patternHash)
            {
                echo "Hash Match found";
            }
        }

    }

 }

$x = new RabinKarp("ABC");
$x->search("Z");
$x->search("A");
$x->search("B");
$x->search("C");
?>

Thank you.

Was it helpful?

Solution

The value contributed to the hash by the character (c for shortness) you're removing is

ord(c) * radix^(length(pattern)-1)

since a character contributes ord(c) when it first enters the matching window, and the hash - therefore also its contribution - is multiplied with radix for each of the length(pattern)-1 characters entering the matching window until c finally leaves it.

But you're subtracting ord(c) * radix * length(pattern)

$charArray = str_split($this->text);
$txtHash = (($txtHash  + $this->prime)
             - $this->radix * strlen($this->pattern)
             * ord($charArray[$this->position]) % $this->prime)
           % $this->prime;
$txtHash = ($txtHash * $this->radix + ord($character)) % $this->prime;

Additionally, in the calculation you're using the variable $txtHash, which you've set to 0, that should be $this->previousHash, and you must increment the text position.

In principle,

$charArray = str_split($this->text);
$txtHash = (($this->previousHash  + $this->prime)
             - pow($this->radix, strlen($this->pattern)-1)
             * ord($charArray[$this->position]) % $this->prime)
           % $this->prime;
$txtHash = ($txtHash * $this->radix + ord($character)) % $this->prime;

$this->previousHash = $txtHash;
$this->position += 1;

is what you have to do.

But unless the pattern is very short, pow($this->radix,strlen($this->pattern)-1) will overflow, so you have to replace pow($this-radix, strlen($this->pattern)-1) with a modular exponentiation function

function mod_pow($base,$exponent,$modulus)
{
    $aux = 1;
    while($exponent > 0) {
        if ($exponent % 2 == 1) {
            $aux = ($aux * $base) % $modulus;
        }
        $base = ($base * $base) % $modulus;
        $exponent = $exponent/2;
    }
    return $aux;
}

(this can still overflow if $modulus, that is $this->prime here, is too large). The relevant line of code becomes

$txtHash = (($this->previousHash  + $this->prime)
             - mod_pow($this->radix, strlen($this->pattern)-1, $this->prime)
             * ord($charArray[$this->position]) % $this->prime)
           % $this->prime;

Then you have a potentially huge inefficiency

$this->text .= $character;
...
    $charArray = str_split($this->text);

If the string becomes long, the concatenation and the splitting may take a lot of time (not sure how PHP implements strings and string operations, but they're likely not constant time). You should probably keep only the relevant part of the string, i.e. drop the first character after recalculating the hash.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top