Question

I have a bit of trouble implementing the BMH algorithm in C++.

Here's the code:

#define Nm 2000005
int D[256];
char To[Nm],P[Nm],*T;
int Tl,Pl;
int cont;
void initialize_Lenght()
{
    Tl=strlen(To);
    Pl=strlen(P);
    T=To;
}
void compute_D()
{
    for(int i=0;i<256;i++)
        D[i]=Pl;
    for(int i=0;i<Pl;i++)
        D[P[i]]=Pl-i;
}
void Boyer_Moore()
{
    int i;

    while ( Tl>=Pl )    
    {
        for(i=Pl-1;T[i]==P[i]&&i>=0;i--)
            if(i==0)
            {
                if(cont<1000)
                    v[cont]=(T-To); // I also have to store the first 1000 values 
                cont++;
            }
            Tl -= D[T[i+1]];
            T += D[T[i+1]];
    }
}

It works for most examples but there are some examples that do not work (only ones which I found thus far are huge tests downloaded from different sources).

I would want to know where / what I am doing wrong (I do not really want code).

Edit: Due to comments

Do you have any idea how I could make this algorithm run faster without implementing a full Boyer-Moore version of it?

Was it helpful?

Solution

The order of tests in

for(i=Pl-1;T[i]==P[i]&&i>=0;i--)

is wrong. After a complete match, you compare T[-1] to P[-1] before checking whether the index is admissible.

If a mismatch occurs at the last pattern character,

Tl -= D[T[i+1]];
T += D[T[i+1]];

skips according to a character that need not exist (if the pattern end is aligned with the text end).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top