Question

Reciently I've looked through several implementation of bitap algorithm but what all of them do is finding the beginning point of fuzzy match. What I need is to find a match. There's an example:

Say we have following text: abcdefg

and a pattern: bzde

and we want to find all occurence of a pattern in text with at most 1 error (Edit distance is concidered).

So I need that the algorithm returns: bcde.

Is there a simple (or not simple =) ) way to do it? The original artical about this algorithm doens't answer the question.

Thank you for your help.

Was it helpful?

Solution

For a simple start you could approach it with a series of regular expressions where in every expression you replace 1 character with the . wildcard. Combining those expressions into one using the ( | ) construct to create one big regular expression.

Another way would be to scan the string keeping an errorcount and incrementing the offset you match at when too many errors are encountered.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top