Levenshtein edit distance algorithm that supports Transposition of two adjacent letters in C#

StackOverflow https://stackoverflow.com/questions/10178043

  •  31-05-2021
  •  | 
  •  

Question

i'm searching for an algorithm for computing Levenshtein edit distance that also supports the case in which two adjacent letters are transposed that is implemented in C#.

for example the word "animals" and "ainmals" : switching between the letters "n" and "i" wont be scored as two replacements -which will make a big distance - but instead on will be scored as a transpose of two letters -much more less distance-

what i reached so far in searching

Was it helpful?

Solution

You need to add the additional condition to make it a "Damerau–Levenshtein distance" algorithm. So, using the example here: http://www.dotnetperls.com/levenshtein you just need to add the following condition right after step 6:

 //** Step 7 to make it Damerau–Levenshtein distance
      if (i > 1 && j > 1 && (s[i - 1] == t[j - 2]) && (s[i - 2] == t[j - 1]))
      {
             d[i, j] = Math.Min(
                            d[i, j],
                            d[i - 2, j - 2] + cost   // transposition
                         );
      }

OTHER TIPS

See the implementation on Wikipedia. You can easily adapt the algorithm to include the case for letter swaps. For example:

//bla bla. I'm just copying the code on the Wikipedia.
 d[i, j] := minimum
                   (
                     d[i-1, j] + 1,  // a deletion
                     d[i, j-1] + 1,  // an insertion
                     d[i-1, j-1] + 1, // a substitution
                   )

// This single statement is all you need:
if(s[i-1]==t[j-2] && s[i-2]==t[j-1])
   d[i,j] := minimum
                  (
                      d[i,j],               //cost without swapping 
                      d[i-2,j-2]+something  //cost with swapping. probably something=1 
                  );
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top