Levenshtein distance combination
-
05-07-2019 - |
Question
LD = Levenshtein Distance
Just doing a few examples on paper, this seems to work, but does anyone know if this is always true?
Lets say I have 3 strings
BOT
BOB
BOM
LD(BOT,BOB) = 1
and
LD(BOB,BOM)=1
then
LD(BOT,BOM)=max(LD(BOT,BOB) , LD(BOB,DOM))=1
OR
BAAB
BBAB
BCCD
LD(BBAB,BAAB) = 1
and
LD(BBAB,BCCD)=3
then
LD(BAAB,BCCD)=max(LD(BBAB,BAAB), LD(BBAB,BCCD))=3
I'd like to know if this always applies.
That is,
LD(B,C) = max (LD(A,C),LD(A,B))
EDIT -- Added at 10/22/2009 7:08PM PST
I'm starting to think this holds for words of the same length, otherwise you can still do it but you have to add the absolute value of the difference in word length.
In essence LD(B,C) = max(LD(A,C),LD(A,B)) + abs(length(B)-length(C))
Solution
Doesn't work.
LD("BOB", "BOT") == 1
LD("BOT", "BOB") == 1
LD("BOB", "BOB") == 0
max(LD("BOB", "BOT"), LD("BOT", "BOB")) == 1
0 != 1
there are probably harder examples also...
OTHER TIPS
No, but this does:
lev(a, c) <= lev(a, b) + lev(b, c) (a.k.a "triangle inequality)
...and is used heavily as a heuristic by VP-Trees and BK-Trees.
Being a metric the levenshtein distance follows the triangle inequality:
http://en.wikipedia.org/wiki/Triangle_inequality
Nothing is better than a test. If you know C# run it through this.
public Int32 CalculateDistance(String x, String y)
{
Int32 xl = x.Length;
Int32 yl = y.Length;
Int32[,] matrix = new Int32[xl + 1, yl + 1];
for (Int32 i = 0; i <= xl; i++)
{
matrix[i, 0] = i;
}
for (Int32 i = 0; i <= yl; i++)
{
matrix[0, i] = i;
}
for (Int32 j = 1; j <= yl; j++)
{
for (Int32 i = 1; i <= xl; i++)
{
if (x[i - 1] == y[j - 1])
{
matrix[i, j] = matrix[i - 1, j - 1];
}
else
{
matrix[i, j] = Min((matrix[i - 1, j] + 1), (matrix[i, j - 1] + 1), (matrix[i - 1, j - 1] + 1));
}
}
}
return matrix[xl, yl];
}
This is a regular Dynamic Programming problem. The Wikipedia entry has a Proof of Correctness part. Are you looking for something else?
Didn't hold true for this case
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace LevenshteinDistance
{
class Program
{
static void Main(string[] args)
{
LevenshteinDistance ld = new LevenshteinDistance();
string a="B";
string b="Book";
string c = "Sick";
Console.WriteLine("{0} = Max( {1}, {2} )", ld.Compute(b, c), ld.Compute(a, c), ld.Compute(a, b));
if (ld.Compute(b, c) == Math.Max(ld.Compute(a, c), ld.Compute(a, b)))
Console.WriteLine("Equal");
else
Console.WriteLine("Not Equal");
Console.ReadKey();
}
}
class LevenshteinDistance
{
//****************************
// Get minimum of three values
//****************************
private int Minimum(int a, int b, int c)
{
int min;
min = a;
if (b < min)
{
min = b;
}
if (c < min)
{
min = c;
}
return min;
}
//*****************************
// Compute Levenshtein distance
//*****************************
public int Compute(string s, string t)
{
int[,] matrix; // matrix
int n; // length of s
int m; // length of t
int i; // iterates through s
int j; // iterates through t
char s_i; // ith character of s
char t_j; // jth character of t
int cost; // cost
// Step 1
n = s.Length;
m = t.Length;
if (n == 0)
{
return m;
}
if (m == 0)
{
return n;
}
matrix = new int[n + 1, m + 1];
// Step 2
for (i = 0; i <= n; i++)
{
matrix[i, 0] = i;
}
for (j = 0; j <= m; j++)
{
matrix[0, j] = j;
}
// Step 3
for (i = 1; i <= n; i++)
{
s_i = s[(i - 1)];
// Step 4
for (j = 1; j <= m; j++)
{
t_j = t[(j - 1)];
// Step 5
if (s_i == t_j)
{
cost = 0;
}
else
{
cost = 1;
}
// Step 6
matrix[i, j] = Minimum(matrix[i - 1, j] + 1, matrix[i, j - 1] + 1, matrix[i - 1, j - 1] + cost);
}
}
// Step 7
return matrix[n, m];
}
}
}